How to test for a best-in-class agile data warehouse environment
Having a solid testing strategy and tool set is a foundational part of enabling agile data warehouse development. This article describes an approach that ensures solid testing that can be done efficiently and effectively in an agile development environment.
This series of articles describes foundational steps that enable agile data warehouse development, something that has been a challenge in enterprise data management for years. My prior articles describe how to develop a business conceptual model as a starting point, build a “grass roots” (at a minimum) data governance capability, and develop a high-level data flow architecture.
The next step in setting yourself up for a best-in-class agile data warehouse environment is to develop a solid testing approach and tools before actual development begins.
The data warehouse will be a strategic enterprise resource and heavily relied upon.
One of the weaknesses we have in the data community is testing, which helps to explain the data quality challenges we continue to suffer in production data warehouses.
Gartner predicts that by 2017, 33 percent of Fortune 100 companies will experience information crises due to their inability to adequately value, govern and trust their enterprise information. A robust testing approach can help avert (or at least minimize) such a crisis.
Developing and executing a well-planned end-to-end data warehouse testing process can help you avoid serious data-related risks. When moving to agile development, we have the opportunity to do significantly more testing than typically occurs in traditional projects.
We can do:
parallel test development
developer unit testing
regression test bed development and maintenance
… and we do this with every user story, or every logical grouping of user stories.
As we move into agile data warehouse development, we create a certain rhythm of activities:
Business priorities will drive our sprint backlog
Developers work on sets of user stories from this prioritized backlog in two-week sprints (or whatever time cycle is appropriate)
Independent QA test development occurs during the sprint in parallel
Cross-testing the results of development and QA adds a higher level of confidence in the results
Deployment to the QA/UAT environment at the end of the development cycle is next
Integrated QA testing and UAT can occur one sprint behind
Production deployment can bundle sprints and takes place in releases as long as all testing passes
The QA tests are created in development, then move to the QA/UAT environment, then move to the prod environment. As a result, we have a robust automated regression testing capability within each environment. Test cases can also run nightly in production for continuous quality monitoring. This will then feed into our data quality capabilities, since the testing can be run on a frequent basis to ensure consistent quality.
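As a rough illustration of how one set of tests can be promoted from environment to environment, the sketch below defines each test once as a SQL query with an expected result and runs the same suite against any connection. Everything here is an assumption made for illustration: the table and column names (fact_sales, order_id, customer_key), the DataTest structure, and the use of in-memory SQLite databases as stand-ins for the development and production environments. It is not the tool my team used.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class DataTest:
    name: str
    sql: str       # query that returns a single number
    expected: int  # value the query must return for the test to pass


# Hypothetical regression suite: written once, run in every environment.
REGRESSION_SUITE = [
    DataTest(
        "no_null_customer_keys",
        "SELECT COUNT(*) FROM fact_sales WHERE customer_key IS NULL",
        0,
    ),
    DataTest(
        "no_duplicate_orders",
        "SELECT COUNT(*) FROM (SELECT order_id FROM fact_sales "
        "GROUP BY order_id HAVING COUNT(*) > 1) AS dupes",
        0,
    ),
]


def run_suite(conn, suite):
    """Run every test on one environment's connection; return failed test names."""
    failures = []
    for test in suite:
        actual = conn.execute(test.sql).fetchone()[0]
        if actual != test.expected:
            failures.append(test.name)
    return failures


def make_environment(rows):
    """Build an in-memory database standing in for dev, QA/UAT, or prod."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_sales (order_id INTEGER, customer_key INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    return conn


# Clean data in "dev"; a duplicate order and a null key in "prod".
dev = make_environment([(1, 10, 5.0), (2, 11, 7.5)])
prod = make_environment([(1, 10, 5.0), (1, 10, 5.0), (3, None, 2.0)])

dev_failures = run_suite(dev, REGRESSION_SUITE)
prod_failures = run_suite(prod, REGRESSION_SUITE)
print("dev failures:", dev_failures)
print("prod failures:", prod_failures)
```

Because the suite is plain data, a nightly scheduler could call run_suite against the production connection and pass any failures to data quality monitoring.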
To ensure that a strong test practice is followed, we would put a standard set of tasks into most user stories:
The automated data validation tool we implemented can use the same tests for unit testing, integration testing, regression testing and ongoing data quality monitoring in production.
Developing the tests in parallel with the ETL code facilitates agile development, since development and testing can be completed in a single sprint. When the ETL output and the independently written test cases are run against the same data, any mismatch in results points to an ETL code defect, a test case code defect, or a defect in our source-to-target specs.
Implementing a robust testing capability was a lesson learned for my team as we started data warehouse development. Initially, our QA team manually tested each user story after the two-week sprint, once development and unit testing were complete. There were several challenges with this approach.
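A minimal sketch of that cross-test follows, assuming a simple rule in which the warehouse holds total order amount per customer. The source table src_orders, the target table dw_customer_sales, and the cross_test helper are hypothetical names chosen for the example, and SQLite stands in for the real databases. QA writes its own query from the source-to-target spec, and the comparison reports every key where the two results disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source data.
conn.execute("CREATE TABLE src_orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?)",
    [(1, 10.0), (1, 15.0), (2, 20.0), (3, 5.0)],
)

# Stand-in for what the ETL job loaded. Customer 2 is deliberately wrong
# and customer 3 is deliberately missing, to show how mismatches surface.
conn.execute("CREATE TABLE dw_customer_sales (customer_id INTEGER, total_amount REAL)")
conn.executemany(
    "INSERT INTO dw_customer_sales VALUES (?, ?)",
    [(1, 25.0), (2, 2.0)],
)

# QA's independent implementation of the source-to-target rule.
QA_EXPECTED_SQL = "SELECT customer_id, SUM(amount) FROM src_orders GROUP BY customer_id"
# What the ETL actually produced.
ETL_ACTUAL_SQL = "SELECT customer_id, total_amount FROM dw_customer_sales"


def cross_test(conn, expected_sql, actual_sql):
    """Compare two (key, value) result sets; return {key: (expected, actual)} for mismatches.

    Values are compared exactly, which is fine for this toy data; real monetary
    columns would likely need a tolerance or a decimal type.
    """
    expected = dict(conn.execute(expected_sql).fetchall())
    actual = dict(conn.execute(actual_sql).fetchall())
    mismatches = {}
    for key in sorted(expected.keys() | actual.keys()):
        if expected.get(key) != actual.get(key):
            mismatches[key] = (expected.get(key), actual.get(key))
    return mismatches


mismatches = cross_test(conn, QA_EXPECTED_SQL, ETL_ACTUAL_SQL)
for key, (exp, act) in mismatches.items():
    print(f"customer {key}: QA expected {exp}, ETL loaded {act}")
```

A mismatch by itself does not say which side is wrong. Someone still has to check the ETL code, the QA query, and the spec to find the defect.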
The manual testing was essentially ‘throw away’ and didn’t provide any future benefit.
If issues were found during testing, the developers were already on to their next sprint cycle and had a difficult time remembering the work they’d done in a prior sprint, as well as finding the time to address the issues that were discovered.
Once we started parallel development and testing in a single sprint, we were able to focus on the same set of user stories, and also develop re-usable testing capabilities.
However, it took us a long time to recover from this oversight. We had a long list of user stories that did not have re-usable tests developed, and it took us several months to build them. We also committed to developing tests in parallel for all future user stories so that we didn’t accumulate more technical debt in this area. Once we made that commitment, we found that we were able to move into a very efficient, and agile, development cycle.
Additional steps in building this foundational approach to agile data warehouse development include:
implementing a robust data quality program
giving the development team the ability to self-manage their agile development approach, incorporating continuous improvement
I will cover these remaining steps in upcoming articles.
Nancy Couture has more than 30 years of experience leading enterprise data management at Fortune 500 companies and midsize organizations. Her focus has been on enterprisewide data management architecture, data governance, data quality, data warehousing and business intelligence capabilities.
Nancy recently moved into consulting as delivery enablement lead at Datasource Consulting, a Denver-based firm focused on delivering on all aspects of enterprise information management.
Previously, Nancy was vice president of business intelligence at SquareTwo Financial in Denver. She and her team successfully developed and utilized agile methodologies in building out enterprisewide solutions, including an enterprise data warehouse, a robust analytics and reporting environment, and integrated analytics solutions.
Before her time at SquareTwo, Nancy was vice president of data management solutions at UnitedHealth Group in Connecticut, where she developed and managed three enterprise-level data warehouses for healthcare analytics over the course of 30-plus years. In that role, Nancy was recognized for her leadership and ability to execute innovative approaches to data management.
Nancy has presented at many conferences on data management topics over the years, owns a patent in data mapping technologies, and has published several articles for the TDWI Business Intelligence Journal. In 2007 and again in 2015, her respective teams won the TDWI Best Practices Award in Enterprise Data Warehousing.
The opinions expressed in this blog are those of Nancy Couture and do not necessarily represent those of IDG Communications Inc. or its parent, subsidiary or affiliated companies.