by Nancy Couture

How to test for a best in class agile data warehouse environment

Opinion
Dec 04, 2015
Agile Development, Data Warehousing

Having a solid testing strategy and tool set is a foundational part of enabling agile data warehouse development. This article describes an approach that ensures solid testing that can be done efficiently and effectively in an agile development environment.

This series of articles describes foundational steps that enable agile data warehouse development, something that has been a challenge in enterprise data management for years. My prior articles describe how to develop a business conceptual model as a starting point, build a “grass roots” (at a minimum) data governance capability, and develop a high-level data flow architecture.

The next focus for setting yourself up for a best in class agile data warehouse environment is to develop a solid testing approach and tools before actual development begins.

Why test? 

The data warehouse will be a strategic enterprise resource and heavily relied upon.

One of the weaknesses we have in the data community is testing, which helps to explain the data quality challenges we continue to suffer in production data warehouses.

Gartner predicts that by 2017, 33 percent of Fortune 100 companies will experience information crises due to their inability to adequately value, govern and trust their enterprise information.  A robust testing approach can help avert (or at least minimize) such a crisis.

Developing a well-planned and well-executed end-to-end data warehouse testing process can help you avoid serious data-related risks. When moving to agile development, we have the opportunity to do significantly more testing than typically occurs in traditional projects.

We can do:

  • parallel test development
  • developer unit testing
  • code reviews
  • QA testing
  • regression test bed development and maintenance

… and we do this with every user story, or every logical grouping of user stories.

As we move into agile data warehouse development, we create a certain rhythm of activities:

  • Business priorities will drive our sprint backlog
  • Developers work on sets of user stories from this prioritized backlog in two-week sprints (or whatever time cycle is appropriate)
  • Independent QA test development occurs during the sprint in parallel
  • Cross-testing the results of development and QA adds a higher level of confidence in the results
  • Deployment to the QA/UAT environment follows at the end of the development cycle
  • Integrated QA testing and UAT can occur one sprint behind
  • Production deployment can bundle sprints and takes place in releases as long as all testing passes

The QA tests are created in development, then move to the QA/UAT environment, and then to the production environment. As a result, we have a robust automated regression testing capability within each environment. Test cases can also run nightly in production for continuous quality monitoring. This in turn feeds into our data quality capabilities, since the tests can be run frequently to ensure consistent quality.
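As a rough illustration of one test suite serving every environment, here is a minimal Python sketch. It is not the tool we used. The table name `dim_customer`, the two test cases, and the in-memory SQLite databases standing in for the development, QA/UAT and production environments are all assumptions made for the example.

```python
import sqlite3

# Hypothetical test cases: each SQL query counts rows that violate a rule,
# so a result of 0 means the test passes.
TEST_CASES = {
    "customer_id_not_null": (
        "SELECT COUNT(*) FROM dim_customer WHERE customer_id IS NULL"
    ),
    "customer_id_unique": (
        "SELECT COUNT(*) FROM ("
        "SELECT customer_id FROM dim_customer "
        "GROUP BY customer_id HAVING COUNT(*) > 1)"
    ),
}


def run_suite(conn):
    """Run every test case on one environment; return {test name: passed}."""
    results = {}
    for name, sql in TEST_CASES.items():
        violations = conn.execute(sql).fetchone()[0]
        results[name] = (violations == 0)
    return results


def make_env(rows):
    """Stand-in for a real environment: an in-memory database with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", rows)
    return conn


# The same suite runs unchanged against each (hypothetical) environment.
environments = {
    "dev": make_env([(1, "Ann"), (2, "Bob")]),
    "qa_uat": make_env([(1, "Ann"), (2, "Bob"), (3, "Cy")]),
    "prod": make_env([(1, "Ann"), (2, "Bob"), (2, "Bob")]),  # duplicate key
}

report = {env: run_suite(conn) for env, conn in environments.items()}
for env, results in report.items():
    print(env, results)
```

In a real setup, the connections would point at the actual environments, and a scheduler would call something like `run_suite` nightly against production so that failures feed the data quality process.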

To ensure that a strong test practice is followed, we put a standard set of tasks into most user stories.

The automated data validation tool we implemented can use the same tests for unit testing, integration testing, regression testing and ongoing data quality monitoring in production.

Developing the tests in parallel with the ETL code facilitates agile development, since development and testing can be completed in a single sprint. Because the ETL and the test cases are built independently and run against the same data, a mismatch in their results can point to an ETL code defect, a test case code defect, or a defect in our source-to-target specs.
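The cross-testing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample orders, the spec ("total sales per customer, excluding cancelled orders") and the function names are hypothetical, and the ETL version contains a deliberate defect so that the comparison has something to find.

```python
# Hypothetical source rows and a hypothetical spec:
# "total sales per customer, excluding cancelled orders".
source_orders = [
    {"customer_id": 1, "amount": 100.0, "status": "shipped"},
    {"customer_id": 1, "amount": 50.0, "status": "cancelled"},
    {"customer_id": 2, "amount": 75.0, "status": "shipped"},
]


def etl_total_sales(orders):
    """Developer's ETL logic (deliberately ignores the cancellation rule)."""
    totals = {}
    for o in orders:
        totals[o["customer_id"]] = totals.get(o["customer_id"], 0.0) + o["amount"]
    return totals


def qa_total_sales(orders):
    """QA's independent implementation, written from the same spec."""
    totals = {}
    for o in orders:
        if o["status"] != "cancelled":
            totals[o["customer_id"]] = (
                totals.get(o["customer_id"], 0.0) + o["amount"]
            )
    return totals


def cross_test(etl_result, qa_result):
    """Return {key: (etl value, qa value)} for every key where the two differ."""
    keys = set(etl_result) | set(qa_result)
    return {
        k: (etl_result.get(k), qa_result.get(k))
        for k in sorted(keys)
        if etl_result.get(k) != qa_result.get(k)
    }


mismatches = cross_test(etl_total_sales(source_orders), qa_total_sales(source_orders))
print(mismatches)
```

The comparison only reports that the two implementations disagree for customer 1. It does not say which side is wrong, so each mismatch still has to be traced to the ETL code, the test code, or the spec itself.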

Implementing a robust testing capability was a lesson learned for my team as we started data warehouse development. Initially, our QA team manually tested each user story after the two-week sprint, once development and unit testing were complete. There were several challenges with this approach.

  • The manual testing was essentially ‘throw away’ and didn’t provide any future benefit.
  • If there were issues found during testing, the developers were already onto their next sprint cycle and had a difficult time remembering the work they’d done in a prior sprint, as well as finding the time to address the issues that were discovered.

Once we started parallel development and testing in a single sprint, we were able to focus on the same set of user stories, and also develop re-usable testing capabilities.

However, it took us a long time to recover from this oversight. We had a long list of user stories that did not have re-usable tests developed, and we took several months to develop them. We also made the commitment to develop tests in parallel for all future user stories so that we didn’t accumulate further technical debt in this area. Once we made that commitment, we found that we were able to move into a very efficient, and agile, development cycle.

Additional steps in building this foundational approach to agile data warehouse development include: 

  • implementing a robust data quality program
  • giving the development team the ability to self-manage their agile development approach, incorporating continuous improvement

I will cover these remaining steps in upcoming articles.