According to TDWI, the cost of bad data is more than $600 billion annually in the U.S. There are many negative consequences of low data quality, including:
Low customer satisfaction
Loss of customers
Misguided business decisions
Financial inaccuracies and mistakes
Legal and monetary penalties
Negative company image
All too often, companies invest in a data warehouse, but a proactive data quality solution is an afterthought. Developing a well-planned and scalable data quality capability as part of your foundational work can go a long way in improving the quality of your data. If done well, it will also improve the business stakeholder confidence in your data.
First of all, let’s define data quality. Way back in 1996, when I was first developing data quality processes, it was simply defined as “fitness for use,” which is still an appropriate high-level definition. For data to be “fit for use,” an organization needs to define which aspects of quality matter most to it. Below is a quote from a former co-worker who has focused on all aspects of data quality throughout her career.
“Data and information quality thinkers have adopted the word dimension to identify those aspects of data that can be measured and through which its quality can be quantified. While different experts have proposed different sets of data quality dimensions … almost all include some version of accuracy and validity, completeness, consistency, and currency or timeliness among them.”
— Sebastian-Coleman, Laura. Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework
Rather than trying to focus on every dimension, start by focusing on the basics of completeness and timeliness, then move on to validity and consistency. These four dimensions can truly enhance the quality of enterprise data as well as stakeholders’ confidence in the data they consume.
Completeness is first and foremost. Stakeholders need to know that what’s in the source is accounted for in the target. You can ensure completeness in a variety of ways; for example, a record-balancing capability records a count at the end of one flow and compares it with the count at the beginning of the next to ensure all records are accounted for. The ultimate goal is to validate that every record and its corresponding information from a source is handled appropriately during processing. This source-to-target validation should be monitored and reported to the organization’s data consumers.
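A record-balancing check of this kind can be quite simple at its core. The sketch below is a minimal, hypothetical illustration (the function name and return fields are my own, not from any particular tool): it compares the count captured at the end of one flow against the count at the start of the next, treating explicitly rejected records as accounted for rather than silently lost.

```python
def check_completeness(source_count, target_count, rejected_count=0):
    """Balance source records against target records.

    Records that were deliberately rejected (e.g., failed a quality
    gate) are counted as 'accounted for' -- the goal is to prove that
    nothing disappeared silently between flows.
    """
    accounted_for = target_count + rejected_count
    return {
        "source": source_count,
        "target": target_count,
        "rejected": rejected_count,
        "balanced": accounted_for == source_count,
        "discrepancy": source_count - accounted_for,
    }

# Example: 1,000 source records, 990 loaded, 10 explicitly rejected
result = check_completeness(1000, 990, 10)
```

In practice the counts would come from control tables or job logs at each hand-off point, and an unbalanced result would raise an alert before data consumers ever see the gap.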
Timeliness should be a component of service-level agreements (SLAs) and identify such criteria as acceptable levels of data latency, frequency of data updates, and data availability. Timeliness can then be measured against these defined SLAs and shared as part of the data quality metrics.
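Measuring timeliness against an SLA can also be expressed compactly. This is a hedged sketch under assumed criteria (the 24-hour latency threshold and field names are hypothetical examples, not prescribed values): it computes the latency of the most recent load and reports whether the SLA was met.

```python
from datetime import datetime, timedelta

def check_timeliness(last_load_finished, now, max_latency_hours=24):
    """Compare data latency against an SLA threshold.

    max_latency_hours is a placeholder -- the real value comes from
    the SLA negotiated with data consumers.
    """
    latency = now - last_load_finished
    return {
        "latency_hours": latency.total_seconds() / 3600,
        "sla_met": latency <= timedelta(hours=max_latency_hours),
    }

# Example: last load finished at 10 p.m., checked at 6 a.m. next day
result = check_timeliness(
    datetime(2024, 1, 1, 22), datetime(2024, 1, 2, 6)
)
```

Run on a schedule, a check like this turns the SLA into a measurable metric that can be published alongside the other data quality numbers.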
Validity is a key data quality measure that indicates the “correctness” of the actual data content; for example, confirming that all the characters in a telephone number field are digits, not alphabetic characters. This is the concept that most data consumers think about when they envision data quality. Validity can be assessed through data profiling, data cleansing, and inline data quality checks that may perform comparisons of incoming values to expected values or to values defined within a stated range of acceptability. Alerts can be set, depending on the validity checks used. The results of the validity checks should be measured and shared as part of the data quality metrics.
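An inline validity check of the kind described above might look like the following sketch. The helper names and the two example rules (all-digit phone numbers, a stated age range) are illustrative assumptions; a real implementation would draw its rules from a profiling or rules repository.

```python
def check_validity(records, field, rule):
    """Apply one validity rule to one field across a batch of records.

    Returns pass/fail counts plus a small sample of failures so the
    result can feed data quality metrics and alerting.
    """
    failures = [r for r in records if not rule(r.get(field))]
    checked = len(records)
    return {
        "checked": checked,
        "failed": len(failures),
        "pct_valid": 100.0 * (checked - len(failures)) / checked
        if checked else 100.0,
        "sample_failures": failures[:5],
    }

# Example rules mirroring the checks described in the text
is_numeric_phone = lambda v: isinstance(v, str) and v.isdigit() and len(v) == 10
in_age_range = lambda v: isinstance(v, int) and 0 <= v <= 120
```

Thresholds on `pct_valid` are a natural place to hang alerts, and the per-rule results roll up directly into the data quality metrics shared with stakeholders.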
Consistency is crucial to continued consumer confidence. Once data quality metrics are being monitored and reported to the business stakeholders for completeness, timeliness, and validity, then consistency can be measured by assessing changes in these patterns over time. These results can be added to the data quality metrics reporting that is shared with business stakeholders.
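One simple way to assess "changes in these patterns over time" is to flag any metric that drifts too far from its own history. This is a minimal sketch of that idea (the function name and the 10 percent tolerance are assumptions for illustration, not a recommended threshold):

```python
def check_consistency(history, current, tolerance_pct=10.0):
    """Flag a metric whose current value drifts beyond tolerance_pct
    from its historical mean -- a basic pattern-over-time check."""
    if not history:
        return {"drift_pct": 0.0, "consistent": True}
    mean = sum(history) / len(history)
    drift_pct = abs(current - mean) / mean * 100 if mean else float("inf")
    return {"drift_pct": drift_pct, "consistent": drift_pct <= tolerance_pct}

# Example: daily record counts have hovered around 100; today is 120
result = check_consistency([100, 100, 100], 120)
```

The same check can be applied to completeness, timeliness, and validity metrics alike, so one mechanism covers all three dimensions already being reported.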
Complete transparency of data quality metrics and reporting to your organization’s data consumers will lead to greater confidence in the quality of the underlying data.
Stakeholder confidence will continue to increase if you are able to proactively identify issues through active data quality monitoring before the data consumers find them. This is one of the greatest achievements of a robust data quality program.
The next article will cover the next step in building the foundational approach to agile data warehouse development: giving the development team the ability to self-manage its agile development approach, incorporating continuous improvement.
Nancy Couture has more than 30 years of experience leading enterprise data management at Fortune 500 companies and midsize organizations. Her focus has been on enterprisewide data management architecture, data governance, data quality, data warehousing and business intelligence capabilities.
Nancy recently moved into consulting as delivery enablement lead at Datasource Consulting, a Denver-based firm focused on delivering on all aspects of enterprise information management.
Previously, Nancy was vice president of business intelligence at SquareTwo Financial in Denver. She and her team successfully developed and utilized agile methodologies in building out enterprisewide solutions, including an enterprise data warehouse, a robust analytics and reporting environment, and integrated analytics solutions.
Before her time at SquareTwo, Nancy was vice president of data management solutions at UnitedHealth Group in Connecticut, where she developed and managed three enterprise-level data warehouses for healthcare analytics over the course of 30-plus years. In that role, Nancy was recognized for her leadership and ability to execute innovative approaches to data management.
Nancy has presented at many conferences on data management topics over the years, owns a patent in data mapping technologies, and has published several articles for the TDWI Business Intelligence Journal. In 2007 and again in 2015, her respective teams won the TDWI Best Practices Award in Enterprise Data Warehousing.
The opinions expressed in this blog are those of Nancy Couture and do not necessarily represent those of IDG Communications Inc. or its parent, subsidiary or affiliated companies.