According to TDWI, the cost of bad data is more than $600 billion annually in the U.S. There are many negative consequences of low data quality, including:
Low customer satisfaction
Loss of customers
Misguided business decisions
Financial inaccuracies and mistakes
Legal and monetary penalties
Negative company image
All too often, companies invest in a data warehouse, but a proactive data quality solution is an afterthought. Developing a well-planned and scalable data quality capability as part of your foundational work can go a long way in improving the quality of your data. If done well, it will also improve the business stakeholder confidence in your data.
First of all, let’s define data quality. Way back in 1996, when I was first developing data quality processes, it was simply defined as “fitness for use,” which is still an appropriate high-level definition. For data to be “fit for use,” an organization needs to define which aspects of quality matter most to it. Below is a quote from a former co-worker who has focused on all aspects of data quality throughout her career.
“Data and information quality thinkers have adopted the word dimension to identify those aspects of data that can be measured and through which its quality can be quantified. While different experts have proposed different sets of data quality dimensions … almost all include some version of accuracy and validity, completeness, consistency, and currency or timeliness among them.”
— Sebastian-Coleman, Laura. Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework
Rather than trying to focus on every dimension, start by focusing on the basics of completeness and timeliness, then move on to validity and consistency. These four dimensions can truly enhance the quality of enterprise data as well as stakeholders’ confidence in the data they consume.
Completeness is first and foremost. Stakeholders need to know that what’s in the source is accounted for in the target. You can ensure completeness in a variety of ways; for example, a record-balancing capability records a count at the end of one flow and compares it with the count at the beginning of the next to ensure all records are accounted for. The ultimate goal is to validate that every record and its corresponding information from a source is handled appropriately during processing. This source-to-target validation should be monitored and reported to the organization’s data consumers.
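A record-balancing check of this kind can be quite simple at its core. The sketch below is a minimal, hypothetical illustration (the function name and return fields are my own, not from any particular tool): it compares the count captured at the end of one flow against the count at the start of the next, treating explicitly rejected records as accounted for rather than silently lost.

```python
def check_completeness(source_count, target_count, rejected_count=0):
    """Balance source records against target records.

    Records that were deliberately rejected (e.g., failed a quality
    gate) are counted as 'accounted for' -- the goal is to prove that
    nothing disappeared silently between flows.
    """
    accounted_for = target_count + rejected_count
    return {
        "source": source_count,
        "target": target_count,
        "rejected": rejected_count,
        "balanced": accounted_for == source_count,
        "discrepancy": source_count - accounted_for,
    }

# Example: 1,000 source records, 990 loaded, 10 explicitly rejected
result = check_completeness(1000, 990, 10)
```

In practice the counts would come from control tables or job logs at each hand-off point, and an unbalanced result would raise an alert before data consumers ever see the gap.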
Timeliness should be a component of service-level agreements (SLAs) and identify such criteria as acceptable levels of data latency, frequency of data updates, and data availability. Timeliness can then be measured against these defined SLAs and shared as part of the data quality metrics.
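Measuring timeliness against an SLA can also be expressed compactly. This is a hedged sketch under assumed criteria (the 24-hour latency threshold and field names are hypothetical examples, not prescribed values): it computes the latency of the most recent load and reports whether the SLA was met.

```python
from datetime import datetime, timedelta

def check_timeliness(last_load_finished, now, max_latency_hours=24):
    """Compare data latency against an SLA threshold.

    max_latency_hours is a placeholder -- the real value comes from
    the SLA negotiated with data consumers.
    """
    latency = now - last_load_finished
    return {
        "latency_hours": latency.total_seconds() / 3600,
        "sla_met": latency <= timedelta(hours=max_latency_hours),
    }

# Example: last load finished at 10 p.m., checked at 6 a.m. next day
result = check_timeliness(
    datetime(2024, 1, 1, 22), datetime(2024, 1, 2, 6)
)
```

Run on a schedule, a check like this turns the SLA into a measurable metric that can be published alongside the other data quality numbers.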
Validity is a key data quality measure that indicates the “correctness” of the actual data content; for example, confirming that all the characters in a telephone number field are digits, not alphabetic characters. This is the concept that most data consumers think about when they envision data quality. Validity can be assessed through data profiling, data cleansing, and inline data quality checks that may perform comparisons of incoming values to expected values or to values defined within a stated range of acceptability. Alerts can be set, depending on the validity checks used. The results of the validity checks should be measured and shared as part of the data quality metrics.
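An inline validity check of the kind described above might look like the following sketch. The helper names and the two example rules (all-digit phone numbers, a stated age range) are illustrative assumptions; a real implementation would draw its rules from a profiling or rules repository.

```python
def check_validity(records, field, rule):
    """Apply one validity rule to one field across a batch of records.

    Returns pass/fail counts plus a small sample of failures so the
    result can feed data quality metrics and alerting.
    """
    failures = [r for r in records if not rule(r.get(field))]
    checked = len(records)
    return {
        "checked": checked,
        "failed": len(failures),
        "pct_valid": 100.0 * (checked - len(failures)) / checked
        if checked else 100.0,
        "sample_failures": failures[:5],
    }

# Example rules mirroring the checks described in the text
is_numeric_phone = lambda v: isinstance(v, str) and v.isdigit() and len(v) == 10
in_age_range = lambda v: isinstance(v, int) and 0 <= v <= 120
```

Thresholds on `pct_valid` are a natural place to hang alerts, and the per-rule results roll up directly into the data quality metrics shared with stakeholders.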
Consistency is crucial to continued consumer confidence. Once data quality metrics are being monitored and reported to the business stakeholders for completeness, timeliness, and validity, then consistency can be measured by assessing changes in these patterns over time. These results can be added to the data quality metrics reporting that is shared with business stakeholders.
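One simple way to assess "changes in these patterns over time" is to flag any metric that drifts too far from its own history. This is a minimal sketch of that idea (the function name and the 10 percent tolerance are assumptions for illustration, not a recommended threshold):

```python
def check_consistency(history, current, tolerance_pct=10.0):
    """Flag a metric whose current value drifts beyond tolerance_pct
    from its historical mean -- a basic pattern-over-time check."""
    if not history:
        return {"drift_pct": 0.0, "consistent": True}
    mean = sum(history) / len(history)
    drift_pct = abs(current - mean) / mean * 100 if mean else float("inf")
    return {"drift_pct": drift_pct, "consistent": drift_pct <= tolerance_pct}

# Example: daily record counts have hovered around 100; today is 120
result = check_consistency([100, 100, 100], 120)
```

The same check can be applied to completeness, timeliness, and validity metrics alike, so one mechanism covers all three dimensions already being reported.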
Complete transparency of data quality metrics and reporting to your organization’s data consumers will lead to greater confidence in the quality of the underlying data.
Stakeholder confidence will continue to increase if you are able to proactively identify issues through active data quality monitoring before the data consumers find them. This is one of the greatest achievements of a robust data quality program.
The next article will cover the next step in building the foundational approach to agile data warehouse development: giving the development team the ability to self-manage its agile development approach, incorporating continuous improvement.
Nancy Couture has more than 30 years of experience leading enterprise data management at Fortune 500 companies and midsize organizations. Her focus has been on enterprisewide data management architecture, data governance, data quality, data warehousing and business intelligence capabilities.
Nancy recently moved into consulting as delivery enablement lead at Datasource Consulting, a Denver-based firm focused on delivering on all aspects of enterprise information management.
Previously, Nancy was vice president of business intelligence at SquareTwo Financial in Denver. She and her team successfully developed and utilized agile methodologies in building out enterprisewide solutions, including an enterprise data warehouse, a robust analytics and reporting environment, and integrated analytics solutions.
Before her time at SquareTwo, Nancy was vice president of data management solutions at UnitedHealth Group in Connecticut, where she developed and managed three enterprise-level data warehouses for healthcare analytics over the course of 30-plus years. In that role, Nancy was recognized for her leadership and ability to execute innovative approaches to data management.
Nancy has presented at many conferences on data management topics over the years, owns a patent in data mapping technologies, and has published several articles for the TDWI Business Intelligence Journal. In 2007 and again in 2015, her respective teams won the TDWI Best Practices Award in Enterprise Data Warehousing.
The opinions expressed in this blog are those of Nancy Couture and do not necessarily represent those of IDG Communications Inc. or its parent, subsidiary or affiliated companies.