Dirty Data Can Jeopardize Your CRM Effort

Reader ROI

* Understand why poor data quality is no longer a little problem

* Find out how costly dirty data can be

* Learn how to cleanse data

For years, the Montana Department of Corrections was a prisoner of data quality problems. Aging IT systems perpetrated countless data entry offenses in reports that the prison system was required to submit to state and federal authorities. And while the department’s IS group put in hours of manual labor to try to maintain some level of reporting integrity, overall confidence in the quality of data was nonexistent and morale in the IS group was low. The situation came to a head four years ago when the department nearly lost a coveted $1 million federal grant. The culprit: information systems that, lacking business rules and a data dictionary, failed to accurately forecast how many of a particular type of offender would be incarcerated. "We had an egregious data quality problem. Not to the point where we were losing offenders, but we weren’t able to accurately portray how many we thought we’d have over the next two to five years," says Dan Chelini, bureau chief for information services at the Helena, Mont.-based department.
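
The forecasting failure traces back to the two things the old systems lacked: business rules and a data dictionary. As a rough illustration only (the field names, codes and rules below are invented, not the department’s actual schema), a data dictionary boils down to a set of machine-checkable rules that every record must pass before it feeds a report or forecast:

```python
# Hypothetical sketch: checking offender records against a minimal data
# dictionary of required fields and allowed codes before they feed a
# population forecast. All field names and codes here are invented.

DATA_DICTIONARY = {
    "offender_id":   {"required": True,  "allowed": None},
    "offense_class": {"required": True,  "allowed": {"violent", "property", "drug", "other"}},
    "intake_date":   {"required": True,  "allowed": None},   # ISO yyyy-mm-dd expected
    "facility":      {"required": False, "allowed": None},
}

def validate(record: dict) -> list[str]:
    """Return a list of business-rule violations for one record."""
    errors = []
    for field, rules in DATA_DICTIONARY.items():
        value = record.get(field)
        if value is None:
            if rules["required"]:
                errors.append(f"missing required field: {field}")
            continue
        if rules["allowed"] is not None and value not in rules["allowed"]:
            errors.append(f"unknown code for {field}: {value!r}")
    return errors

# A record with a bad code and a missing field is flagged, not silently counted.
print(validate({"offender_id": "A-1024", "offense_class": "drg"}))
```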

With the go-ahead from the state prison’s board of directors, Chelini’s department mounted an aggressive campaign, from late 1997 to mid-1999, to turn around data quality as part of an overhaul of the prison system. The first step was to bring in a team from Information Impact International, a consultancy specializing in data quality, to evaluate organizational processes, acquaint the department with the concept of data stewardship and set up a methodology for data entry. Although some employees were leery at first of the new demands, they bought into the new standards once trained in basic data modeling and data cleansing techniques. A data validity officer was also appointed to rally support for the program and enforce the new rules.

The program officially launched in August 2000, and the department claims to see some real results. Instead of a handful of programmers holding all of the responsibility for prisoners’ information, 30 data stewards from all walks of prison life (from probation officers and attorneys to the guy who showers prisoners when they first enter a facility) now function as data quality gatekeepers. They are accountable for accurately entering information on prisoners, such as names, last known addresses and identifying scars and disfigurements. The Montana Department of Corrections’ data quality problem has been detained. "For the first time in years, we’re meeting deliverables" such as reports to federal overseers, says Data Validity Officer Lou Walters. "People are involved and excited about pushing data quality."

Although companies deal with customers, not prisoners, the increasing need for accurate data is driving many organizations in finance, health care, retail and other segments to launch formal initiatives to bolster the quality of customer information in core business systems. Until recently most organizations haven’t felt a lot of urgency or enthusiasm about cleaning up dirty data; inaccurate and multiple listings of customer information were seen as trivial problems and a tolerable price of doing business. But the current trend in many industries toward data warehousing and data mining has increased both the value of good data and the costs of cleaning up databases. That task is anything but trivial, and the costs, which include the direct costs of hiring people and consultants and the indirect costs of missed sales opportunities, are significant.

"Our studies in cost analysis show that between 15 percent to greater than 20 percent of a companies’ operating revenue is spent doing things to get around or fix data quality issues," says Larry English, principal of Information Impact in Brentwood, Tenn. Some organizations, like the Montana Department of Corrections, are creating full-time positions around data quality and instituting homegrown methodologies to ensure that information stays consistent and is usable across different types of applications. Other companies are purchasing data cleansing services and customer identification and standardization software from companies such as Vality Technology in Boston and Innovative Systems in Pittsburgh to clean up their act.

Stage One: Denial

For many companies, dirty data remains an unknown problem. "Old systems have limped along for years basically hiding data quality problems, either through departments putting out multiple versions of reports or leaving the reconciliation work to find the real answers to people who do this stuff by hand," says Ken Orr, founder of the Ken Orr Institute, an IT consultancy based in Topeka, Kan., and a consultant with the Cutter Technology Council, an IT think tank. Vality’s term for this syndrome is "data denial," says Dave Stanvick, vice president of marketing communications. "Data quality has remained a closeted issue in IT because there’s little visibility at the management level that the problem is occurring. Generally, data would have gone through many days of manual rework before it’s presented in a report to senior management."

This kind of laborious scrap and rework, as it’s called, fuels one of the most dangerous misperceptions surrounding data quality: that dirty data is all about simple inaccuracies like misspelled names, incomplete addresses and missing data fields. Throw some manpower at a database cleanup job, the theory goes, and the problem will go away. Not so, caution experts, who say that data scrubbing is only a first step. The more critical move, they say, is to create standards for how data on customers or products is represented so that it maintains its integrity, whether used for billing purposes or to drive a direct-marketing campaign. This is also the only real way companies can get a composite picture of a customer across all parts of an organization, a practice necessary for delivering the personalized service that many customers demand.
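
What such a representation standard looks like in practice is easiest to see at the field level. The sketch below is purely illustrative (the suffix table and field handling are invented, not any vendor’s rules): every system that touches a customer record runs it through the same normalization step, so "123 Main St." and "123 MAIN STREET" end up stored identically.

```python
# Illustrative only: a shared normalization step for address data, so the
# same customer is represented one way across billing and marketing systems.
# The suffix table is a made-up fragment, not a complete postal standard.

SUFFIXES = {"st": "ST", "st.": "ST", "street": "ST",
            "dr": "DR", "dr.": "DR", "drive": "DR",
            "ave": "AVE", "ave.": "AVE", "avenue": "AVE"}

def standardize_address(raw: str) -> str:
    """Uppercase the address, drop commas and map street suffixes to one form."""
    out = []
    for word in raw.replace(",", " ").split():
        out.append(SUFFIXES.get(word.lower(), word.upper()))
    return " ".join(out)

print(standardize_address("123 Main St."))      # -> 123 MAIN ST
print(standardize_address("123 MAIN STREET"))   # -> 123 MAIN ST
```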

"How information is used is changing," explains Mary Knox, senior research analyst at Gartner Financial Services in Durham, N.C. "The focus used to be on data processing?where the value of data consisted in the context of a specific application. But data that’s perfectly fine for the original application takes on new meaning and could very well cause big problems if you try to use it in a different way."

Consider a typical scenario: Let’s say a Jon B. Smith at 123 Main St. in Lowell, Mass., exists in a bank’s mortgage origination system and a John Smith at 123 Main Drive in Lowell, Mass., comes up in the bank’s system for car loans. Without knowing for certain if the two Smiths are the same individual, companies can still process bills and get paid, albeit with some potential for duplicate work and a confused customer or two. That level of uncertainty doesn’t fly, however, if a company attempts to use that same data to pinpoint cross-selling opportunities based on a customer’s profile. And the situation worsens when the company tries to identify all possible customers in a particular household. A sales contact from this database could alienate customers with improperly targeted pitches or end up in the dead-letter office.
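
A minimal sketch of the matching decision in the Smith example might look like the following. The similarity measure and the 0.8 cutoff are illustrative stand-ins for the far more sophisticated probabilistic matching that commercial cleansing tools perform:

```python
# Toy record-matching sketch for the Jon B. Smith / John Smith scenario.
# A plain string-similarity ratio and an arbitrary cutoff stand in for
# real probabilistic matching; both are illustrative assumptions.

from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_same_customer(rec1: dict, rec2: dict, cutoff: float = 0.8) -> bool:
    """Average the name and address similarity and compare to the cutoff."""
    name_score = similarity(rec1["name"], rec2["name"])
    addr_score = similarity(rec1["address"], rec2["address"])
    return (name_score + addr_score) / 2 >= cutoff

mortgage = {"name": "Jon B. Smith", "address": "123 Main St, Lowell, MA"}
car_loan = {"name": "John Smith",   "address": "123 Main Drive, Lowell, MA"}

print(likely_same_customer(mortgage, car_loan))  # True with this illustrative cutoff
```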

It’s this inability to effectively uncover patterns in customer data, despite the millions of dollars now being poured into data warehouse projects, that’s starting to raise data quality red flags. But even as companies acknowledge the problem, most have yet to embark on any formal campaign to measure the hidden costs of poor data quality. "Most companies don’t have the time, energy and drive to do the kind of formal analysis it takes to evaluate the impact of dirty data on their businesses, except when a huge explosion takes place," says Stuart Madnick, the John Norris Maguire professor of information technology at MIT’s Sloan School of Management in Cambridge, Mass. He also coheads MIT’s Total Data Quality Management Program, a research program devoted to the theory and practice of improving data quality. "The real cost of data quality has to do with how it impacts business, and that analysis is not trivial."

What’s easy to determine, Madnick says, are the direct costs: for example, what it costs to employ personnel to manually check and correct database records and reports, or the expense of materials and postage for redundant mailings or product returns. But there are less-obvious, hard-to-measure expenses as well. These might be costs associated with warehouse space used to house excess inventory that was ordered because of faulty data, or equipment and facilities allocated to personnel who are strictly employed for the purposes of data quality workarounds. Finally, there are sales prospects neglected because data is unreliable and customers lost because of too few or too frequent marketing contacts.
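
Put as arithmetic, Madnick’s point is that the visible line items are only part of the bill. The figures below are invented placeholders, used only to show how the hidden categories can outweigh the direct ones:

```python
# Back-of-the-envelope sketch of the cost categories described above.
# Every dollar figure is an invented placeholder, not reported data.

direct_costs = {
    "manual record checking and correction":  250_000,
    "redundant mailings and product returns":  90_000,
}

indirect_costs = {
    "warehouse space for excess inventory":         400_000,
    "facilities for rework-only staff":             120_000,
    "neglected sales prospects and lost customers": 750_000,
}

total_direct = sum(direct_costs.values())
total_indirect = sum(indirect_costs.values())
total = total_direct + total_indirect

print(f"direct costs:   ${total_direct:>9,}")
print(f"indirect costs: ${total_indirect:>9,}")
print(f"indirect share of total: {total_indirect / total:.0%}")
```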

Stage Two: Acceptance

How can your company establish good data quality? At a few companies, the scrap and rework mind-set is slowly being edged out by a new culture predicated on making data quality improvements a continuous process, and giving employees at many levels responsibility for data quality. This means moving data quality concerns out of the back office, and making every employee who handles customer data accountable for ensuring that it adheres to the organization’s established data guidelines. Buy-in from top management is essential to making this kind of radical organizational shift. "Management has to feel the pain of the status quo; they must understand the costs that have become an accepted way of doing business...because they’ve been so far removed from them in the past," says Information Impact’s English.

Health care is one industry where the executive ranks have hardly been able to ignore data quality issues. In that segment, high-end data cleansing software packages are fairly common because the stakes are so high. At Saint Alphonsus Regional Medical Center in Boise, Idaho, for example, proper patient identification is the CIO’s number-one priority. Without high-quality data to make these identifications, health-care organizations like Saint Alphonsus put themselves at risk for everything from billing snafus to misdiagnoses that can endanger patients’ lives (and engender huge lawsuits).

"Everything flows from making the proper patient ID," explains Leslie Kelly Hall, vice president and CIO of Saint Alphonsus. "We have more than 500,000 patients in our master patient index, which represents a good deal of Idaho’s population. We can’t begin to cut automation costs without first getting the correct identification of the patient, the provider or the insurer."

Saint Alphonsus employs Healthcare.com’s EMerge master person index, which embeds Vality’s Integrity data cleansing and standardization software, to ensure that it identifies patients with the highest degree of accuracy. Once confirmed in EMerge, the patient ID is broadcast to 46 connected systems, including those running various labs, pharmacies, electronic medical records and billing. "To the degree to which we can automate the process, we eliminate human error, which leads to dramatic savings and improvements in care," Hall says.
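
Conceptually, the master-index pattern is an identity match followed by a broadcast. The sketch below is a generic illustration of that pattern, not the EMerge or Integrity products themselves; the subscriber mechanism and system names are assumptions:

```python
# Generic master-person-index sketch: once a patient ID is confirmed,
# push it to every registered downstream system (labs, pharmacy, billing).
# This illustrates the pattern only; it is not the EMerge/Integrity design.

from typing import Callable

Subscriber = Callable[[str, dict], None]

class MasterPersonIndex:
    def __init__(self) -> None:
        self._subscribers: list[Subscriber] = []

    def register(self, notify: Subscriber) -> None:
        """A downstream system registers a callback to receive confirmed IDs."""
        self._subscribers.append(notify)

    def confirm_patient(self, patient_id: str, demographics: dict) -> None:
        """Broadcast a confirmed patient ID to every connected system."""
        for notify in self._subscribers:
            notify(patient_id, demographics)

mpi = MasterPersonIndex()
mpi.register(lambda pid, demo: print(f"lab system received {pid}"))
mpi.register(lambda pid, demo: print(f"billing system received {pid}"))
mpi.confirm_patient("MRN-000123", {"name": "JANE DOE", "dob": "1970-01-01"})
```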

The Prudential Insurance Co. of America also has managed to weave data quality best practices into its day-to-day operations. But that wasn’t always the case. Problems came to light around 1996 as the insurance giant embarked on a data warehouse project to get a companywide view of customers across eight lines of business (LOB), from traditional casualty and property insurance to financial services, in pursuit of data mining opportunities. "In the process of pulling together all the LOBs into one enterprise data warehouse, we realized that we had a lot of differences across data," says Pat Komar, Prudential’s vice president of information services in Newark, N.J. Each LOB had developed its own set of codes for describing elements like customer name and policy number. That wasn’t a problem as long as the LOB data was siloed, but the disparate terminology threatened to throw a wrench into the data warehouse. "Data was going through all kinds of transformations, and what was accurate for a line of business might not be accurate for the enterprise," Komar says.

That realization spawned a massive campaign to standardize data across the various LOBs, orchestrated by Komar with the support of Prudential’s line-of-business CIOs and its corporate CIO. This meant garnering consensus on naming conventions for what’s now close to 3,000 terms describing things like customer, policy and claim. "Each LOB had different product codes, but we had to have agreement for the enterprise warehouse," Komar says. During months of working lunches, Komar’s team assembled a committee of data experts from the various LOBs and got input on how core types of customer information should be labeled and modeled. The data SWAT team also appointed managers to ensure that the new standards were followed.
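
The mechanics of that agreement come down to translation tables: each LOB’s local codes map to one enterprise vocabulary before data is loaded into the warehouse. The mapping below is a made-up fragment, not Prudential’s actual code set:

```python
# Illustrative translation from LOB-specific product codes to a single
# enterprise vocabulary. The LOB names and codes are invented examples.

ENTERPRISE_CODES = {
    ("property_casualty", "HO-3"):   "HOMEOWNERS_POLICY",
    ("property_casualty", "PA-01"):  "PERSONAL_AUTO_POLICY",
    ("financial_services", "TRM20"): "TERM_LIFE_POLICY",
}

def to_enterprise(lob: str, lob_code: str) -> str:
    """Map a line-of-business code to the agreed enterprise term."""
    try:
        return ENTERPRISE_CODES[(lob, lob_code)]
    except KeyError:
        # Unmapped codes are surfaced to data stewards rather than loaded silently.
        raise ValueError(f"no enterprise mapping for {lob}/{lob_code}") from None

print(to_enterprise("property_casualty", "HO-3"))  # -> HOMEOWNERS_POLICY
```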
