Cooperation the Key to Clean Data
Cleaning dirty data is not just a matter of mastering the technical challenges. It requires making sure your staff is working closely with the business every step of the way.
Data from Europe, where the company has offices in most countries, is already being used on a limited basis to generate client reports. Company agents in North America and the remaining countries where Carlson has offices should be able to generate such reports by early 2005. Vetsch declined to disclose the projected ROI for the cleansing effort. However, if good-quality client reports have become the price of winning corporate business, it would take a bold manager to argue that the investment was anything other than the price of survival.
Buy-In from Owners of the Data
Like Carlson, Cendant (owner of car rental company Avis, realtor Century 21 and the hotel chains Days Inn, Howard Johnson, Ramada and Travelodge) would love a single, enterprisewide view of all its customers. But five years' work on building a data warehouse had delivered virtually nothing, because hardly anyone was using it. By now you can guess the culprit: dirty data.
"Basically, the data warehouse was being used for list generation by two people in marketing," says Vincent Kellett, senior director of data services, who was hired in 2002 to see whether the project could be revived. "Because of data quality issues, the project was dying on the vine."
To make the system viable, Kellett realized the company would have to throw out a bunch of hard-to-maintain custom code, spend money on cleaning up some truly horrible data and institute formal processes for data maintenance. Even basic procedures such as subscribing to the national change-of-address database maintained by the U.S. Postal Service had been overlooked by the project team. "They’d been so mired down in day-to-day problems that they just hadn’t gotten ’round to it," he says.
Data-cleaning software from Trillium Software was pressed into service. The database originally contained 132 million records, which were eventually boiled down to 90 million "that at least had a name and a street address," Kellett says. In each cycle of the cleaning process, his team formulated new rules and tested them in trial runs to detect and correct duplicates. Further winnowing, by matching records against the latest change-of-address information, eventually reduced the total to roughly 80 million cleaned records, which were loaded into the data warehouse.
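The winnowing Kellett describes (drop unusable records, merge duplicates, then apply address changes so that old and new addresses for the same person collapse together) can be illustrated with a small Python sketch. It is not Trillium's method, whose matching rules the article does not describe. The Record type, the normalize/match_key rule and the shape of the change-of-address table are all hypothetical simplifications chosen for illustration.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    name: str
    street: str
    city: str = ""


def normalize(text: str) -> str:
    # Hypothetical matching rule: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def match_key(rec: Record) -> tuple:
    # Two records with the same key are treated as duplicates.
    return (normalize(rec.name), normalize(rec.street), normalize(rec.city))


def dedupe(records):
    # Keep the first record seen for each match key.
    seen = {}
    for r in records:
        seen.setdefault(match_key(r), r)
    return list(seen.values())


def apply_address_changes(records, changes):
    # `changes` (hypothetical format) maps a match key at the old address
    # to a (new_street, new_city) pair, standing in for a change-of-address feed.
    out = []
    for r in records:
        moved = changes.get(match_key(r))
        out.append(replace(r, street=moved[0], city=moved[1]) if moved else r)
    return out


def clean(records, changes):
    # Pass 1: keep only records that have at least a name and a street address.
    usable = [r for r in records if r.name.strip() and r.street.strip()]
    # Pass 2: merge duplicates, update moved customers, then merge again,
    # since a moved customer's old and new records now share one key.
    return dedupe(apply_address_changes(dedupe(usable), changes))


records = [
    Record("Jane Doe", "12 Oak St.", "Dallas"),
    Record("JANE DOE", "12 Oak St", "Dallas"),    # duplicate after normalization
    Record("Jane Doe", "400 Elm Ave", "Austin"),  # same person at her new address
    Record("", "9 Pine Rd", "Reno"),              # no name, so it is dropped
]
changes = {("jane doe", "12 oak st", "dallas"): ("400 Elm Ave", "Austin")}
result = clean(records, changes)
```

In this toy run, four records shrink to one, which mirrors in miniature how each pass in the article's account cut the count further.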
When a customer checks into a hotel or picks up a rental car and a new record is created, the system asks: Do we know this person? If so, load any new information—such as change of address or phone number—and then update their transaction data with another stay or car rental. And that information is automatically integrated with the rest of the customer database.
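The "Do we know this person?" check is essentially a lookup-or-create step. The Python sketch below shows one possible shape for it. The matching policy is an assumption: it treats a customer as known if the normalized name matches together with either the phone number or the street address, so a new address or a new phone number (but not both at once) still resolves to the existing customer. CustomerStore, record_visit and the index layout are hypothetical names, not Cendant's system.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    customer_id: int
    name: str
    street: str
    phone: str
    transactions: list = field(default_factory=list)


def _norm(s: str) -> str:
    return " ".join(s.lower().split())


class CustomerStore:
    def __init__(self):
        self.customers = {}  # customer_id -> Customer
        self._index = {}     # match key -> customer_id
        self._next_id = 1

    @staticmethod
    def _keys(name, street, phone):
        # Hypothetical policy: name plus phone, or name plus street, identifies a person.
        n = _norm(name)
        return [("phone", n, _norm(phone)), ("street", n, _norm(street))]

    def record_visit(self, name, street, phone, transaction):
        keys = self._keys(name, street, phone)
        cid = next((self._index[k] for k in keys if k in self._index), None)
        if cid is None:
            # Unknown person: create a new customer record.
            cid = self._next_id
            self._next_id += 1
            self.customers[cid] = Customer(cid, name, street, phone)
        else:
            # Known person: load any new contact details.
            cust = self.customers[cid]
            cust.street, cust.phone = street, phone
        # Index the current keys; keys for old addresses are kept and still resolve here.
        for k in keys:
            self._index[k] = cid
        self.customers[cid].transactions.append(transaction)
        return cid


store = CustomerStore()
first = store.record_visit("Jane Doe", "12 Oak St", "555-0100", "hotel stay")
again = store.record_visit("jane doe", "400 Elm Ave", "555-0100", "car rental")
other = store.record_visit("John Roe", "9 Pine Rd", "555-0199", "hotel stay")
```

Here Jane's second visit, made from a new address, resolves to her existing record, which picks up the new address and gains the car rental as another transaction.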