Cooperation the Key to Clean Data
Cleaning dirty data is not just a matter of mastering the technical challenges. It requires making sure your staff is working closely with the business every step of the way.
As Law points out, however, technology only goes so far. The 12-man TCP project succeeded in large part because team members at headquarters worked closely with personnel in the British Army, Royal Navy and Royal Air Force, who made sure that flawed data actually got cleaned and who organized activities such as relabeling inventory on the shelves. So far, Nettle says, the cleansing project has cost $11 million over four years and has saved the Ministry of Defence $36 million.
Define the Rules in Advance
While the British defense ministry is still cleaning up its data warehouses to generate a more consistent view of military supply items, commercial companies are using much the same technology to build an enterprisewide view of their customers. In some cases, the demand for consistently clean data comes from the customers themselves, who want to see how their business is performing in real time. For example, in these economically constrained times, the corporate customers of Carlson Wagonlit Travel, one of the largest travel agencies in the world, are eager for good-quality data on exactly how their travel and expense budgets are being spent. Indeed, building a data warehouse that can deliver such information has become a competitive differentiator in the industry, says Jay Vetsch, senior director of information delivery at Carlson.
The task for Vetsch and his team was daunting. With annual sales of $10.5 billion and operations in 140 countries, the agency handles huge data volumes: 14 million airline tickets and 12 million hotel nights booked every year, and so on. While the raw number of transactions per day (around 60,000) is manageable, each record often represents an entire trip with several flights, hotel stays and rental car reservations. As a result, each record is massive, with around 400 fields.
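To put those figures in perspective, the short calculation below multiplies the approximate daily transaction count by the approximate record width quoted above. It is only a back-of-the-envelope sketch: it assumes every record carries all 400 fields and, for the annual figure, that transactions arrive every day of the year, neither of which the article states.

```python
# Approximate figures quoted in the article.
TRANSACTIONS_PER_DAY = 60_000
FIELDS_PER_RECORD = 400

# Assumption: every record carries all ~400 fields.
fields_per_day = TRANSACTIONS_PER_DAY * FIELDS_PER_RECORD

# Assumption: transactions arrive every day of the year.
fields_per_year = fields_per_day * 365

print(f"{fields_per_day:,} field values per day")
print(f"{fields_per_year:,} field values per year")
```

Under those assumptions, roughly 24 million individual field values would have to be checked every day, which is why a modest transaction count still adds up to a heavy data-quality workload.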
Worse, the data must be extracted from a number of different back-office systems spread across the business. What's more, the data is subject to the data-entry vagaries of front-office operators in those 140 countries, which stem not just from human error but also from differences in legal, tax and accounting regulations. And from the point of view of the people generating the data, Vetsch's task is not mission-critical.
"You have to remember that the information is being generated for the purpose of getting a traveler a ticket—not for an MIS system to provide reports to clients," he says.
As a result, the data can contain errors, such as an invalid supplier code, an invalid client code or a fare discrepancy, that are not serious enough to stop a ticket from being issued but can still foul up an analysis. Vetsch relies on software that acts as a gatekeeper for the data warehouse. If a record meets predefined data quality criteria, it is allowed through. If it doesn't, it is kicked back to the originating office for correction.
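The gatekeeping idea can be sketched in a few lines. The example below is not Carlson's software; it is a minimal illustration assuming that each record carries a supplier code, a client code, a base fare, taxes and a total fare, and that a "fare discrepancy" means base fare plus taxes does not match the total. The record fields, the reference code sets, `FARE_TOLERANCE`, `quality_errors` and `gate` are all hypothetical names chosen for illustration. Records that pass every rule are admitted to the warehouse; the rest are grouped by originating office so they can be sent back for correction.

```python
from dataclasses import dataclass

# Hypothetical reference data; a real system would load these from master tables.
VALID_SUPPLIER_CODES = {"BA", "LH", "AF"}
VALID_CLIENT_CODES = {"C1001", "C1002"}
FARE_TOLERANCE = 0.01  # hypothetical allowed rounding difference, in currency units


@dataclass
class TripRecord:
    """Hypothetical, heavily simplified trip record (the real ones have ~400 fields)."""
    record_id: str
    origin_office: str
    supplier_code: str
    client_code: str
    base_fare: float
    taxes: float
    total_fare: float


def quality_errors(rec: TripRecord) -> list[str]:
    """Return the rule violations for one record; an empty list means it passes."""
    errors = []
    if rec.supplier_code not in VALID_SUPPLIER_CODES:
        errors.append(f"invalid supplier code {rec.supplier_code!r}")
    if rec.client_code not in VALID_CLIENT_CODES:
        errors.append(f"invalid client code {rec.client_code!r}")
    # Assumed definition of a fare discrepancy: components don't add up to the total.
    if abs(rec.base_fare + rec.taxes - rec.total_fare) > FARE_TOLERANCE:
        errors.append("fare discrepancy: base fare + taxes != total fare")
    return errors


def gate(records):
    """Admit clean records; group rejected ones by the office that must fix them."""
    admitted = []
    rejected = {}  # originating office -> list of (record_id, errors)
    for rec in records:
        errs = quality_errors(rec)
        if errs:
            rejected.setdefault(rec.origin_office, []).append((rec.record_id, errs))
        else:
            admitted.append(rec)
    return admitted, rejected


# Usage with made-up sample records.
records = [
    TripRecord("R1", "London", "BA", "C1001", 200.0, 35.5, 235.5),
    TripRecord("R2", "Paris", "XX", "C1002", 150.0, 20.0, 170.0),
    TripRecord("R3", "Paris", "AF", "C9999", 100.0, 10.0, 120.0),
]
admitted, rejected = gate(records)
print([r.record_id for r in admitted])
for office, problems in rejected.items():
    for record_id, errs in problems:
        print(office, record_id, errs)
```

Collecting every violation per record, rather than stopping at the first, means the originating office receives a complete list of fixes in one round trip instead of having the same record bounced back repeatedly.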



