Cooperation the Key to Clean Data

Cleaning dirty data is not just a matter of mastering the technical challenges. It requires making sure your staff is working closely with the business every step of the way.


Thu, July 01, 2004

CIO — In the early hours of March 20, 2003, British soldiers, sailors and airmen joined U.S. forces in the invasion of Iraq and the toppling of Saddam Hussein. Thus far, they have played a vital role in rebuilding Basra and the critical Persian Gulf port of Umm Qasr. Massive shipments of military materiel were essential to their success, and basically, anything that wasn’t a vehicle, live ammunition or fresh provisions (which have different supply lines) began its journey to the Gulf from England’s military warehouses. In the few weeks prior to the invasion of Iraq, these depots sent by ship or air 3,169 20-foot shipping containers to the Gulf, along with almost 22,000 3-foot pallets.

Getting these shipments to the Gulf was a logistical nightmare, and it would have been far more fraught had the British defense ministry not embarked four years ago on a $10.8 million effort to pull together three separate supply chains. That effort involved reconciling some 850 different information systems and integrating three inventory management systems and 15 remote systems.

The biggest foe in this massive integration effort was not Saddam Hussein, but dirty or disparate data. To one system, stock number 99 000 1111 was a 24-hour, cold-climate ration pack. To another system, the same number referred to an electronic radio valve. And if hungry troops had been sent radio valves instead of rations, the invasion and rebuilding of Iraq wouldn’t have gone very far.
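The kind of collision described above, where one stock number means different things in different systems, can be flagged mechanically before any merge. The following is a minimal Python sketch, not the ministry's actual method. The system names, the second stock number and the "Tent peg" item are hypothetical, and the matching rule (ignore spaces in the number, compare lowercased descriptions) is an assumption made for illustration.

```python
from collections import defaultdict


def find_conflicts(systems):
    """Return stock numbers whose descriptions disagree across systems.

    `systems` maps a system name to a dict of {stock number: description}.
    """
    seen = defaultdict(dict)  # normalized stock number -> {system: description}
    for system_name, records in systems.items():
        for stock_number, description in records.items():
            # Assumption: spacing inside a stock number is not significant.
            key = stock_number.replace(" ", "")
            seen[key][system_name] = description.strip().lower()
    # Keep only numbers that carry more than one distinct description.
    return {
        number: by_system
        for number, by_system in seen.items()
        if len(set(by_system.values())) > 1
    }


# Hypothetical sample data modeled on the example in the article.
systems = {
    "system_a": {"99 000 1111": "24-hour cold-climate ration pack"},
    "system_b": {
        "990001111": "Electronic radio valve",
        "99 000 2222": "Tent peg",
    },
}

conflicts = find_conflicts(systems)
for number, by_system in conflicts.items():
    print(number, by_system)
```

A report like this only finds the conflicts. Deciding which description is correct still requires someone from the business side who knows the items.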

Dirty data has long been a CIO’s bugbear. But in today’s wired world, the costs and consequences of inaccurate information are rising sharply. Muddled mailing lists are one thing, missing military materiel quite another. Throw in the complications arising from merging different data sets, as in the aftermath of a merger or acquisition, and the difficulties of data cleansing multiply. For this article, we interviewed seasoned data-cleaning veterans from organizations as diverse as the British Ministry of Defence, the U.S. Census Bureau and Cendant, a real estate and hospitality conglomerate. Despite that diversity, the lessons they learned share two common themes: how to surmount the technical challenges of cleaning data, and how to align IT staff with the business side to ensure that the task gets done right.

Know Your Enemy

When Britain’s defense department began its data-cleaning project in early 2000, it faced a huge task, says Lt. Col. Andrew Law, head of The Cleansing Project. (It just so happens that the acronym TCP is also a well-known British brand of antiseptic.) The department’s IT team was using three main systems to sort through 1.7 million records, each of which had hundreds of attributes. Each record referred to an item that troops might require, and many of these items were to be dispatched from the ministry’s widely dispersed warehouses in Bicester, England, and other locations. (The Bicester warehouses are far apart because they were built in 1942 with the idea of making it hard for German bombers to deliver a knockout punch.)
