Cooperation the Key to Clean Data

Cleaning dirty data is not just a matter of mastering the technical challenges. It requires making sure your staff is working closely with the business every step of the way.

Law’s mission was to review all the data, but he had to concentrate his team’s energies on cleaning six critical data fields: the NATO item identifier, the NATO supply classification, the unit of issue, the supplier code, the packaging code and the hazard code. These six fields were chosen based on which ones would have the biggest impact on the supply chain if they were wrong.

"The first step was to identify homonyms and synonyms," says Paul Nettle, manager of data cleaning for TCP. Homonyms, he explains, are two or more different items with the same identifier, such as rations and radio valves. Synonyms are the same items with more than one identifier—the same radio valve kept in two places in a warehouse under two different numbers, for example.

"Synonyms are merely inefficient," Nettle observes. Overstocking and overbuying result from such data mistakes, rather than troops being shipped the wrong gear.

Next, the IT team employed data-profiling software to crawl through the data, checking it for valid NATO numbers. The troubling finding: 119,000 numbers (about one in 10) weren’t valid. The radio valve, it turned out, had a valid NATO part number, but the rations came from a satellite system where nonstandard rules had been used. Every invalid number had to be sent to a NATO office in Glasgow for codification, and then corrected in each system in which it occurred. Nettle and his team also discovered they had quite a bit of relabeling to do at the depot, since much of the inventory sitting on the shelves was now incorrectly labeled.
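
The article does not say which rules the profiling software applied. A minimal sketch of a format-only check follows, assuming the identifier is a 13-digit NATO Stock Number laid out as a four-digit supply classification, a two-digit nation code and a seven-digit item number. The function names and sample values are made up for illustration. A well-formed number can still be unregistered, so a check like this would only be a first filter before codification.

```python
import re

# Assumed layout: 4-digit supply classification, 2-digit nation code,
# 7-digit item number, with optional hyphens (NNNN-NN-NNN-NNNN).
NSN_PATTERN = re.compile(r"(\d{4})-?(\d{2})-?(\d{3})-?(\d{4})")

def parse_nsn(value):
    """Return (classification, nation_code, item_number), or None if malformed."""
    match = NSN_PATTERN.fullmatch(value.strip())
    if match is None:
        return None
    classification, nation, first, second = match.groups()
    return classification, nation, first + second

def split_by_format(values):
    """Separate well-formed numbers from malformed ones."""
    well_formed, malformed = [], []
    for value in values:
        (well_formed if parse_nsn(value) else malformed).append(value)
    return well_formed, malformed

# Made-up sample values
sample = ["5960-99-123-4567", "5960991234567", "RAT-001", "8970-99-12-4567"]
well_formed, malformed = split_by_format(sample)
print(malformed)
```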

The next step was "fuzzy matching," using software to look for duplicates and errors introduced by keyboard entry. "The ability to ignore [minor mistakes in] punctuation and figure out when a 3 had been erroneously substituted for an 8 was important when dealing," Nettle says. Such numerical errors, after all, could change the entire meaning of the text, while punctuation mistakes merely provided Nettle’s team with much-needed amusement.
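
One simple way to approximate the matching Nettle describes is to strip punctuation before comparing, then treat two codes as likely duplicates if they differ in a single position by an easily confused pair of characters. This is a sketch under assumptions, not the tool TCP used: only the 3-for-8 pair comes from the article, and the other pairs, the function names and the sample codes are made up.

```python
import re
from itertools import combinations

# Only the 3/8 pair is mentioned in the article; the rest are assumptions.
CONFUSABLE = {frozenset("38"), frozenset("0O"), frozenset("1I"), frozenset("5S")}

def normalize(code):
    """Uppercase the code and drop punctuation and spaces."""
    return re.sub(r"[^A-Z0-9]", "", code.upper())

def likely_same(a, b):
    """True if the codes match after normalization, or differ in exactly
    one position by a confusable pair of characters."""
    a, b = normalize(a), normalize(b)
    if a == b:
        return True
    if len(a) != len(b):
        return False
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    return len(diffs) == 1 and frozenset(diffs[0]) in CONFUSABLE

def candidate_duplicates(codes):
    """Return every pair of codes that looks like the same entry."""
    return [(a, b) for a, b in combinations(codes, 2) if likely_same(a, b)]

# Made-up codes
codes = ["AB-1230", "ab.1230", "AB1280", "AB1295"]
pairs = candidate_duplicates(codes)
print(pairs)
```

Pairs flagged this way are only candidates. Two genuinely different items can differ by a single digit, so a person would still need to review the list before anything is merged.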

By August 2001, they had completed the relatively easy (if time-consuming) task of examining item identifiers to see, for instance, whether an item held a valid NATO number. Now they had to find a way to correct the other data fields. Here, the challenge was more difficult. For things such as unit-of-issue labels, packaging codes and supplier details, hard-and-fast rules to tell clean data from dirty data didn’t exist.

Take supplies of aircraft oil: A military unit in the Gulf might order 250 liters of oil, expecting 250 one-liter cans, only to receive 250 separate 250-liter drums of the stuff. The reason? On the Royal Air Force system responsible for ordering the oil, 250-liter drums, not one-liter cans, were the unit of issue. Neither label was technically an error, but clearly, such inconsistencies could quickly cripple a supply chain. To make sure such a disaster would never occur, the TCP team turned to a data-profiling tool, which highlighted errors and inconsistencies in the various codes. The software produces easy-to-understand, computer-generated diagrams that help users spot unusual data formats that could be erroneous.
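
The article does not name the profiling tool or explain its internals. One common profiling technique, sketched here as an assumption, reduces each value to its "shape" (digits become 9, letters become A) and flags shapes that appear only rarely in a field. The function names, the 5 percent threshold and the sample codes are all made up for illustration.

```python
from collections import Counter

def shape(value):
    """Reduce a value to its format: digits -> 9, letters -> A, rest kept."""
    out = []
    for ch in value.strip():
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)

def rare_shapes(values, threshold=0.05):
    """Return shapes whose share of the field falls below the threshold."""
    values = list(values)
    counts = Counter(shape(v) for v in values)
    total = len(values)
    return {s: n for s, n in counts.items() if n / total < threshold}

# Made-up packaging codes: 39 follow one format, 1 does not
codes = ["A12"] * 30 + ["B07"] * 9 + ["7-C"]
suspects = rare_shapes(codes)
print(suspects)
```

A rare shape is not proof of an error, which fits the article's point that no hard rule separates clean from dirty data in these fields. It only tells a reviewer where to look first.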

