A data quality conundrum

The data quality market is a remarkably fragmented one – my company tracks over 60 vendors offering data quality solutions in one form or another.

Most of the data quality industry has concentrated on one specific type of data: names and addresses. It is easy to see why – every company has to deal with names and addresses of customers and suppliers. These are subject to inherent change as people move house, and customers often change names when they get married.

In addition, there are plenty of opportunities for confusion – will the call centre operator record my name as “A. Hayler”, “Andy Hayler”, “A.D. Hayler” or “Andrew Hayler”, even assuming that they type the name in correctly? Large name and address files will frequently have 20 per cent error rates, and in one project that I was involved with many years ago, a data clean-up exercise reduced the database of business customers from 20,000 to 5,000 once all the duplicates and out-of-date entries were removed.

Because companies send out marketing literature and bills by mail, there is clearly a real cost if 20 per cent or more of your addresses are wrong, quite apart from any irritation that customers may feel when receiving multiple communications from different parts of the same company. Marketing costs may be unnecessarily high, and things get worse if the problem extends to the sending of actual deliveries and invoices. Consequently, an industry has grown up to provide software that helps companies tackle this problem.

Data quality software typically uses a mix of algorithms to look for typing errors, while providing dictionaries of common variants on names (“Andy = Andrew”, “William = Bill = Wilhelm”, for instance) in order to spot likely duplicates. In the early days such software was applied in batch to check files after the event, but these days it plugs into other transactional applications to try to spot errors before they happen. For example, an account manager might try to enter a new client, only for the software to point out that the account, perhaps with a slightly different name, may already exist at the same address.
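
To make the idea concrete, here is a minimal sketch in Python of the two techniques just described: a dictionary of name variants plus a fuzzy comparison that tolerates typing errors. It is an illustration only, not how any particular vendor's product works. The variant table, the function names, the record layout and the 0.8 similarity threshold are all assumptions made for the example, and the fuzzy comparison uses the standard library's difflib rather than a specialised matching algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical variant dictionary: each variant maps to one canonical name.
# A real product would ship a far larger, curated list.
NAME_VARIANTS = {
    "andy": "andrew",
    "bill": "william",
    "wilhelm": "william",
}


def canonical_first_name(name: str) -> str:
    """Lower-case the name, drop full stops, and resolve known variants."""
    cleaned = name.replace(".", "").strip().lower()
    return NAME_VARIANTS.get(cleaned, cleaned)


def similarity(a: str, b: str) -> float:
    """Return a 0..1 score; 1.0 means the strings match ignoring case."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def first_names_compatible(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat a bare initial as matching any name with the same first letter."""
    a, b = canonical_first_name(a), canonical_first_name(b)
    if len(a) == 1 or len(b) == 1:
        return a[:1] == b[:1]
    return similarity(a, b) >= threshold


def likely_duplicate(rec_a: dict, rec_b: dict, threshold: float = 0.8) -> bool:
    """Flag two records as probable duplicates (assumed keys: first, last, address)."""
    return (
        first_names_compatible(rec_a["first"], rec_b["first"], threshold)
        and similarity(rec_a["last"], rec_b["last"]) >= threshold
        and similarity(rec_a["address"], rec_b["address"]) >= threshold
    )


existing = {"first": "Andrew", "last": "Hayler", "address": "12 High Street"}
incoming = {"first": "Andy", "last": "Haylor", "address": "12 high street"}

if likely_duplicate(existing, incoming):
    print("Possible duplicate: this account may already exist at the same address")
```

In a transactional setting, a check like likely_duplicate would run when the account manager submits the new record, so that the warning appears before the duplicate is saved. The threshold is a trade-off: set it lower and more typing errors are caught, but more distinct customers are wrongly flagged.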
