by Constantine von Hoffman

How Big Data Can Quickly Become Big Garbage

Apr 04, 2013

The bigger the data, the bigger the chance of mistakes. Case in point: large databases that retailers use to screen job applicants accused of stealing from past employers are flagging innocent people and could lead to major lawsuits, according to blogger Constantine von Hoffman.

A story in yesterday’s New York Times has me thinking about Terry Gilliam’s great dystopian film Brazil. In the film, the modern computer was never developed. Instead, data processing is handled by banks of automated typewriters. One day a fly falls into one of the machines, and an arrest warrant meant for Harry Tuttle (the film’s Robin Hood character) is instead made out to Harry Buttle, a nebbish who has never bothered anyone in his life. As a result, Buttle is grabbed by the authorities and whisked away to a location so secret the government won’t even admit it exists.

The Times story is titled “Retailers Track Employee Thefts in Vast Databases.” The databases are designed to ensure that people who have stolen from their employers never get a job in retail again. Given how heavily this data can affect people’s lives, you might expect companies to vet it carefully.

“The repositories of information, like First Advantage Corporation’s Esteem database, often contain scant details about suspected thefts and routinely do not involve criminal charges. Still, the information can be enough to scuttle a job candidate’s chances. … But the databases, which are legal, are facing scrutiny from labor lawyers and federal regulators, who worry they are so sweeping that innocent employees can be harmed.”

One example:

“Kyra Moore, then a CVS employee, was accused of stealing: ‘picked up socks left them at the checkout and never came back to buy them.’”

Running background checks on job candidates is a wise move for employers, and providing those checks is a rapidly growing business. But even some background-check providers have doubts about these theft databases.

“That is not a product that we sell, because I think it’s a product fraught with risk and inefficiency,” said William Greenblatt, the chief executive of the background-check company Sterling Infosystems, in The Times story.

So in the mad rush to adopt big data, or any other buzzword, remember GIGO: garbage in, garbage out. Or, to put it another way: trust, but always verify.