Data Trends: Petabyte and Beyond
Searches conducted on large volumes of data naturally generate more errors. At some point, the number of errors so overwhelms the user’s ability to cope that the system essentially becomes useless. The only solution is to rewrite the search programs so that they make fewer errors, and no IT development task is harder to do predictably than boosting the IQ of computer programs. Finally, according to John Parkinson, CTO of Cap Gemini Ernst & Young for the Americas region, even the cost of core overhead tasks (such as buffer management) typically grows faster than linearly.
One school of thought is that the transition to petabyte levels is just not worth it.
Faisal Shah, cofounder and CTO of Chicago-based systems integrator Knightsbridge Solutions, says that data quality naturally drifts down as more space opens up in the corporate attic, in part because you are now saving things you used to throw away. Shah believes that companies will be better off spending their now-restricted IT dollars on trying to extract more intelligence from current data stores rather than piling up haystacks with fewer and fewer needles hidden in them.
Petabyte Solutions
Other observers are betting that new technologies will be able to keep those penalties under control. As with many IT problems, the solutions being explored fall along a spectrum from centralized to distributed.
Ron Davis, senior IT architect at Equifax, the Atlanta-based consumer data company, is working with a centralized management solution from Corworks. Equifax’s business is to buy raw data from state agencies or directory companies and turn it into information products. Equifax wants to control the data it buys for as long as possible, because it never knows what a new product design might call for, or when. While the data could, in theory, be left with its suppliers, Davis’s experience is that retention policies and practices vary too widely over Equifax’s 14,000 data sources to make such dependence practical. He believes that, at least over the short run, companies near the end of the value chain will have to take on the responsibility of archiving raw data. Shouldering this responsibility has put Equifax on the road to becoming a petabyte company, and it has forced Davis to search for an architecture competent to deal with the petabyte problems of cost, error and time.
Corworks’ basic idea is to beat the time penalties inherent in handling large volumes of data by loading it into electronic memory. This seems counterintuitive, rather like making a quart easier to drink by squeezing it into a pint, but the feat is done by stripping out the structural data (for example, by converting everything into flat files), compressing the result, and then relying on fast processors to decompress and restore the data structures only as needed. In other words, just-in-time logic.
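The general pattern described above (flatten, compress, hold in memory, decompress on demand) can be sketched in a few lines of Python. This is only an illustration of the idea, not Corworks' actual design: the class name CompressedStore, the tab-separated layout, the block size, and the choice of zlib are all assumptions made for the example.

```python
import zlib


class CompressedStore:
    """Hypothetical in-memory store: records are flattened to text,
    compressed in fixed-size blocks, and decompressed only on access."""

    def __init__(self, columns, records, block_size=1000):
        self.columns = list(columns)
        self.block_size = block_size
        self.blocks = []  # compressed flat-file chunks held in RAM
        records = list(records)
        self.row_count = len(records)
        for start in range(0, len(records), block_size):
            chunk = records[start:start + block_size]
            # Strip the structure: each record becomes one tab-separated line.
            # Assumes no value contains a tab or a newline.
            flat = "\n".join(
                "\t".join(str(rec[col]) for col in self.columns)
                for rec in chunk
            )
            self.blocks.append(zlib.compress(flat.encode("utf-8")))

    def get(self, index):
        """Decompress one block and rebuild a single record (just in time)."""
        if not 0 <= index < self.row_count:
            raise IndexError(index)
        block, offset = divmod(index, self.block_size)
        lines = zlib.decompress(self.blocks[block]).decode("utf-8").split("\n")
        # Values come back as strings; the original types are not preserved.
        return dict(zip(self.columns, lines[offset].split("\t")))

    def compressed_bytes(self):
        """Total memory taken by the compressed blocks."""
        return sum(len(b) for b in self.blocks)


if __name__ == "__main__":
    rows = [{"id": i, "state": "GA", "name": f"person{i}"} for i in range(5000)]
    store = CompressedStore(["id", "state", "name"], rows)
    print(store.get(4321))
    print(store.compressed_bytes(), "bytes held in memory")
```

The trade-off is the one the article names: each lookup spends processor time decompressing a block, in exchange for fitting far more data into memory than the structured form would allow. Smaller blocks make each lookup cheaper but usually compress less well.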