Data Trends: Petabyte and Beyond
Cavers believes that petabyte-level data stores will force IT people to keep mass copy operations to a minimum. "This is a paradigm shift in the way people think about computing," he says.
The Petabyte Paradigm
Gerry Higgins, senior vice president of Verizon information processing services in New York City, points out that maintaining a petabyte of data raises distribution management issues in hardware as well as software. In the petabyte world, data is usually spread over thousands of disks. "Vendors always want to talk to me about how great their mean-time-between-failure numbers are. I tell them not to bother. All I’m interested in is what happens when there is a failure," Higgins says. "When you deal with so many disks, some are always crashing. I tell them that when you’re a petabyte guy like me, you have to expect failures."
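A little arithmetic shows why mean-time-between-failure figures lose their appeal at this scale. The sketch below is illustrative only: it assumes a constant failure rate and prompt replacement of failed disks, and the figures used (5,000 disks, a quoted MTBF of 1,000,000 hours) are hypothetical, not numbers from Higgins or Verizon.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def expected_failures_per_year(disk_count, mtbf_hours):
    """Expected number of disk failures per year across the whole farm.

    Assumes a constant per-disk failure rate of 1 / mtbf_hours and that
    failed disks are replaced, so the population stays at disk_count.
    """
    return disk_count * HOURS_PER_YEAR / mtbf_hours


def mean_hours_between_failures(disk_count, mtbf_hours):
    """Average time between failures somewhere in the farm."""
    return mtbf_hours / disk_count


if __name__ == "__main__":
    disks = 5000           # hypothetical farm size
    mtbf = 1_000_000       # hypothetical vendor-quoted MTBF, in hours
    print(expected_failures_per_year(disks, mtbf))
    print(mean_hours_between_failures(disks, mtbf))
```

Under these assumed figures, the farm would see roughly 44 failures a year, or one about every 200 hours, even though each individual disk is rated to run for more than a century.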
Many observers think the transition to petabyte levels is going to introduce changes even more sweeping than those associated with previous leaps in storage. "Traditionally vendors have built standalone data mining engines and moved the data into them," says Winter. "But are you going to be able to move a petabyte around like that?" Winter foresees radical changes in engine architecture, probably involving breakthroughs in the engineering of parallelization.
"The whole notion of storage takes on a new meaning," says Scot Klimke, vice president and CIO for Network Appliance, a storage services vendor in Sunnyvale, Calif. "It starts to be defined less as simple retention and more as the struggle for information quality."
Perhaps the worst such issue is consistency. A petabyte of data is so big, and the quality of the information it contains is perforce so low, that it is bound to contain and create inconsistent information. Any petabyte-level system therefore needs ways of detecting and resolving data conflicts.
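What detecting and resolving a conflict might look like, in miniature, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the record layout (key, value, year last updated), the sample data, and the "latest update wins" rule, which is only one of many possible resolution policies and is not one the article's sources describe.

```python
from collections import defaultdict


def find_conflicts(records):
    """Return the keys whose copies disagree, with the competing values."""
    values_by_key = defaultdict(set)
    for key, value, _updated in records:
        values_by_key[key].add(value)
    return {key: values for key, values in values_by_key.items()
            if len(values) > 1}


def resolve_latest_wins(records):
    """For each key, keep the value with the most recent update time."""
    latest = {}
    for key, value, updated in records:
        if key not in latest or updated > latest[key][1]:
            latest[key] = (value, updated)
    return {key: value for key, (value, _updated) in latest.items()}


# Hypothetical sample data: (key, value, year last updated)
records = [
    ("customer-17", "New York", 2001),
    ("customer-17", "Boston", 2003),
    ("customer-42", "Chicago", 2002),
]

conflicts = find_conflicts(records)      # only customer-17 has two values
resolved = resolve_latest_wins(records)  # customer-17 resolves to Boston
```

At petabyte scale the records would not fit in one machine's memory, so a real system would have to run this kind of check in parallel and in place rather than by copying the data to a separate engine.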
Another issue is aging: Information quality varies, roughly, with age, but present systems are poorly equipped to track the age of material, especially material within a file. "I have five priorities for this fiscal year," Klimke says. "Two involve data quality."
Klimke argues that as the petabyte revolution picks up steam, the struggle to measure and manage data quality will increasingly define the CIO’s job. While he might or might not be right about this specific point, it’s clear that anyone exploring the petabyte world should bring a good map, watch out for booby traps and carry a rabbit’s foot for luck.