Data Trends: Petabyte and Beyond
"I have a 67 billion row table," Davis says, "and I can do a sort across six months of that table in three seconds." Backing up and restoring became easier for the same reasons.
A second approach to exploiting the speed of electronic memory is to build algorithms that grade data by importance. The most critical data is loaded into memory, while the rest goes to disk-based systems, where lower performance (and therefore lower operating cost) is tolerable. Dave Harley, chief designer at London-based BT Group, is experimenting with this approach using software from Princeton Softtech. So far the application has been used only in the system that supports BT’s IT management staff in asset management and fault tracking, but the results have been good enough that Harley expects this so-called active archiving to be adopted throughout the company. "The key factor is keeping the most critical database as small as possible," he says. "It’s quite a new idea."
StorageNetworks of Waltham, Mass., is also using this approach to manage the 1.5 petabytes of data it has taken on through its storage services arm. CEO Peter Bell says that on an average system, 70 percent of the stored data has not been accessed in the previous 90 days. If you make the reasonable assumption that recent access is a dependable indicator of relative business criticality, then loading only the most frequently used data into memory can go a long way toward delivering acceptable performance where it is needed. Bell adds that the critical issue in managing petabyte-scale volumes of data is developing data classification systems that balance power without introducing excessive single-point-of-failure risk. (If a computer managing a petabyte goes bad, the damage it can cause is breathtaking.)
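The recency-based tiering that Bell describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not StorageNetworks’ or Princeton Softtech’s actual method: the `Dataset` type, the `assign_tiers` function, and the `memory_budget_gb` and `cold_after` parameters are all hypothetical. Anything untouched for longer than 90 days (Bell’s window) is sent straight to disk; the remaining data is ranked by how recently it was accessed and placed in memory until a fixed memory budget runs out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Dataset:
    """Hypothetical unit of stored data with its size and last access time."""
    name: str
    size_gb: float
    last_access: datetime


def assign_tiers(datasets, memory_budget_gb, now, cold_after=timedelta(days=90)):
    """Map each dataset name to 'memory' or 'disk' (hypothetical helper).

    Assumes recency of access is the proxy for criticality.
    """
    tiers = {}
    warm = []
    for d in datasets:
        # Data not touched within the cold window goes straight to disk.
        if now - d.last_access > cold_after:
            tiers[d.name] = "disk"
        else:
            warm.append(d)

    # Rank warm data so the most recently accessed comes first.
    warm.sort(key=lambda d: d.last_access, reverse=True)

    used = 0.0
    for d in warm:
        # Fill memory greedily; a dataset that does not fit is skipped,
        # but smaller, less recent ones may still fit afterwards.
        if used + d.size_gb <= memory_budget_gb:
            tiers[d.name] = "memory"
            used += d.size_gb
        else:
            tiers[d.name] = "disk"
    return tiers
```

With a 100 GB budget and warm datasets of 40 GB, 50 GB, and 30 GB accessed 1, 5, and 10 days ago, the first two land in memory and the third goes to disk even though it is still warm, while a dataset last read 200 days ago goes to disk regardless of size. A greedy fill like this is simple and cheap to rerun as access patterns shift; it does not guarantee the best possible use of the memory budget.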
On the other hand, Len Cavers, director of technical development at Experian, an Equifax competitor, believes that in the long run centralized solutions will not scale adequately. He argues that as backbone bandwidth increases and data standards are defined and distributed, companies such as his will find it increasingly practical to "leave the data" higher and higher up the value chain. In that world, networks would carry not raw data (which wouldn’t move) but queries, along with intelligent indexes that tell querying systems which data sources to connect to. Experian is now engaged in an active development program with its partners on how to use XML and Web services to frame and respond to queries and to generate indexes.
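The "leave the data" model Cavers describes can be illustrated with a small in-process sketch. Everything here is hypothetical and greatly simplified: `Source`, `build_routing_index`, and `route_query` are invented names, and the XML and Web services transport Experian is exploring is left out entirely. Each source keeps its records locally and publishes only an index of the key values it holds. The querying side consults that index and sends the query only to the sources that can answer it, so only matching results cross the network.

```python
from collections import defaultdict


class Source:
    """Hypothetical data holder that keeps its raw records locally."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # list of dicts; never shipped wholesale

    def build_index(self, field):
        # Publish only the distinct values of `field`, not the records.
        return {r[field] for r in self._records}

    def query(self, field, value):
        # Answer locally and return only the matching records.
        return [r for r in self._records if r[field] == value]


def build_routing_index(sources, field):
    """Map each indexed value to the names of sources that hold it."""
    index = defaultdict(set)
    for s in sources:
        for v in s.build_index(field):
            index[v].add(s.name)
    return index


def route_query(sources, index, field, value):
    """Send the query only to sources the index says can answer it."""
    by_name = {s.name: s for s in sources}
    results = []
    for name in sorted(index.get(value, ())):
        results.extend(by_name[name].query(field, value))
    return results
```

For example, if two of three sources hold records for customer "C2", a query for "C2" is sent to just those two, and the third is never contacted. Only the compact index is shared in advance; the raw records stay where they are.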