by Andy Hayler

Lots in store for warehousing

Feb 21, 2011

Some markets which appear to be mature can suddenly become exciting once more. One of the earliest mainstream enterprise applications was the database, and indeed I started my own IT career as an IMS database administrator.

But once the relational database became widely accepted, there was only a brief period of competition before the market was carved up between Oracle, IBM and Microsoft.

The design emphasis of these relational databases had primarily been for online transaction processing (OLTP) systems. For a while there was a flurry of specialist databases that offered improved performance in analytic situations, where the workload is quite different from the multiple update world of OLTP.

However, even here only Teradata really managed to carve out a successful niche. Yet in the last five years or so there has been a flood of new entrants to the market, some using quite different database designs from traditional ones. What happened?

For one thing the sheer scale of data being handled has increased radically. In 2005 the largest operational data warehouses were about 100TB in size. Five years on, a number of operational data warehouses are now at the petabyte level.

Volumes of this size have pushed the traditional vendor technologies to their limits, and have forced people to consider new approaches. Although processing speed and memory capacities have continued to evolve rapidly, disk access speeds have not kept pace.

One response has been to develop massively parallel processing (MPP) architectures, in which data is distributed across many servers so that many processors can work on a single query at the same time. This approach was taken by Teradata and more recently by Netezza.
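The MPP idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Teradata or Netezza are built: the four-node cluster, the sales rows and every function name are hypothetical, and threads stand in for what would be separate servers. Rows are spread across nodes by a distribution key, each node aggregates only its own slice, and a coordinator merges the partial results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical cluster size


def distribute(rows, num_nodes=NUM_NODES):
    """Assign each row to a node using its distribution key (the customer id)."""
    nodes = [[] for _ in range(num_nodes)]
    for customer_id, region, amount in rows:
        nodes[customer_id % num_nodes].append((customer_id, region, amount))
    return nodes


def local_aggregate(partition):
    """Work done on one node: total the amounts per region for its own slice."""
    totals = defaultdict(int)
    for _, region, amount in partition:
        totals[region] += amount
    return dict(totals)


def parallel_total_by_region(rows, num_nodes=NUM_NODES):
    """Send the same query to every node, then merge the partial totals."""
    partitions = distribute(rows, num_nodes)
    # Threads are a stand-in for servers; a real MPP system uses separate machines.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    merged = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] += subtotal
    return dict(merged)


# Made-up rows: (customer_id, region, amount)
SALES = [
    (1, "north", 120),
    (2, "south", 80),
    (3, "north", 50),
    (4, "east", 200),
    (5, "south", 20),
    (6, "east", 30),
]

totals = parallel_total_by_region(SALES)
```

The point of the design is that no single node ever scans the whole table; adding nodes shrinks each node's slice, which is what lets the query time fall as the cluster grows.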

This combination of specialist software and hardware aimed at data warehousing has become known as an appliance, though the definition is a little blurry: some appliance offerings, such as Kognitio, can operate in the cloud, and so do not require hardware on site.

Another way to tackle large data volumes has been to reconsider the traditional row-oriented structure used by the main relational database vendors. Storing data in columnar form rather than in rows makes it much easier to achieve high compression rates, because the values within a single column share a data type and frequently repeat.

There is a price to pay for columnar storage if you need to load large volumes and access individual rows for update, but for data warehouse situations, where access is mostly read-only once it is loaded, columnar structures can offer significant performance advantages.

Pioneered by Sybase, this approach has been taken up by market newcomers such as Vertica and ParAccel.
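A small Python sketch shows why columnar layout compresses so well. It is an illustration only: the sample rows and function names are invented, and run-length encoding is just one simple scheme chosen for clarity, not a claim about what Sybase, Vertica or ParAccel actually use. Once the table is pivoted into columns, long runs of repeated values collapse into (value, count) pairs, and a query that needs one column reads only that column.

```python
from itertools import groupby

# Made-up fact table rows: (order_date, country, quantity)
ROWS = [
    ("2011-02-01", "UK", 10),
    ("2011-02-01", "UK", 12),
    ("2011-02-01", "UK", 7),
    ("2011-02-02", "UK", 9),
    ("2011-02-02", "FR", 4),
    ("2011-02-02", "FR", 11),
]


def to_columns(rows):
    """Pivot a list of row tuples into one list per column."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Collapse each run of identical adjacent values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded


dates, countries, quantities = to_columns(ROWS)

encoded_dates = run_length_encode(dates)          # 6 values become 2 pairs
encoded_countries = run_length_encode(countries)  # 6 values become 2 pairs

# An analytic query such as "total quantity" touches a single column only.
total_quantity = sum(quantities)
```

The sketch also hints at the price mentioned above: changing one row means touching every column separately, and an update in the middle of a run forces that run to be split and re-encoded.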

But not all vendors have aimed for the giant warehouses; the same techniques have made it possible for some vendors to offer appliances that target mid-sized data warehouses at a dramatically lower price point than the traditional technologies have delivered.

However, the inertia of deployed applications and all the skills invested in existing technologies mean that the newer vendors have to work harder on easing the migration from traditional database deployments.

These different approaches have shaken up the database market. The early success of Netezza prompted a flood of venture capital backing for alternatives.

Established vendors have reacted either by bringing out their own appliance offerings, such as Oracle with Exadata, or by purchasing one of these newer vendors (Microsoft bought DATAllegro), or both: IBM has its own appliance offering, but bought Netezza for good measure.

Sheer size of data has not been the only issue. The need to analyse large volumes of data in something close to real time has allowed further specialisation. For example, an online poker company found that its fraud detection algorithms were unable to keep up with the volume of data coming in until it deployed Aster Data’s technology, which specialises in compute-intensive analysis of high-volume data.

As social networking sites build up vast amounts of data, the need to analyse this has spurred interest from other vendors. In particular some vendors have added support for Hadoop, an open-source software framework for supporting data-intensive distributed applications; in some cases this can offer major performance advantages over traditional SQL.
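Hadoop implements the MapReduce programming model, and the shape of that model can be sketched in plain Python. To be clear about the assumptions: this is not Hadoop's API and nothing here runs on a cluster; the event data and function names are hypothetical. A map step emits key/value pairs from each record, a shuffle step groups the pairs by key, and a reduce step combines each group. Hadoop's contribution is running the map and reduce steps across many machines.

```python
from collections import defaultdict

# Made-up activity log from a social networking site: (user, action)
EVENTS = [
    ("alice", "post"),
    ("bob", "like"),
    ("alice", "like"),
    ("carol", "post"),
    ("alice", "share"),
    ("bob", "post"),
]


def map_phase(record):
    """Map: emit one (user, 1) pair for each activity record."""
    user, _action = record
    yield (user, 1)


def shuffle(pairs):
    """Shuffle: gather all values that share a key into one list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key, here by summing them."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on a single machine."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


activity_counts = run_job(EVENTS)
```

Because each map call looks at one record and each reduce call at one key, the work divides naturally across machines, which is where the performance advantage for some workloads comes from.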

All these developments (appliances, the acceptance of columnar storage, the use of computationally intensive approaches like Hadoop) have contributed to a vibrant database market around data warehousing, which just a few years ago looked mature and dull: a few big vendors slugging it out, but little innovation.

Further technology advances such as solid state drives, the increasing desire for near real-time analysis and the inexorable rise in the volumes of data that organisations need to handle all promise to keep things lively in the data warehouse market for some time.

Andy Hayler is founder of research company The Information Difference. Previously, he founded data management firm Kalido after commercialising an in-house project at Shell.