The database world has become a lot more interesting of late. A decade ago it seemed a pretty stable, almost dull, place: relational had won out over alternative approaches, and Oracle, IBM and Microsoft battled it out for share of what was essentially a mature market.
Over the years there had been interesting diversions into other styles: enterprises had briefly flirted with object databases in the early 1990s but these fell by the wayside.
Traditional relational databases had been designed for OLTP systems, and attempts to produce different approaches for analytics mostly fell under the steamroller of the big relational vendors, with enterprises proving a conservative lot when it came to storing their data. Few now remember databases like Model 204.
At the turn of the millennium you could have been forgiven for thinking that the database market was settling in for a peaceful middle age.
You would have been wrong.
What has happened over the last decade or so is the dawning realisation that the steady advances in processing power and disk access speeds have simply not kept pace with the explosive growth in data volumes.
Database expert Richard Winter tracks the world’s largest production data warehouses in regular surveys: in 2000 the largest data warehouse in his survey was a few terabytes (TB).
By 2004 this had reached 30TB and was growing exponentially.
By 2008 Teradata had five customers with warehouses over 1000TB (a petabyte) in size.
Traditional row-oriented database structures, ideal for high-concurrency OLTP workloads, started to creak under the pressure of this tidal wave of data.
This has led to the rise of columnar data storage, pioneered by Sybase, which essentially turns the database on its side and works especially well for analytic workloads, allowing a lot of data compression, though with a penalty in data loading and concurrent access.
Initially scorned by the traditional vendors, this approach has become more mainstream, with row-oriented databases now allowing a hybrid of row and column orientation depending on workload.
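To make the "on its side" idea concrete, here is a minimal sketch in Python. The sample table, the column names and the helper functions are all invented for illustration, and run-length encoding is only one of several compression schemes a columnar engine might use; this is not how any particular product stores data.

```python
from itertools import groupby

# Hypothetical sample table: each tuple is (order_id, region, amount).
rows = [
    (1, "EU", 120),
    (2, "EU", 80),
    (3, "EU", 95),
    (4, "US", 200),
    (5, "US", 150),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column (the columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, ("order_id", "region", "amount"))

# Values within a column tend to repeat, so they compress well.
region_rle = run_length_encode(columns["region"])

# An analytic query such as a total touches one column, not every row.
total_amount = sum(columns["amount"])

print(region_rle)    # [('EU', 3), ('US', 2)]
print(total_amount)  # 645
```

The same sketch hints at the loading penalty: adding a single new order means appending to three separate lists rather than writing one row.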
Sheer processing power is brought to bear through more use of MPP techniques in many of the newer database appliances.
Innovative hardware approaches have also been applied, for instance by Netezza with its FPGA accelerator. Solid state disks and cheaper memory have allowed more modest analytic volumes to be tackled differently, one example of the latter being SAP’s HANA product.
Unstructured data, such as documents and web pages, has brought entirely different approaches, the most popular being the Hadoop initiative.
These bring with them new programming and file approaches but are powerful for certain workloads.
Database vendors have been scrambling to integrate such approaches into their own products, with Aster's SQL-MapReduce a pioneer in blending the structured data and Hadoop worlds.
There is even the somewhat controversial NoSQL movement, spawning a range of specialist databases that move away from traditional schemas through either XML databases or other storage structures.
All this presents a bewildering picture to enterprise customers struggling with the data explosion.
For me, the religious arguments between row and column orientation advocates are unhelpful: this is a technical argument about how best to address a business problem, and it is confusing the market.
Imagine designing an ideal database optimiser.
It would constantly monitor the usage patterns of the database, identifying hot-spots and shifting the most commonly used data to whatever was the optimal storage mechanism: perhaps to memory or solid state disk, or deciding between a row or column approach depending on the workload characteristics.
Crucially, it would do all this without requiring manual intervention by a human database administrator.
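As a thought experiment only, the decision such an optimiser would make can be sketched in a few lines of Python. The class name, the two access categories, the threshold and the "memory" and "disk" tiers are all assumptions made for this illustration; a real optimiser would weigh far more signals and would also have to move the data.

```python
from collections import Counter


class ToyLayoutAdvisor:
    """Illustrative only: counts accesses per table and suggests a layout."""

    def __init__(self, hot_threshold=3):
        self.point_lookups = Counter()  # OLTP-style single-row reads
        self.column_scans = Counter()   # analytic scans over a column
        self.hot_threshold = hot_threshold

    def record(self, table, kind):
        """Log one access; kind is 'lookup' or 'scan'."""
        if kind == "lookup":
            self.point_lookups[table] += 1
        elif kind == "scan":
            self.column_scans[table] += 1
        else:
            raise ValueError(f"unknown access kind: {kind!r}")

    def recommend(self, table):
        """Return (layout, tier) based purely on observed usage."""
        lookups = self.point_lookups[table]
        scans = self.column_scans[table]
        layout = "column" if scans > lookups else "row"
        tier = "memory" if lookups + scans >= self.hot_threshold else "disk"
        return layout, tier


advisor = ToyLayoutAdvisor()
for _ in range(5):
    advisor.record("sales", "scan")
advisor.record("sales", "lookup")
advisor.record("orders", "lookup")

print(advisor.recommend("sales"))   # ('column', 'memory')
print(advisor.recommend("orders"))  # ('row', 'disk')
```

The point of the sketch is that the row-versus-column choice falls out of observed behaviour, with no administrator in the loop.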
The vendors of traditional databases can no longer rest on their laurels, but need to put more effort into improving their optimisers, hiding such issues from application programmers behind an intelligent, self-managing layer of software that reduces the bits and bytes arguments over rows versus columns to an optimiser decision.
The newer database vendors can do the same, though they also need to bring their products up to the level of maturity in reliability, security and administration that the older databases have built up over the years.
May the best optimiser win.