by Barry Morris

Simpler applications and smarter databases, Part 2

Aug 10, 2017

Here's a more detailed look into the trade-offs represented by NoSQL.


In my last blog post, I talked about the emergence of NoSQL as an antidote to the deficiencies of traditional SQL RDBMS products. I concluded by asking where the data management industry is going in the current environment, and whether we're really addressing the needs of senior technology leadership.

Let's start by going a bit further into the trade-offs represented by NoSQL.

KISS and the cloud

The NoSQL movement is not merely a slam against the traditional RDBMS. NoSQL seeks to offer solutions that address the list of needs I outlined in my last post.

The central design theme of NoSQL database systems is semantic simplification. Many of these capabilities are finding their way back into the systems today, but early NoSQL systems pursued cloud-style requirements by avoiding schema design, normalization, consistency guarantees, and server-side processing complexity. The idea is to trade powerful traditional database capabilities for powerful cloud-style ones.

Simplifying a database system is the easy route to elastic scalability. File systems scale arbitrarily because they are simple, independent key-value stores with very few guarantees of anything. What Jack does on his laptop has nothing to do with what Jane does on hers. Adding more laptops linearly increases storage capacity, I/O throughput, and user concurrency. It is complexities such as maintaining files consistently across machines, or maintaining formal relationships between files on different machines, that would limit the scalability of the overall system.
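The independence argument can be made concrete with a minimal sketch (the class and key names here are hypothetical, purely for illustration): a "cluster" of fully independent key-value nodes, where each key lives on exactly one node and no cross-node guarantees exist. Because nodes never coordinate, adding one adds capacity linearly.

```python
import hashlib

# Hypothetical sketch: a key-value "cluster" of independent nodes,
# like independent laptops each holding unrelated files.
class ShardedKV:
    def __init__(self, num_nodes):
        # Each node is a plain dict: no cross-node consistency or
        # relationships to maintain, hence nothing to limit scaling.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        # Hash the key so load spreads evenly across nodes.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)

store = ShardedKV(num_nodes=4)
store.put("jack:report.txt", "draft")
store.put("jane:photo.jpg", "...")
print(store.get("jack:report.txt"))  # draft
```

Note what is missing: the moment you want a transaction spanning Jack's key and Jane's key, the nodes must coordinate, and the linear-scaling argument breaks down. That coordination is precisely the "complexity" being traded away.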

The easy way to autoscale, then, is to trade the semantic power of database services for elastic scalability.

Between Scylla and Charybdis

In this new world, the CIO is presented with two choices:

  1. Stick with traditional database technologies, with all of the proven power of those systems but with limitations on elastic scalability (see my post on database elasticity as the primary challenge for digital transformation), or
  2. Move to a semantically simpler data management solution that scales elastically, and accept that it is more limited in richness of services.

The SQL RDBMS is the gold standard of enterprise data management. Organizations have established toolchains, practitioners, operational models, security policies, business continuity processes, data integration and warehousing strategies, and vendor relationships. Above all, the systems deliver trusted transaction models and standard data models that decouple strategic corporate data from the applications that interact with it.  

On the other hand, moving away from the SQL RDBMS can deliver important benefits in terms of cloud-style deployment, but doing so involves a significant institutional learning curve, business risk, and a much more limited ecosystem. There is a smaller pool of application development skills, and reduced database SLAs (specifically the absence of ACID transactions) imply increased application complexity, skills scarcity, and lifecycle cost.

Faced with this choice, CIOs have had to be pragmatic: the common approach is to stick with RDBMS-based solutions wherever possible, even on the cloud, and to adopt more cloud-friendly solutions where the RDBMS falls short. The latter situation is increasingly common as elastic cloud deployment drives strategic direction.

Conservation of intelligence

At a system architecture level, this choice between the traditional RDBMS and the non-SQL solutions is a choice between smart databases and smart applications.  In simple terms: Where do you want to put the intelligence?

There are two architectural elements to the question: SQL and ACID transactions. One enables powerful server-side processing, and the other provides a simple model of data guarantees that enormously simplifies applications.  

A central question in relation to data management is whether you want a server-side language. Without one, your application has to pull lots of data across the network and do all non-trivial data manipulation on the client side. The benefit is that the server is simple and may be able to scale easily. The downside is that the client has to do much more work, which is costly, error-prone, and probably duplicates what other clients are doing. There are also negative consequences relating to latency and consistency/concurrency, though these are a little less obvious.

A server-side language can enable a smart server that does powerful processing on behalf of clients. Likely benefits include simplification of client code, as well as performance of non-trivial data manipulation processes. Parenthetically, it might be noted that there are good arguments for the server-side language to be a) independent of application languages, b) declarative in nature, and c) set-based – in other words something pretty close to SQL. The downside of a smart server is that it is more complex, and furthermore the complexity could limit other attractive features (like elastic scalability).

The other architectural element relating to client-side vs. server-side intelligence is ACID transactions. You sometimes hear people say that ACID transactions are only really necessary for a small subset of applications, but that assertion misses the point.  It is not about whether transactions are necessary or not – it is about application simplification.  

When a database system provides formal guarantees of Durability (the ‘D’ in ACID), for example, an application does not have to deal with exceptions relating to the data perhaps not being committed to durable storage.  Analogous, and more serious, observations apply to the other dimensions of ACID (Atomicity, Consistency and Isolation).
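The application-simplification argument is easiest to see with atomicity. The sketch below (a hypothetical transfer function, using SQLite's transactional guarantees as a stand-in for any ACID database) shows that when a transfer fails partway through, the rollback undoes the half-finished work, so the application has no repair logic to write.

```python
import sqlite3

# Hypothetical sketch of an ACID money transfer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("jack", 100.0), ("jane", 100.0)])
conn.commit()

def transfer(src, dst, amount):
    try:
        with conn:  # transaction: COMMIT on success, ROLLBACK on exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            balance = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?",
                (src,)).fetchone()[0]
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
    except ValueError:
        pass  # rollback already undid the debit; nothing to repair

transfer("jack", "jane", 150.0)  # fails; balances are left unchanged
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
assert balances == {"jack": 100.0, "jane": 100.0}
```

Without atomicity, the `except` branch would have to detect and reverse the partial debit itself, and without durability it would also have to handle the case where a "successful" commit never reached disk. That code, multiplied across every write path, is the application complexity the database is absorbing.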

Applications built on SQL/ACID services can be simpler, cheaper, and more reliable, and can be developed by less specialized teams. But there are times when this is a poor trade-off. Cloud-style infrastructure is the future, and if application complexity is an unavoidable cost of getting there, it may make sense to pay it.

You might ask whether there is a way to have both. Can we have smart databases that can scale on the cloud?  

In my next post, I’ll tell you why I think the answer is yes. But in the meantime, what do you think?