If you take a step back for a moment and think about airplane flight, it turns out that something rather extraordinary is happening. Most of the time the plane is being flown by an autopilot and the pilot is actually kind of a “meta pilot” – a minder who watches to ensure that the autopilot is not doing anything dumb. And every year, millions of us entrust our lives to this system – we’re not only okay with it, we’re in fact impressed that an autopilot can do that stuff so effectively. Against that backdrop, now consider how extraordinary it is that we don’t have computer software that can “fly” a data center. Don’t bet that it will stay that way. It is changing, and the changes are going to have big consequences.
Behind the cloud: administration and management
Cloud users see the Cloud as a set of services, SLAs and invoices. That is surely the point: I click on my Netflix movie or fire up a server on AWS, and I pay the service provider for the privilege. Simple. From the cloud service provider’s perspective, things look very different of course. Behind the curtain, they have to arrange for dynamic provisioning of infrastructure resources, application services and administrative frameworks. They are hiding failures, demand fluctuations and maintenance activities. They are managing constantly changing authentication, authorization, and audit models. The calm surface of the pond belies the frenetic activity below the water line.
This is the clue to the importance of data center automation: Cloud data centers are not static environments, pre-provisioned to run a known, finite set of workloads to support predictable demand. Quite the reverse: They are highly dynamic environments in which everything is changing all the time. At scale. How do you manage tens or hundreds of thousands of servers with associated networks, storage systems, a wide range of software stacks, and administrative systems? And how do you get those components to deliver one-click services to impatient and demanding cloud users? The answer is automation – or at least it will be when we have figured out how to do it.
Service manageability for the data center
Major technology companies have been working on this for years, of course. A good example is Microsoft and their Autopilot system. In Microsoft’s words: “Autopilot is responsible for automating software provisioning and deployment; system monitoring; and carrying out repair actions to deal with faulty software and hardware”. They go on to say: “A key assumption underlying Autopilot is that the services built on it must be designed to be manageable”.
“Manageable services” – that’s the requirement. There is no point to an airplane autopilot if the autopilot can’t adjust the control surfaces (ailerons, flaps, rudders). One interesting question is where we are on that journey in the data center: in other words, how close we are to having the data center autopilot connected to the data center “control surfaces”.
The main themes of this “service manageability” are 1) Decoupling, and 2) Software-based control. In headline terms, we need to define important services at various levels of granularity, isolate them from each other, and give them control and monitoring APIs. In some areas of the data center there is great sophistication about this.
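To make "control and monitoring APIs" concrete, here is a minimal Python sketch of what a manageable service contract and one autopilot repair pass might look like. Every name in it (`ManageableService`, `FakeService`, `repair_pass`) is hypothetical and chosen for illustration; it is not taken from Microsoft's Autopilot or any other real system.

```python
from dataclasses import dataclass
from typing import Protocol


class ManageableService(Protocol):
    """Hypothetical contract: one monitoring call and one control call."""

    name: str

    def is_healthy(self) -> bool: ...

    def restart(self) -> None: ...


@dataclass
class FakeService:
    """Stand-in service used only for this illustration."""

    name: str
    healthy: bool = True
    restarts: int = 0

    def is_healthy(self) -> bool:
        return self.healthy

    def restart(self) -> None:
        # Pretend that a restart always fixes the fault.
        self.restarts += 1
        self.healthy = True


def repair_pass(services: list[ManageableService]) -> list[str]:
    """One control-loop pass: restart unhealthy services, return their names."""
    repaired = []
    for svc in services:
        if not svc.is_healthy():
            svc.restart()
            repaired.append(svc.name)
    return repaired
```

The point of the sketch is that the loop knows nothing about what each service does. It only needs each service to be isolated behind the same small monitoring and control interface.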
Virtualization of servers has decoupled applications from physical servers, and there are very capable VM management systems to manage the lifecycles and health of VMs. Containers take this considerably further, and include powerful elements around packaging and location-independence. In networking, virtual networks have progressed rapidly. The application-centric view of the network is analogous to the application-centric view of the machine it is running on: In both cases, the view is fake (in the best sense of the word), pretending to be the real hardware but in fact decoupling the application from the real hardware. In both cases you have API-based control and higher-level systems that interact with those APIs. The same is true of storage virtualization and software-defined storage. And the rise of hyper-converged systems at the server level follows a similar pattern: decoupling services and providing mechanisms for software-based control. So much of the “service manageability” is in place, or progressing rapidly. What is missing?
Beyond virtualization: Decoupling in the database layer
You may have figured out where I am going with this, given my interest in database systems. Here’s the point:
- You can have an API today that allows you to add a VM to your application, or remove one, to support a surge or drop in concurrent users, and that even allows an orchestration tool such as Kubernetes to add or remove capacity automatically when needed. Can you do that for your database system?
- You can have an API today that allows you to provision and configure a private network in an entirely automated fashion. Can you provision and configure a database that way?
- You can have an API today that allows you to move a set of running microservices to a different physical machine to allow maintenance of the current physical machine. Does your database have that kind of an API?
- You can have an API that reconfigures your storage system while it is serving live applications. Does your database system support that?
It’s hard enough to set up a database system to enable basic self-service usage. The more useful cloud APIs that users can reasonably expect are mostly out of the question when it comes to databases.
The historical RDBMS design is so fundamentally antithetical to elastic infrastructure that no matter how clever your data center autopilot is, it just does not have access to useful “control surfaces”. And that’s because traditional database systems cannot perform the actions at the database level that would be expected in an elastic data center. No, you can’t just call an API to tell Oracle or SQL Server to add 10 more nodes to a running database. You might as well have a “Fly!” API on a grunting farm animal.
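The missing control surface can be illustrated with a small Python sketch of an autopilot planning step. It is a toy model under stated assumptions: the `Component` type, the `elastic` flag, and `plan_scaling` are hypothetical names, and the node counts are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A managed tier. All names and fields here are hypothetical."""

    name: str
    nodes: int
    elastic: bool  # True if nodes can be added or removed while serving traffic


def plan_scaling(components: list[Component], desired: dict[str, int]):
    """Split desired node counts into actions the autopilot can take and
    requests it cannot act on, because the component has no live-resize control."""
    actions, blocked = [], []
    for c in components:
        target = desired.get(c.name, c.nodes)
        if target == c.nodes:
            continue  # already at the desired size
        if c.elastic:
            actions.append((c.name, target - c.nodes))
        else:
            blocked.append(c.name)
    return actions, blocked


# Usage: the web tier can grow by six nodes; the monolithic database cannot.
tiers = [Component("web", 4, elastic=True), Component("rdbms", 1, elastic=False)]
actions, blocked = plan_scaling(tiers, {"web": 10, "rdbms": 11})
```

However sophisticated the planner becomes, anything that lands in `blocked` stays a manual job, which is where a traditional database ends up in this model.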
Introducing the software-defined database
The challenge around providing great automation APIs for databases relates to the first of the two themes above, namely Decoupling. The primary design for the major database systems derives from IBM System R in the mid-1970s. And the key pattern in that design is tight coupling, especially tight coupling between memory and storage. But tight coupling is a more general pattern in these systems, other examples of which include tight coupling between schemas and storage formats, tight coupling between data records and memory pages, tight coupling between clustered nodes, and more. A modular and loosely coupled design would better lend itself to automation, but the traditional RDBMS is a monolith. The next generation of database systems, so-called Elastic SQL systems, must address this. They must be designed to be modular, loosely coupled, composable and software-programmable. Maybe we should refer to Elastic SQL databases as Software Defined Databases, because from an automation perspective that is the central requirement.
In my next post, I’ll talk about how such a design change would lead to the “big consequences” I alluded to earlier – and what impact they’ll have. In the meantime, what do you think? Are we ready for an auto-piloted database?