Given the strength of SQL Server in business, you might be surprised to learn that Microsoft has spent the last five years building a distributed NoSQL database – until you remember that services like Power BI, Bing and the Office Web apps face the same challenges as services like Netflix. They’re problems more and more enterprises have to deal with too: the deluge of data, the demands of mobility and the need for low latency even though you’re relying on cloud services.
That’s why Microsoft’s Dharma Shukla, who previously built key technologies like Windows Workflow Foundation (and worked on both Live Mesh and the never-shipped Courier tablet), has been developing Microsoft’s global-scale distributed database since the end of 2010.
As a cloud PaaS service, it avoids much of the complexity of configuring NoSQL databases yourself, and DocumentDB can even run MongoDB applications without any changes, as the service now exposes MongoDB APIs. That gives DocumentDB customers a way to try out DocumentDB apps on premise – and it also gives them a fast way to bring MongoDB apps to the cloud, as well as a way of avoiding for some of the security problems that MongoDB users have wrestled with. You’ll soon be able to bring applications from three other NoSQL systems – Cassandra, DynamoDB and HBASE.
“The thrust is that we want to build a planet-scale database for developers writing global distributed applications,” Shukla says. “It’s for Web applications, for mobile applications, for IoT applications, for those who want global reach. If you’ve written a mobile app and you have employees across the globe, and you want users to access data in real time, with single-digit read latencies in the region closest to the users. If you have users in Hong Kong who want to access data that’s being written in Seattle, but with predictable latencies in Hong Kong. If you want throughput and storage that scales elastically, if you want global reach and you also want the power of SQL. We scale like a NoSQL storage system and yet we offer SQL as a query language.”
Scale plus SQL-like queries
Relational databases were designed for what Shukla calls the workloads of the last 20 years. “The trend since the early ‘90s was that reads from the database used to dominate writes to it. But the world has changed. You’ve got IoT devices generating signals, connected cars generating information like the temperature of the engine. Information production is far more massive. You’ve got lots of data generated worldwide at a very high rate and the writes dominate database activity. SQL is amazing at serving queries but the rise of NoSQL is the rise of writes.”
NoSQL databases are optimized to handle that level of writes, so they scale well. What they don’t offer is rich queries when it comes time to read out of the database.
“What you really need is a write-optimized database engine that can sustain large, rapid writes – but still serve queries,” says Shukla. Unlike other NoSQL approaches, he claims you don’t have to choose between scale and powerful queries. “We decouple throughput from storage. We let you elastically scale out throughput and storage independently from each other.”
You can choose more throughput when you need it. “You can say ‘between 9 and 10 o’clock, I expect really high traffic on my website so I want to provision a million writes a second and for the rest of the I want to reduce it back to a hundred writes a second.’ You can change the throughput dynamically and change the storage requirements dynamically and we take care of all the sharding for you on the service.” You can also scale out one region but not another. “You can say, ‘I expect high usage in Hong Kong at this time of day, compared to Seattle and New York.”
Not having to give the service details of your schema takes away one of the barriers to frequent deployments. To deploy updates to a Web service every week, you’d usually have to allow time to manage any changes to the fields used in your database, creating a new schema and dropping and creating indices, which means not processing any queries while that’s happening. With DocumentDB, Shukla says, “You can change your application and the data structures in your program and just ship those to the database without worrying about schemas or creating secondary indices.” That means you can iterate the apps you’re building more often with less work.
A new kind of consistency
Any distributed system needs to deal with keeping the distributed copies of the database consistent. What really makes DocumentDB stand out is that unlike other NoSQL databases, it offers more than the usual choice of either strong or eventual consistency.
Strong consistency guarantees that you always get the most recent version of what’s stored in the database, but it slows things down. Eventual consistency has lower latency, but it guarantees only that you’ll get the most recent version eventually.
“Today all the other vendors expose only two choices, strong and eventual. If you choose strong consistency you compromise on agility, and if you choose eventual consistency you compromise on the programming model,” says Shukla. “That gives you two extreme choices – and strong is not viable for cross-data center apps so people have to use eventual. The reason for using eventual consistency is that frankly, it’s the only viable model where people want high availability and low latency.”
DocumentDB adds two new consistency levels and more coming in the future (and you can change from one consistency level to another to find out what works best for your applications).
As well as the two familiar options, you can choose to prioritise maintaining the sequence of reads and writes within a session (rather than across the whole database), or making sure that you preserve the order of all your reads and writes and that reads only lag behind new writes by a guaranteed amount. These Session and Bounded Staleness consistency levels give you more choice about the trade-offs you’re making to get a distributed system.
Despite these new options being new, the vast majority of DocumentDB users pick them, Shukla says. “Any time that you’re using eventual consistency, if you use session or bounded staleness, you will get your app to be much more predictable, without compromising on availability or latency.”
Nearly three-quarters choose session consistency. “With session consistency, you get availability and latency that’s as good as eventual but you get a much more predictable programming model,” Shukla says. “The kind of apps they’re writing, most of them ae mobile applications, gaming or social applications.” It’s also a good fit for messaging, analytics and IoT apps.
“For Web or mobile apps, [things happen] in the context of some user session. You have a user who is logged in to the app writing data and storing it in a database – for them, session works beautifully. Bounded staleness is the next most popular. What we found is that’s it’s preferred by people who wanted strong consistency. You have an app that can’t work without strong consistency but you want to move it to be global. Session is too weak and eventual is, well, too eventual. The next best thing is bounded staleness.”
These options are what make DocumentDB unique and Shukla notes that these two new options have made the classic choices of strong and eventual consistency, which are usually your only choices, so unpopular that they’re “almost on the fringes,” used by only one or two percent of DocumentDB users.
The service will add a fifth consistency level later this year (it’s already in use inside Microsoft, and it will launch once it’s been tested in action, according to Microsoft).
Bounded staleness guarantees the order of reads and writes across the whole database. “It’s very good for writing a stock ticker, where you have one source generating data and others consuming it and you want a predictable read latency. Every time there’s a write on the West Coast, you want to read to lag no more than say 90ms, and no matter how many writes happen on the west coast, you want the exact order to be preserved worldwide.”
But the DocumentDB team began to wonder what other kinds of consistency could be useful. Suppose you say that in that 90ms gap between writes and reads, you allow writes out of order, “but then after that window is over, you want them in order again,” says Shukla. “That would give much better write latency – as good as session and eventual – and you would have much more responsive apps that still have this ability to preserve order without being stateless.” That would be useful for a publish and subscribe model, he says.
“You can publish in one data center and consume in others. The notification hub on Azure is a good use case for that. It’s for when you have a little window of tolerance within which being out of order is tolerable, but the low latency is so crucially important,” says Shukla. “By just making that tweak, there’s a new class of producer/consumer apps we feel we can enable.”
To make sure MondoDB apps running on DocumentDB get the benefit of the extra consistency levels, they will automatically use session consistency. “The default for MongoDB is unacknowledged writes,” Shukla points out. “The request doesn’t even go on the wire before the API returns success. In reality, that write hasn’t gone on the wire, it hasn’t replicated, it hasn’t committed on disk. That gives you no guarantees for availability, consistency or latency. Since we’re a service and we’re backing it with an SLA, we change the default to session and that gives you durability and low latency, with SLAs. The developer doesn’t have to change the app, the app doesn’t have to be changed at all to take advantage of it but the app starts behaving more reliably and consistently.”
DocumentDB may not be widely known, but it’s growing about 20 percent a month, Shukla says. Microsoft doesn’t give exact figures but he claims DocumentDB has “a larger paying customer base than any of the on-premises NoSQL databases including MongoDB, Basho, any of those vendors. MongoDB has 10 million downloads, but paying customers are a tiny segment.”
None of them match the features of DocumentDB, he believes.
DocumentDB also has the advantage of being a PaaS with clear SLAs, he says. “Even DynamoDB, which is a cloud service, doesn’t offer any SLAs. For enterprises, this matters; guarantees about data loss and corruption, latency, queries, available, making sure data throughput is honoured at all time. SLAs matter a lot and we’re the only ones who offer them.”
So far, Microsoft has focused on working with very large enterprise customers. Shukla describes them as “car companies, companies that make elevators, companies in manufacturing, companies selling devices. That’s why our SLAs are important, compliance is important, availability is important, and the fact that we’re in all the Azure regions and in the Azure government clouds.” It’s already used inside Microsoft by Xbox, Bing, Microsoft, HoloLens, the Office Web apps and several Azure services. The MongoDB compatibility is part of how DocumentDB hopes to appeal to a wider audience.
In fact, the name of DocumentDB is probably the least successful part of it: It’s a “document” database only in the sense that it stores JSON documents.
“The name is not truly representative of the capabilities,” admits Shukla. “The planet-scale capabilities, the elastic throughput, the low latency, the SLAs, the globally distributed consistency levels … those capabilities are not apparent from the name. We are going to fix that.”
DocumentDB isn’t just a competitor to other NoSQL approaches. Shukla talks about solving a recurring pattern of problems for cloud services on a globally distributed scale. It’s also giving developers a clear model of the trade-offs and guarantees you can make with distributed data storage that they can use to make development decisions, in much the same way RDBMS did for database programmers with isolation levels on a single-machine database.
“Think of HoloLens as a modern device or Web applications like Office Online, Outlook, Bing or the massive data generating applications like Windows telemetry,” Shukla says. “All these Windows devices after the launch of Windows 10 sending usage data and telemetry. The trends I see are to making applications globally distributed. Anyone who can write a mobile or IoT app today wants that app data to be accessed in different geographies so the data needs to replicate. That’s where all these things like consistency, availability and latency come into play.”