Facebook Open Sources Its Embedded Database Code

Facebook has released RocksDB -- an embeddable, persistent key-value store for fast storage -- to the open source community. The embedded database is intended to support applications that need low-latency database accesses.

Hot on the heels of Facebook's release of its Presto distributed SQL query engine to the open source community earlier this month, the social networking giant was back at it today—this time open sourcing RocksDB, an embedded, high-performance, single-node database engine.

"Every time one of the 1.2 billion people who use Facebook visits the site, they see a unique, dynamically generated home page," says Dhruba Borthakur, engineer on Facebook's database engineering team. "There are several applications powering this experience—and others across the site—that require global, real-time data fetching."

But storing and accessing Facebook levels of data, especially at the speeds Facebook requires, is no mean feat, Borthakur explains. One of the tools Facebook uses to face that challenge is RocksDB, an embeddable, persistent key-value store for fast storage that builds on Google's LevelDB open source key-value database library.

Where RocksDB Shines

RocksDB is intended for applications that need low-latency database accesses. Borthakur points to a number of use cases, including the following:

  • User-facing applications that store the viewing history and state of users of a website
  • Spam-detection applications that need fast access
  • Graph-search queries that need to scan data sets in real time
  • Caching data from Hadoop, allowing apps to query Hadoop data in real time
  • Message queues that support a high number of inserts and deletes

Borthakur says he expects the number of use cases to grow rapidly.

Reasons for Choosing an Embedded Database

Traditionally, applications access their data via remote procedure calls over a network connection. It works, but it's not fast, especially for user-facing products that need to access the data in real time. Enter the embedded database: Instead of accessing data over a network, many newer applications are avoiding that chokepoint by managing their own dataset on flash storage.
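To make the embedded model concrete, here is a minimal sketch of an application managing its own key-value data in-process, with no network hop between the application and its store. It is not RocksDB: Python's standard-library dbm module is used purely as a stand-in for an embedded, persistent key-value store, and the helper names record_view and viewing_history are hypothetical, chosen to mirror the viewing-history use case mentioned above.

```python
import dbm
import os
import tempfile


def record_view(db, user_id, page):
    """Append a page to a user's viewing history (hypothetical helper)."""
    key = f"history:{user_id}".encode()
    previous = db[key] if key in db else b""
    db[key] = previous + page.encode() + b"\n"


def viewing_history(db, user_id):
    """Return the pages a user has viewed, oldest first (hypothetical helper)."""
    key = f"history:{user_id}".encode()
    if key not in db:
        return []
    return db[key].decode().splitlines()


# The store lives in a local file and is opened inside the application
# process; reads and writes are library calls, not remote procedure calls.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "history")

    with dbm.open(path, "c") as db:  # "c": create the store if it is missing
        record_view(db, "alice", "/home")
        record_view(db, "alice", "/photos")

    # Reopen read-only to show that the data persisted to local storage.
    with dbm.open(path, "r") as db:
        history = viewing_history(db, "alice")

print(history)
```

The design point is only that the data store is a library linked into the application rather than a separate server; the performance characteristics of dbm say nothing about those of RocksDB.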

"There are several reasons for choosing an embedded database," Borthakur says. "When database requests are frequently served from memory or from very fast flash storage, network latency can slow the query response time. Accessing the network within a data center can take about 50 microseconds, as can fast-flash latency. This means that accessing data over a network could potentially be twice as slow as an application accessing data locally."
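The arithmetic behind the "twice as slow" figure can be sketched as follows. The two 50-microsecond values come from the quote; the assumption that a remote request simply pays network latency plus storage latency, with no other overhead, is a simplification made for illustration, and the function names are hypothetical.

```python
# Rough figures quoted by Borthakur, in microseconds.
NETWORK_LATENCY_US = 50  # accessing the network within a data center
FLASH_LATENCY_US = 50    # reading from fast flash storage


def remote_latency_us(network_us, storage_us):
    """Assumed model: a remote request pays the network hop plus the storage read."""
    return network_us + storage_us


def embedded_latency_us(storage_us):
    """Assumed model: an embedded database pays only the local storage read."""
    return storage_us


remote = remote_latency_us(NETWORK_LATENCY_US, FLASH_LATENCY_US)
local = embedded_latency_us(FLASH_LATENCY_US)
slowdown = remote / local

print(f"remote: {remote} us, local: {local} us, slowdown: {slowdown:.1f}x")
```

Under this model the network hop doubles the response time, and the penalty grows further when requests are served from memory, where the storage term shrinks but the network term does not.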

In addition, Borthakur says, servers are gaining an increasing number of cores and storage IOPS are reaching millions of requests per second.

"Lock contention and a high number of context switches in this software prevents it from being able to saturate the storage IOPS," he notes. "We're finding we need new database software that is flexible enough to be customized for many of these emerging hardware trends."

Borthakur says that by fully using the IOPS offered by flash storage, RocksDB performs faster than LevelDB across random read, random write and bulk upload workloads: 10 times faster for pure random write and bulk upload workloads, and 30 percent faster for pure random read workloads.

The RocksDB code is now live.

Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers for CIO.com. Follow Thor on Twitter @ThorOlavsrud.