DataStax, a specialist in database software for cloud applications built on the open source foundations of Apache Cassandra and Apache TinkerPop, thinks the time for point solutions to data problems has passed.
Instead, the future of operational database management systems (DBMSs) is support for multiple data models, says Martin Van Ryswyk, executive vice president of Engineering at the Santa Clara, Calif.-based startup.
"With new technology that is so amazing, people are willing to take a point product to solve a problem they couldn't solve before," he says. "But pretty quickly they want a suite. I think now what we're seeing is more of a simmering down. People want platforms that can cover a lot of the problem space."
Gartner analysts Nick Heudecker, Merv Adrian and Ehtisham Zaidi agree. In their Market Guide for NoSQL DBMSs, published in August last year, the trio wrote: "the future of DBMS architectures and deployments will be multimodels."
They added: "by 2017, all leading operational DBMSs will offer multiple data models, relational and NoSQL, in a single platform."
Today's cloud workloads involve numerous components that differ in their data model support requirements, Van Ryswyk explains.
The relational DBMS (RDBMS) ecosystem had a number of things going for it, he adds: it gave developers a vendor-agnostic language in the form of SQL, drew a well-defined line between the logical (developer) and physical (DBA) aspects of the DBMS, and provided drivers through which applications could interact with databases.
On the other hand, RDBMSs don't play well with cloud applications, due to impediments including:
- A master-slave architecture mandating concessions on uptime and resiliency
- Scale and write-and-distribute-anywhere constraints for cloud application workloads
- Strict adherence to logical data layer constructs that value storage efficiency more than application agility
- A rigid data model that makes the use of semi-structured and unstructured data extremely cost-prohibitive at scale
- The sharding architectural "Band-Aid" that increases operational expense exponentially
NoSQL technologies address those challenges, but two issues have created a highly fragmented offering for enterprise customers:
- Polyglot persistence means that customers must either restrict themselves to one of the data models (key value, tabular, JSON, graph) or perform extract-transform-load (ETL) operations across data stores; a number of use cases, including master data management (MDM) and customer-360 views, mandate ETL, which can increase the total cost of ownership (TCO)
- Each NoSQL vendor's mechanism for interacting with the data store is different, both in its dialect and in where it sits on the logical/physical divide; this forces application developers to write abstraction layers if they need more than a single model in their application, and the abstraction layers have to work at different levels across the logical/physical spectrum to keep application development aligned
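The second issue is easier to see in code. What follows is a minimal sketch, not any vendor's actual API: `KeyValueClient` and `GraphClient` are hypothetical stand-ins for two stores with different dialects, and `CustomerRepository` is the kind of abstraction layer an application team would have to write and maintain to keep its own code independent of either one.

```python
from abc import ABC, abstractmethod


class KeyValueClient:
    """Hypothetical key-value store client: string keys, dict values."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class GraphClient:
    """Hypothetical graph store client: vertices that carry properties."""

    def __init__(self):
        self._vertices = {}

    def add_vertex(self, vertex_id, **properties):
        self._vertices[vertex_id] = dict(properties)

    def vertex(self, vertex_id):
        return self._vertices.get(vertex_id)


class CustomerRepository(ABC):
    """The abstraction layer the application codes against."""

    @abstractmethod
    def save(self, customer_id, profile):
        ...

    @abstractmethod
    def load(self, customer_id):
        ...


class KeyValueCustomers(CustomerRepository):
    """Adapter that maps customers onto namespaced key-value entries."""

    def __init__(self, client):
        self._client = client

    def save(self, customer_id, profile):
        self._client.put(f"customer:{customer_id}", dict(profile))

    def load(self, customer_id):
        return self._client.get(f"customer:{customer_id}")


class GraphCustomers(CustomerRepository):
    """Adapter that maps customers onto graph vertices."""

    def __init__(self, client):
        self._client = client

    def save(self, customer_id, profile):
        self._client.add_vertex(customer_id, **profile)

    def load(self, customer_id):
        return self._client.vertex(customer_id)
```

Each additional store means another adapter, and every adapter has to paper over a different set of physical details, which is the maintenance burden described above.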
Multi-model databases are the next evolution of NoSQL technologies, Van Ryswyk says. They support multiple data models against a single, integrated backend. DataStax says a multi-model database platform should:
- Support more than one post-relational data model (e.g., tabular, JSON, graph) at a logical layer for ease of development for application developers
- Ensure all models are exposed via cohesive mechanisms, avoiding cognitive context switching for developers
- Have a unified persistence layer that delivers geo-redundant, continuously available characteristics and a common framework for operational aspects such as security, provisioning, disaster recovery, etc.
- Empower a variety of use cases across OLTP and OLAP workloads
- Deliver best-in-class TCO efficiency for the long haul to enable wider adoption within centralized IT teams
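As a rough illustration of the first three criteria, the toy sketch below layers tabular, document and graph views over one in-memory store. Every class and method name here is invented for the example and says nothing about how DataStax implements its platform; the point is only that several logical models can share a single persistence layer.

```python
class UnifiedStore:
    """Toy persistence layer: every model writes into the same dict."""

    def __init__(self):
        self._records = {}

    def write(self, key, value):
        self._records[key] = value

    def read(self, key):
        return self._records.get(key)

    def keys_with_prefix(self, prefix):
        # Keys are tuples, so a prefix match is a slice comparison.
        return [k for k in self._records if k[:len(prefix)] == prefix]

    def __len__(self):
        return len(self._records)


class TableModel:
    """Tabular view: rows addressed by table name and primary key."""

    def __init__(self, store, table):
        self._store, self._table = store, table

    def insert(self, pk, **columns):
        self._store.write(("row", self._table, pk), columns)

    def select(self, pk):
        return self._store.read(("row", self._table, pk))


class DocumentModel:
    """Document view: nested JSON-like dicts addressed by collection and id."""

    def __init__(self, store, collection):
        self._store, self._collection = store, collection

    def put(self, doc_id, document):
        self._store.write(("doc", self._collection, doc_id), document)

    def get(self, doc_id):
        return self._store.read(("doc", self._collection, doc_id))


class GraphModel:
    """Graph view: labeled, directed edges between vertex ids."""

    def __init__(self, store):
        self._store = store

    def add_edge(self, src, label, dst):
        self._store.write(("edge", src, label, dst), True)

    def out(self, src, label):
        matches = self._store.keys_with_prefix(("edge", src, label))
        return sorted(key[3] for key in matches)
```

Because all three views write through `UnifiedStore`, anything added at that layer, such as replication or access control, would apply to every model at once.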
DataStax Enterprise Graph
On Tuesday, DataStax added to its own multi-model capabilities with the announcement of DataStax Enterprise (DSE) Graph, a scaled-out graph database built for cloud applications that need to manage highly connected data.
Graph databases are a specialized form of NoSQL database intended to handle data that is rich in relationships, but in a much more efficient and scale-out manner than a relational database.
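To make the idea concrete, here is a minimal sketch in plain Python that treats relationships as first-class data. The payment records and the `reachable` helper are invented for illustration and are unrelated to DSE Graph's storage or query engine; a real graph database would express the same questions in a traversal language such as Gremlin.

```python
from collections import Counter, defaultdict

# Hypothetical payment records: (payer, payee, amount).
payments = [
    ("alice", "bob", 120.0),
    ("alice", "bob", 80.0),
    ("bob", "carol", 50.0),
    ("carol", "alice", 10.0),
]

frequency = Counter()          # how often each directed edge occurs
strength = defaultdict(float)  # total amount carried by each edge
outgoing = defaultdict(set)    # direction: payer -> set of payees

for payer, payee, amount in payments:
    frequency[(payer, payee)] += 1
    strength[(payer, payee)] += amount
    outgoing[payer].add(payee)


def reachable(start, max_hops):
    """Return the vertices reachable from start within max_hops directed edges."""
    seen, frontier = {start}, {start}
    for _ in range(max_hops):
        # Expand one hop along outgoing edges, skipping visited vertices.
        frontier = {n for v in frontier for n in outgoing[v]} - seen
        seen |= frontier
    return seen - {start}
```

In a relational schema, each extra hop in `reachable` would typically be another self-join on the payments table; in a graph model it is one more step along edges that are already stored.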
"Graph is an excellent method of evaluating, expressing and analyzing previously unrecognized relationships in data," Gartner's Heudecker and fellow analyst Mark Beyer wrote in their July 2015 report, Making Big Data Normal with Graph Analytics. "Instead of examining and analyzing data as a set of discrete and unrelated atomic elements, graph allows for the exploration of the frequency, strength and direction of relationships in data."
Last year, DataStax acquired Aurelius, the team behind open source graph database Titan. That team has spent the past year building a new set of software that extends beyond the basic capabilities of Titan but maintains backward compatibility, allowing users of Titan and other TinkerPop-supported graph databases to migrate easily.
Because it is built on DataStax's existing Cassandra foundation, Van Ryswyk says DSE Graph also inherits key Cassandra benefits, including constant uptime, read/write/active-everywhere functionality, linear scalability, predictable low-latency response times and operational maturity. DSE Graph also incorporates enterprise-class extensions found in DataStax Enterprise, including advanced security, built-in analytics, enterprise search, visual management and monitoring, and development tooling.
"We evaluated DataStax Enterprise Graph against traditional databases for some of our large banking customers and found that DSE Graph improves performance by an order of magnitude when working with data sets that include a large number of nodes and relationships — on use cases such as client data for financial services," Anil Gurnani, Banking and Capital Markets Solution at IT solutions provider Mphasis, said in a statement Tuesday.
DSE Graph includes:
- DataStax Enterprise Server. It delivers advanced graph database functionality that includes an adaptive query optimizer, automatic graph data partitioning, a distributed query execution engine and graph-specific index structures.
- DataStax OpsCenter. OpsCenter has been updated to provide full provisioning, management and monitoring for DSE Graph.
- DataStax Studio. This new, web-based solution helps developers visualize graphs and write/execute graph queries.
- DataStax Drivers. Drivers are available for all popular development languages and enhanced to support the Gremlin graph language in addition to CQL and DSE Analytics/Search APIs.
DSE Graph will be sold as an option to either a DSE Standard or DSE Max subscription. Van Ryswyk says it will be generally available in Q2.