Splice Machine yesterday announced the 2.0 version of its relational database management system (RDBMS), which aims to give you the scalability of Hadoop and the performance of Spark without the need to rewrite years' worth of SQL or retrain staff.
"It is a real breakthrough, I think, in database technology," says Monte Zweben, co-founder and CEO of Splice Machine. "Before now, it was very difficult to have a mixed workload on a single database. What companies had to do is do their real-time, concurrent transactional work on one platform and then convert all that data onto another platform through ETL to analyze it and derive insight."
That architecture has created a huge decision lag, Zweben says, forcing companies to make decisions on yesterday's data.
How it works
The new RDBMS uses resource isolation — separate processes and resource management for its Hadoop and Spark components — in an effort to ensure that large, complex online analytical processing (OLAP) queries don't overwhelm time-sensitive online transaction processing (OLTP) queries.
By setting custom priority levels for OLAP queries, users can ensure important reports aren't blocked by massive batch processes that consume all of a cluster's available resources. The new version also adds an extensive management console through which users can monitor running queries, visualize each step of the execution pipeline and see errors from batch imports in real time.
"The analysis won't impact or interfere with the transactions," Zweben says. "By having a hybrid architecture like this, you get simultaneous workloads, which allows companies to make decisions in the moment."
That, he says, makes it ideal for use cases ranging from digital marketing and ETL acceleration to operational data lakes, data warehouse offloads, Internet of Things (IoT) applications, operational applications, and web, mobile and social applications.
"By delivering an affordable, fully operational platform that is designed to support OLTP and OLAP workloads concurrently, Splice Machine 2.0 offers a unique and powerful way for businesses to perform real-time analytics and operational queries together without sacrificing performance or breaking the bank," Charles Zedlewski, vice president, Products at Cloudera, said in a statement Tuesday. "As more customers start to run Spark on Cloudera's platform, Splice Machine's integration complements the analytical capabilities of our enterprise data hubs, enabling customers across a variety of industries to handle all types of workloads with greater efficiency."
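The isolation and priority scheme described in this section can be pictured with a small toy model. The Python sketch below is an illustration only, not Splice Machine's implementation or API: the `HybridScheduler` class, its method names and the "lower number runs sooner" convention are all assumptions made for the example. Transactions run in their own lane, while analytical jobs wait in a priority queue, so a high-priority report overtakes a large batch job submitted earlier.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class AnalyticalJob:
    priority: int                      # assumed convention: lower number runs sooner
    seq: int                           # tie-breaker that preserves submission order
    name: str = field(compare=False)   # excluded from ordering comparisons


class HybridScheduler:
    """Hypothetical toy model: OLTP work never waits behind OLAP jobs."""

    def __init__(self):
        self._olap_queue = []
        self._counter = itertools.count()
        self.log = []

    def run_transaction(self, name):
        # The transactional lane is modeled as separate from the analytical
        # queue, so it executes immediately regardless of queued OLAP work.
        self.log.append(f"oltp:{name}")

    def submit_analytical(self, name, priority):
        job = AnalyticalJob(priority, next(self._counter), name)
        heapq.heappush(self._olap_queue, job)

    def run_next_analytical(self):
        # Pop the job with the smallest priority number, if any is waiting.
        if not self._olap_queue:
            return None
        job = heapq.heappop(self._olap_queue)
        self.log.append(f"olap:{job.name}")
        return job.name


scheduler = HybridScheduler()
scheduler.submit_analytical("nightly_batch_rollup", priority=9)
scheduler.submit_analytical("executive_report", priority=1)
scheduler.run_transaction("insert_order")
first = scheduler.run_next_analytical()
second = scheduler.run_next_analytical()
print(scheduler.log)
```

In this sketch the transaction is logged first even though two analytical jobs were already queued, and the report runs ahead of the batch rollup because of its priority. A real system would enforce the separation with distinct processes and resource limits rather than a single in-memory queue.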
Handling unstructured data in a SQL database
Splice Machine 2.0's architecture also includes the capability to execute federated queries on data in external databases and files using Virtual Table Interfaces (VTIs). It can also execute all pre-built Spark libraries for machine learning, stream analysis, data integration and graph modeling.
That means that even though it's a relational database, it can handle unstructured data using VTIs.
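As a rough analogy for the federated-query idea, the Python sketch below joins a native table with rows that originate in an external file. It uses the standard-library `sqlite3` and `csv` modules as stand-ins and does not show Splice Machine's VTI syntax, which the article does not describe. The table names, the sample CSV and the `external_rows` helper are hypothetical, and the external rows are copied into a temporary table here, whereas a real virtual table would typically expose them without a separate load step.

```python
import csv
import io
import sqlite3

# Hypothetical contents of an external file that lives outside the database.
EXTERNAL_CSV = """customer_id,clicks
1,42
2,7
"""


def external_rows(text):
    """Yield (customer_id, clicks) tuples parsed from CSV text."""
    for row in csv.DictReader(io.StringIO(text)):
        yield int(row["customer_id"]), int(row["clicks"])


conn = sqlite3.connect(":memory:")

# A native relational table holding operational data.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# Stand-in for a virtual table: the external rows become queryable via SQL.
conn.execute("CREATE TEMP TABLE clickstream (customer_id INTEGER, clicks INTEGER)")
conn.executemany("INSERT INTO clickstream VALUES (?, ?)", external_rows(EXTERNAL_CSV))

# One SQL statement spans both the native and the external data.
result = conn.execute(
    "SELECT c.name, s.clicks "
    "FROM customers c JOIN clickstream s ON s.customer_id = c.id "
    "ORDER BY c.id"
).fetchall()
print(result)
```

The point of the analogy is that existing SQL, here an ordinary join, keeps working when one side of the query comes from outside the database.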
"We can actually apply queries against external data types that might be unstructured," Zweben says. "We also have an interface that is the standard Hadoop interface that allows all of the unstructured capabilities in Hadoop and Spark to call our database and get transactionally mature data from our database and then put data back after it's been processed in a transactionally consistent way."
As a result, he says, organizations will now be able to get the scale-out benefits of NoSQL databases without throwing the baby out with the bathwater.
"There's billions of lines of code written in SQL," he says. "We don't think companies should have to rewrite all their code. Also, SQL is way more powerful than NoSQL. It has 30 years of development behind it that enables developers to build enterprise applications. From our standpoint and our customers' standpoints, they don't want to rewrite all their code and spend millions of dollars and retrain all their people."
The company is now accepting applications to test the public beta of Splice Machine 2.0. Zweben expects the public beta to last several months, followed by general availability in the first half of 2016.
For the beta, Zweben says Splice Machine is looking for organizations with mixed workloads, particularly those that must keep data fresh through real-time, concurrent updates while multiple users access it at the same time. Applicants should also need to analyze that data frequently, whether through regular reports or ad hoc analysis.
Splice Machine is particularly looking for use cases in digital marketing, financial services and life sciences.
Wells Fargo has already signed on to put Splice Machine 2.0 to the test.
"The financial services industry has seen exponential increases in volume and variety of data that shows no signs of relenting, causing us to look for new architectures that can simultaneously support both operational and analytical workloads," Jesse Lund, head of R&D for Wells Fargo, said in a statement yesterday. "We are impressed with the Splice Machine 2.0 hybrid architecture and are excited to put it to the test."