by Thor Olavsrud

Splice Machine boosts hybrid relational data platform

Nov 29, 2016
Analytics, Big Data, Data Mining

The startup adds support for columnar storage, in-memory caching and cost-optimized AWS storage to its hybrid transactional and analytical processing platform.

Building on its capability to concurrently run enterprise-scale transactional and analytical workloads, Splice Machine today announced the release of version 2.5 of its platform at AWS re:Invent 2016.

The startup’s platform is a dual-engine relational database management system (RDBMS) powered by Apache Hadoop and Apache Spark that specializes in hybrid transactional and analytical processing (HTAP). Splice Machine uses resource isolation — separate processes and resource management for its Hadoop and Spark components — to ensure that large, complex online analytical processing (OLAP) queries don’t overwhelm time-sensitive online transaction processing (OLTP) queries.

The hybrid architecture allows you to run analytical workloads and transactional workloads concurrently — a boon for use cases ranging from digital marketing to ETL acceleration, operational data lakes, data warehouse offloads, Internet of Things (IoT) applications, web, mobile and social applications and operational applications.

The latest release adds support for columnar storage, in-memory caching and cost-optimized storage for AWS users, among other features. At AWS re:Invent, Splice Machine demonstrated how users can leverage the new capabilities on AWS to integrate multiple compute and storage engines into an elastically scalable database that can be a relational database and data warehouse in one.

“The new capabilities further emphasize the benefits of Splice Machine’s hybrid architecture,” Monte Zweben, co-founder and CEO of Splice Machine, said in a statement today. “For modern applications that need to combine fast data ingestion, web-scale transactional and analytical workloads and continuous machine learning, one storage model does not fit all. The Splice Machine SQL RDBMS tightly integrates multiple compute engines, with in-memory and persistent storage in both row-based and columnar formats. The cost-based optimizer uses new advanced statistics to find the optimal execution strategy across all these resources for OLTP and OLAP workloads.”

The new capabilities of version 2.5 of the Splice Machine platform include the following:

  • Columnar External Tables. Columnar external tables enable hybrid columnar and row-based querying. They can be created in Apache Parquet, Apache ORC or text formats. Columnar storage speeds up large table scans, large joins, aggregations and groupings, while the native row-based storage is used for write-optimized ingestion, single-record lookups and updates, and short scans.
  • In-Memory Caching via Pinning. This feature provides the ability to move tables and columnar data files into memory for fast data access. It avoids multiple table scans or writes to high-latency file systems such as Amazon S3. Splice Machine says the capability allows data to be kept on very inexpensive storage and still be served at in-memory speed when applications require it.
  • Statistics via Sketching. This feature helps solve the age-old problem that cost-based optimizers are only as good as their statistics, but most statistics are poor because statistics computation is expensive. Splice Machine utilizes the sketching library created by Yahoo! to provide very fast approximate analysis of big data statistics with bounded errors. Using sketches and histograms, Splice Machine says the cost-based optimizer can choose indexes, join orders and join algorithms with much more accuracy.
  • Cost-Optimized Storage for AWS users. Data can be stored locally in ephemeral storage or on EBS, S3 or EFS. Depending on the workload and how long the data must be kept, different data can be placed on storage systems with different price/performance characteristics.
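
To make the "Statistics via Sketching" idea concrete, here is a minimal Python sketch of one well-known sketching technique, a K-Minimum-Values (KMV) distinct-count estimator. It is an illustration only: it is not Splice Machine's code and does not use the Yahoo library's API. The class name `KMVSketch`, the helper `_hash01` and the parameter `k` are hypothetical names chosen for this example. The sketch keeps only the `k` smallest hash values it has seen, so memory stays fixed no matter how many rows are scanned, and the relative error of the estimate shrinks roughly as 1/sqrt(k).

```python
import hashlib
import heapq


def _hash01(value):
    """Map any value to a pseudo-uniform float in [0, 1) (hypothetical helper)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Illustrative K-Minimum-Values sketch for approximate distinct counts."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # max-heap of kept hashes, stored negated
        self._members = set()  # same hashes, for fast duplicate checks

    def add(self, value):
        h = _hash01(value)
        if h in self._members:
            return  # value already represented in the sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept hash: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        """Return the estimated number of distinct values added so far."""
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct values: exact
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


if __name__ == "__main__":
    sketch = KMVSketch(k=1024)
    for i in range(200_000):
        sketch.add(f"customer-{i % 50_000}")  # 50,000 distinct keys, repeated
    print(f"estimated distinct keys: {sketch.estimate():.0f}")
```

A cost-based optimizer could use an estimate like this as the cardinality of a join column when comparing join orders, without paying for an exact distinct count over the whole table.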