Just ahead of the opening of Strata + Hadoop World in New York City tomorrow, Cloudera today unveiled a new open source project to enable real-time analytic applications in Hadoop and an open source security layer for unified access control enforcement in Hadoop.
The first project, Kudu, is an in-memory store for Hadoop that supports high-performance sequential and random reads and writes, enabling fast analytics on changing data.
The idea behind Kudu is to stop forcing developers to choose between fast analytics with HDFS or updating data with HBase. Attempts to combine the two have led to complex architectures. Cloudera says Kudu, an updateable, columnar store for Hadoop, eliminates the need for such complex architectures when it comes to use cases like time series analysis, machine data analytics and online reporting.
Cloudera jointly engineered Kudu with Intel to better leverage in-memory hardware and Intel’s 3D XPoint technology. The project has also drawn support from organizations like Xiaomi, AtScale, Splice Machine and Zoomdata.
“Our infrastructure team has been working with Cloudera to develop Kudu, taking advantage of its unique ability to support columnar scans and fast inserts and updates to continue to expand our Hadoop ecosystem footprint,” says Baoqiu Cui, chief architect at smartphone developer Xiaomi. “Using Kudu, alongside interactive SQL tools like Impala, has allowed us to build a next-generation data analytics platform for real-time analytics and online reporting.”
A beta of Kudu is immediately available under the Apache open source license, and Cloudera says the project will be transitioned to the Apache Software Foundation in the near future.
Meanwhile, RecordService, also available on Monday, is a new core security layer for Hadoop that provides unified access control enforcement.
Each Hadoop access engine currently applies policies differently — some have more granular restrictions than others. Apache Sentry is an Apache project that provides unified role-based policy management in Hadoop. RecordService builds on Apache Sentry. It is a new layer that sits between Hadoop’s storage and compute engines to consistently enforce the role-based access controls defined by Sentry. RecordService also provides dynamic data masking across Hadoop, protecting sensitive data as it is accessed.
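To make the idea of a single enforcement layer concrete, here is a minimal Python sketch. It is not the RecordService or Sentry API; the `Policy` class, the `POLICIES` table, the `mask` rule and the `read_records` function are all hypothetical names invented for illustration. It only shows the general pattern the paragraph above describes: every reader goes through one function that applies the role's column permissions and masks sensitive values on the way out.

```python
# Hypothetical illustration only: this is not the RecordService or Sentry API.
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Columns a role may read, and which of those must be masked."""
    readable: set
    masked: set = field(default_factory=set)


# Assumed example policies, standing in for role definitions kept in a policy store.
POLICIES = {
    "analyst": Policy(readable={"user_id", "country", "email"}, masked={"email"}),
    "admin": Policy(readable={"user_id", "country", "email"}),
}


def mask(value):
    """Replace all but the last two characters with asterisks."""
    text = str(value)
    return "*" * max(len(text) - 2, 0) + text[-2:]


def read_records(role, rows):
    """Single enforcement point: every engine would read rows through here."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"no policy for role {role!r}")
    result = []
    for row in rows:
        visible = {}
        for column, value in row.items():
            if column not in policy.readable:
                continue  # drop columns the role may not see
            visible[column] = mask(value) if column in policy.masked else value
        result.append(visible)
    return result


# Made-up sample data for the sketch.
ROWS = [{"user_id": 1, "country": "US", "email": "ann@example.com", "ssn": "123-45-6789"}]
```

In this sketch, calling `read_records("analyst", ROWS)` returns the row with the `ssn` column dropped and the email masked, while an unknown role raises `PermissionError`. Because the check lives in one place rather than in each engine, every access path sees the same restrictions.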
“Security is a critical part of Hadoop and has seen rapid improvements made, especially as companies need to store, process and analyze sensitive data using a wide array of tools including Apache Spark and Impala,” Mike Olson, co-founder and chief strategy officer at Cloudera, said in a statement today. “However, for Hadoop to continue to evolve and support the next generation of analytics for ever-growing amounts of users and access paths, security needs to become universal across the platform. With RecordService, the Hadoop community fulfills the vision of unified fine-grained access controls for every Hadoop access path, supporting the continued development and usage of the leading innovative tools with the confidence that a powerful, core security layer is protecting their most sensitive data.”
RecordService is now available in beta under the Apache license and Cloudera plans to donate the project to the Apache Software Foundation in the future.