With a focus on real-time applications, MapR Technologies took the wraps off the 5.0 version of its Hadoop distribution.
"Eighteen percent of our customers have 50 or more applications running on a single [Hadoop] cluster," says Jack Norris, chief marketing officer of MapR Technologies. "That means you need to support workload management and multi-tenancy."
The new MapR release auto-synchronizes storage, database and search indices to support complex, real-time applications, Norris says. It also includes comprehensive security auditing, Apache Drill support and the latest Hadoop 2.7 and YARN features.
"Designed as a large-scale batch data analysis system, Hadoop is not often associated with operational analytics or transaction processing," Carl W. Olofson, research vice president, data management software research, IDC, said in a statement today.
"Hadoop adds tremendous value for decision management at the strategic and operational levels, but still is emerging as a framework for making tactical decisions 'in the moment'. With Hadoop innovations — such as those in MapR 5.0 — happening every day, enterprises should consider using Hadoop as a 'Decision Data Platform' that functions as a single platform for handling both live operational data and real-time analytics," Olofson said.
New features in MapR 5.0 include the following:
- Extension of the MapR Real-time, Reliable Data Transport framework, which is used in the MapR-DB Table Replication capability to deliver and synchronize data in real time to external compute engines. The first external compute engine MapR is supporting is Elasticsearch. This support keeps full-text search indexes synchronized automatically, without the need to write custom code.
- Support for Hadoop 2.7 and YARN 2.7 to enable new features like YARN application rolling updates.
- Additional data governance and security, including comprehensive auditing of all data accesses via log files in JSON format. This enables reporting, validation and analysis with Apache Drill. The release also supports Drill 1.x, including Drill Views, which deliver secure access to field-level data in files.
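
As a rough illustration of the kind of reporting that JSON-format audit logs allow, the sketch below tallies operations per user from newline-delimited JSON records. The field names (`user`, `operation`, `path`) and the sample records are hypothetical, chosen only for illustration; they are not MapR's actual audit schema.

```python
import json
from collections import Counter

# Hypothetical audit records, one JSON object per line.
# Field names are assumptions for illustration, not MapR's audit schema.
SAMPLE_LOG = """\
{"user": "alice", "operation": "READ", "path": "/data/sales.csv"}
{"user": "bob", "operation": "WRITE", "path": "/data/sales.csv"}
{"user": "alice", "operation": "READ", "path": "/data/hr.csv"}
"""


def count_operations(log_text):
    """Count (user, operation) pairs in newline-delimited JSON audit text."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[(record["user"], record["operation"])] += 1
    return counts


if __name__ == "__main__":
    for (user, operation), n in sorted(count_operations(SAMPLE_LOG).items()):
        print(f"{user} {operation} {n}")
```

Drill can query JSON files directly with SQL, so a script like this would likely be unnecessary in practice; it only shows that records in this format need no custom parser.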
The new version of MapR is expected to be available within 30 days.
To make it easier to deploy Hadoop clusters, MapR has also introduced new auto-provisioning templates that apply software-defined concepts to give organizations the ability to deploy a cluster with appliance-like convenience, without the need for specific hardware.
Users can deploy MapR Auto-Provisioning Templates via the MapR Installer, which provides three capabilities: auto-layout, which optimizes the layout of selected services and hardware; rack awareness, which automatically distributes critical services across failure domains; and health checks, which test servers to ensure they will perform optimally after installation.
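
The idea behind rack awareness can be sketched in a few lines. The example below is a simplified illustration, not the MapR Installer's algorithm: it assumes each rack is one failure domain and visits racks round-robin so that copies of a critical service land in as many distinct racks as possible. The function name `place_instances` and the rack and node names are hypothetical.

```python
def place_instances(nodes_by_rack, instances):
    """Pick `instances` nodes, visiting racks round-robin so that copies
    of a service land in as many distinct racks as possible.

    Illustrative only; this is not the MapR Installer's placement logic.
    """
    # Copy the node lists (in sorted rack order) so the caller's data
    # is not modified and the result is deterministic.
    remaining = {rack: list(nodes) for rack, nodes in sorted(nodes_by_rack.items())}
    total = sum(len(nodes) for nodes in remaining.values())
    if instances > total:
        raise ValueError("not enough nodes for the requested instances")

    placement = []
    while len(placement) < instances:
        # One pass takes at most one node from each rack.
        for rack, nodes in remaining.items():
            if nodes and len(placement) < instances:
                placement.append((rack, nodes.pop(0)))
    return placement


if __name__ == "__main__":
    # Hypothetical cluster layout: three racks, five nodes.
    racks = {"rack1": ["n1", "n2"], "rack2": ["n3"], "rack3": ["n4", "n5"]}
    print(place_instances(racks, 3))
```

With three instances and three racks, each copy ends up in a different rack, so the loss of any single rack leaves two copies running.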
Norris says the templates will support the deployment of the following configurations:
- Data Lake: Common Hadoop Services. This configuration includes the most common services deployed in an Apache Hadoop cluster, including YARN, MapReduce, Spark and Hive.
- Data Exploration: Interactive SQL with Apache Drill. This configuration provides services needed for users to perform schema-free interactive exploration of their data.
- Operational Analytics: NoSQL Database with MapR-DB. This configuration deploys the MapR distributed NoSQL database, enabling operational HBase applications to read and write at high rates and analytic applications to process data in situ.