MapR Technologies today announced the general availability of the MapR Converged Data Platform, which brings Hadoop together with Spark, Web-scale storage, NoSQL and streaming capabilities in a unified cluster, designed to support customers deploying real-time global data applications.
The Converged Data Platform includes security, data governance and performance enhancements built to meet enterprise requirements, and adds support for containers, including persistent storage and integrated resource management.
"The backdrop to all of this is there's a reason for convergence and it's not just convenience," says Jack Norris, senior vice president, Data and Applications, MapR Technologies. "It's eliminating latency and the separate clusters and silos used to process data."
The Converged Data Platform integrates the MapR Distribution including Apache Hadoop with the MapR File System (MapR-FS), the MapR-DB NoSQL database and MapR Streams, the company's global event streaming system.
Make 'high-frequency decisions'
Norris says that by bringing together Apache Hadoop, Apache Spark, a NoSQL database and continuous, reliable streaming at global scale, the Converged Data Platform supports what MapR calls 'high-frequency decisions.' In other words, you can take analytics, embed them into your operations and make adjustments on the fly, so you can affect the business while events are happening. For instance, the platform could help advertisers provide relevant real-time offers, healthcare providers improve personalized treatment, retailers optimize their inventory and telecom carriers dynamically adjust mobile service areas.
On the security front, the MapR Converged Data Platform delivers the following features:
- File and stream access control expressions (ACEs) that simplify the granting of permissions to users and groups across data files and directories using Boolean expressions, making security administration more scalable and manageable.
- Whole-volume ACEs, which add another level of protection for data files in MapR Volumes and provide stronger multi-tenancy controls to guarantee that data is available only to specific groups. Norris says this is especially useful in hosted, customer-facing software-as-a-service (SaaS) applications, where it ensures that no customer can access another customer's information.
- Selective auditing, which lets administrators track only the activities they need to audit or analyze, keeping auditing flexible while limiting its impact on system performance.
Support for Boolean expressions in the ACEs provides you with granular controls. For instance, you could create complex lists that grant access to "managers and above" or "Marketing but not Jack."
"It's a powerful method," Norris says. "You're not limited to a group inclusion or a list."
Importantly, access control is enforced at the data storage level, he says.
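To illustrate how a Boolean ACE such as "Marketing but not Jack" combines users and groups, here is a minimal Python sketch that evaluates an expression against a single user's identity. The u:/g: principal prefixes and the !, & and | operators are modeled on the notation MapR describes for ACEs, but this is only an illustrative subset. The `Identity` class, the `evaluate` function and the group names are hypothetical and are not MapR's API. The real platform enforces ACEs inside the storage layer, not in application code like this.

```python
import re
from dataclasses import dataclass, field

# Illustrative subset of ACE syntax (assumption): principals u:<user> and
# g:<group>, combined with ! (not), & (and), | (or) and parentheses.
TOKEN_RE = re.compile(r"\s*(?:(u|g):([A-Za-z0-9_.-]+)|([!&|()]))")


@dataclass
class Identity:
    """Hypothetical stand-in for the user and group memberships of a request."""
    user: str
    groups: set = field(default_factory=set)


def tokenize(expr):
    """Split an ACE string into ("u", name), ("g", name) or ("op", symbol) tokens."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected input at {pos}: {expr[pos:]!r}")
        if m.group(1):
            tokens.append((m.group(1), m.group(2)))
        else:
            tokens.append(("op", m.group(3)))
        pos = m.end()
    return tokens


def evaluate(expr, ident):
    """Return True if `ident` satisfies the Boolean ACE `expr`.

    Precedence, highest first: !, &, |. Parentheses group sub-expressions.
    """
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == ("op", "|"):
            take()
            rhs = parse_and()  # always parse the right side so its tokens are consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == ("op", "&"):
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        tok = take()
        if tok == ("op", "!"):
            return not parse_not()
        if tok == ("op", "("):
            value = parse_or()
            if take() != ("op", ")"):
                raise ValueError("expected ')'")
            return value
        kind, name = tok
        if kind == "u":
            return ident.user == name
        if kind == "g":
            return name in ident.groups
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


if __name__ == "__main__":
    rule = "g:marketing & !u:jack"  # "Marketing but not Jack"
    print(evaluate(rule, Identity("alice", {"marketing"})))  # True
    print(evaluate(rule, Identity("jack", {"marketing"})))   # False
```

In this notation, "Marketing but not Jack" becomes `g:marketing & !u:jack`. A "managers and above" rule could be written as a union of hypothetical groups, such as `g:managers | g:directors | g:executives`. That is the point Norris makes: you are not limited to a single group or a flat list.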
Layer on the containers
When it comes to supporting containers, the Converged Data Platform enables persistent storage and integrated resource management, allowing it to act as a comprehensive data services layer for Docker containers. This means it provides distributed, resilient storage for containers and includes the database and messaging/streaming capabilities that many containerized operational applications require.
"Given the level of interest in Docker, it seems inevitable that enterprises will, at some point, want to run data-intensive workloads on the container technology," Matt Aslett, research director, data platforms and analytics, 451 Research, said in a statement Tuesday. "However, standard Docker containers include data volumes tied to an individual server, which means that if a container fails, or is moved from one server to another, its connection with the data volume is lost. As such, containers are not designed to be persistent — a key requirement for any data-intensive workload."
The Converged Data Platform is intended to resolve this issue with the new MapR POSIX Client, which presents a fully distributed, secure, reliable, read-write file system to Docker containers for resilient deployments on commodity hardware.
"We use Docker extensively in order to provide continuous product updates, helping our clients understand the behavior of their customers to make better business decisions," says Simon Reid, group executive, Technology, at data analytics firm Quantium. "MapR gives us a resilient storage layer for Docker containers, ensuring we don't lose time or data when containers or servers crash."
The platform also integrates Apache Myriad, which enables infrastructure consolidation by letting YARN and non-YARN jobs share all data center resources.
Performance-wise, Norris says the new platform includes a number of performance enhancements, including how it uses solid-state drives (SSDs).
"You can have different nodes that have different disk density, different compute profiles, different memory and different solid state drives," he says. "That allows you to make better utilization of the network interconnect and data flow, which leads to better performance."
Independent research firm ESG benchmarked MapR Streams and confirmed performance of more than 18 million messages per second, with throughput of more than 3.5GB per second.
Norris notes that cloud-based deployments of the new MapR platform will be available this month through major public cloud marketplaces, including the Amazon Web Services (AWS) Marketplace, Azure Marketplace and CenturyLink Marketplace.