MapR's New Hadoop Distribution Promises No-Risk Upgrade
MapR's latest Hadoop distribution includes support for Hadoop 2.2 with YARN, but is also backward compatible with the MapReduce 1.x scheduler, promising organizations a risk-free upgrade path to the latest Hadoop architecture.
Mon, February 17, 2014
CIO — MapR Technologies, the company behind the MapR Distribution of Apache Hadoop, announced that it will make the latest version of its distribution, which includes support for Hadoop 2.2 with YARN, available in March.
The company has taken a different tack with its distribution than competitors Cloudera and Hortonworks. Unlike its competitors, MapR has committed to backward compatibility, enabling organizations to run the Hadoop MapReduce 1.x and YARN schedulers simultaneously on the same nodes in the cluster.
By ensuring that MapReduce 1.x and YARN schedulers can coexist, MapR gives MapReduce 1.x users an easy and risk-free path to upgrade to the new scheduler, says Jack Norris, CMO of MapR Technologies.
"Our focus is really production use of Hadoop," Norris says. "Once you go into production, it's about availability and uptime and integration with existing apps. We're backward compatible from previous distributions to this distribution because you can't introduce changes easily into a production environment. Customers say, 'YARN's exciting, but I want to put my toe in the water. I've got existing jobs that are running.' We've got customers running over 20,000 jobs a day on our platform."
Apache Hadoop YARN (short for Yet Another Resource Negotiator) is the foundation of Hadoop 2.0, released last October. YARN serves as the Hadoop operating system, taking what was a single-use data platform for batch processing and turning it into a multi-use platform that enables batch, interactive, online and stream processing.
YARN acts as the primary resource manager and mediator of access to data stored in the Hadoop Distributed File System (HDFS), giving organizations the ability to store data in a single place and then interact with it in multiple ways, simultaneously, with consistent levels of service.
Norris says that by combining YARN with its read-write (R/W) POSIX data platform, MapR enables YARN-based applications not only to run on a Hadoop cluster and share compute resources, but also to read, write and update data in the underlying distributed file system and database tables. As a result, he says, organizations can develop and deploy a broader set of big data applications.
"YARN opens up Hadoop for processing patterns beyond just MapReduce," says Evan Quinn, research director, Enterprise Management Associates. "MapR's Hadoop distribution extends YARN even further by adding a full, open standard NFS interface in addition to HDFS, enabling non-MapReduce applications to optimally take advantage of a cluster's storage."
"When we talk about a general-purpose storage platform, it's about random read-write," Norris says. "If you want to open up the processing to some other type of application, you don't want to have to rewrite that application just to take advantage of Hadoop. You just want it to run on the platform. Having to rewrite it to use the Hadoop distributed file system (HDFS) API introduces change that can require a lot of forethought and planning—and in some cases a redesign of the application. We allow you to run directly on the MapR platform with no changes, only now you're taking advantage of the highly distributed framework that MapR provides."
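To make the "no changes" point concrete, here is a minimal sketch of the kind of code Norris is describing: an application that does random read-write using nothing but ordinary file calls, with no Hadoop-specific API. The function name, file name and mount point are illustrative assumptions, not anything MapR ships; a temporary directory stands in for wherever a cluster's NFS export might be mounted.

```python
import os
import tempfile


def update_record(path: str, offset: int, payload: bytes) -> bytes:
    """Overwrite bytes in place at `offset`, then read them back.

    Uses only standard file operations (open, seek, write, read),
    the same calls an existing application would already be making.
    """
    with open(path, "r+b") as f:
        f.seek(offset)           # jump to an arbitrary position (random access)
        f.write(payload)         # update in place rather than append
        f.flush()
        f.seek(offset)
        return f.read(len(payload))


# Assumption: on a real cluster this would be an NFS mount point.
# A temporary directory is used here so the sketch runs anywhere.
mount_point = tempfile.mkdtemp()
data_file = os.path.join(mount_point, "events.dat")

# Create a 32-byte file filled with ASCII zeros.
with open(data_file, "wb") as f:
    f.write(b"0" * 32)

# Overwrite six bytes in the middle of the file.
result = update_record(data_file, 8, b"HADOOP")
```

Because the code relies only on standard file semantics, pointing `mount_point` at a different POSIX-style file system would not require touching `update_record`. By contrast, an application written against a write-once, append-oriented API would need to be reworked to perform this kind of in-place update.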