Hortonworks Makes Hadoop More Versatile in New Distro
Built on the Apache Hadoop YARN architecture, HDP 2.0 changes Hadoop from a single-purpose Web-scale batch data processing platform into a multi-use operating system for batch, interactive, online and stream processing.
Wed, October 23, 2013
CIO: As enterprises begin deploying Apache Hadoop to store their data and let their users interact with it in various ways, they've often run into a glaring problem: Hadoop was designed for the single purpose of Web-scale batch data processing. Enterprises of all sorts increasingly want to store all incoming data in Hadoop, creating a sort of data lake that their users can then tap for everything from batch processing to analyzing data streams as they arrive.
Case in point: running SQL on Hadoop. Business analysts have been using SQL as the query language to perform ad-hoc queries against data warehouses for years. If you're creating a data lake using Hadoop, you've got to be able to query that data using SQL.
"But by building SQL access on top of Hadoop, it just highlights the challenge of Hadoop being a single application system," writes Arun Murthy, founder and architect at Hortonworks and former architect of the Yahoo Hadoop Map-Reduce Development Team. "For when I run a SQL query on that data, it could consume all the resources of the cluster and cause performance issues for the other applications and jobs running in the cluster—not a good outcome to say the least."
The answer to that problem is YARN (Yet Another Resource Negotiator), the foundation of the recently released Hadoop 2. Apache Hadoop YARN serves as the Hadoop operating system, taking what was a single-use data platform for batch processing and evolving it into a multi-use platform that enables batch, interactive, online and stream processing.
YARN acts as the primary resource manager and mediator of access to data stored in the Hadoop Distributed File System (HDFS). This gives enterprises the ability to store data in a single place and then interact with it in multiple ways, simultaneously, with consistent levels of service.
Hortonworks, provider of the Hortonworks Data Platform (HDP), one of the most popular distributions of Hadoop, was quick to take up the YARN banner today with the announcement of the general availability of HDP 2.0.
HDP 2.0 is the first commercial distribution built on Hadoop 2, delivering the YARN-based architecture and new features from Phase 2 of the Stinger Initiative. The Stinger Initiative is a community-based effort that aims to enhance the speed, scale and breadth of SQL semantics supported by Apache Hive.
"The YARN-based architecture of HDP 2.0 delivers on our mission to enable the modern data architecture by providing one enterprise Hadoop that deeply integrates with existing, and future, data center technologies," says Shaun Connolly, vice president of corporate strategy at Hortonworks.