by Thor Olavsrud

How the 9 Leading Commercial Hadoop Distributions Stack Up

News
Mar 27, 2014 | 6 mins
Big Data, Open Source

All of the leading commercial Hadoop distributions are compatible with Apache Hadoop, so what sets them apart? Here's how the leading commercial distributions identified by Forrester Research compare.

Big data and Hadoop are in the process of transforming enterprise data management architectures. It’s a gold-rush market, with pure-plays, enterprise software vendors and cloud vendors all competing to stake a claim. The open source Apache Hadoop project includes the core modules (Hadoop Common, Hadoop Distributed File System (HDFS), Hadoop YARN and Hadoop MapReduce) but comes without the support or packaged solutions of a commercial vendor. All of the leading commercial distributions are compatible with Apache Hadoop, so what sets them apart? Here’s how the 9 leading commercial Hadoop distributions identified by Forrester Research stack up.

Amazon Web Services Elastic MapReduce Has the Most Market Share

Amazon may not be the first thing that springs to mind when you think of Hadoop, but AWS’ Elastic MapReduce (EMR) was one of the first commercial Hadoop offerings on the market and leads in global market presence, says Forrester Principal Analyst Mike Gualtieri. EMR is Hadoop in the cloud, using Amazon EC2 for compute and Amazon S3 for storage, along with other AWS services.

“AWS’ solution road map includes Amazon EMR integration with Amazon Kinesis for stream processing; stronger integration with Amazon Redshift data warehouse and other data sources; autoscaling that will resize clusters based on policies; support for additional NoSQL databases on top of Hadoop; and more BI integration with third-party vendors,” Gualtieri writes.

Cloudera Is Focused on Innovating on Hadoop Based on Enterprise Demands

AWS may lead in market presence, but pure-play Cloudera is No. 2, with more than 200 paying customers, some of whom boast deployments of more than 1,000 nodes supporting more than a petabyte of data.

“Enterprise customers wanted a management and monitoring tool for Hadoop, so Cloudera built Cloudera Manager,” Gualtieri writes. “Enterprise customers wanted a faster SQL engine for Hadoop, so Cloudera built Impala using a massively parallel processing (MPP) architecture — the same architecture that EDWs use. Cloudera’s approach to innovation is to be loyal to core Hadoop but to innovate quickly and aggressively to meet demands and differentiate from those of other vendors.” Cloudera’s revenue model is primarily based on software subscriptions, though it also offers support.

Hortonworks Drives Open Source Hadoop Innovation

Of all the players, pure-play Hortonworks hews closest to the Apache Hadoop open source community with Hortonworks Data Platform (HDP), but also aggressively pursues deep engineering partnerships with the likes of Microsoft, Teradata, SAP, Red Hat and others.

“Hortonworks’ strategy is to drive all innovation through the open source community and create an ecosystem of partners that accelerate Hadoop adoption among enterprises,” Gualtieri writes. “Where the open source community isn’t moving fast enough, Hortonworks will start new projects and commit Hortonworks resources to get them off the ground.”

Apache Ambari, which provides a Hadoop cluster management console, is a key example.

IBM InfoSphere BigInsights Has the Enterprise Reach of IBM Behind It

IBM doesn’t have the depth in the Hadoop community that some of its competitors boast, but it has deep roots in distributed computing and data management that allow it to offer a comprehensive Hadoop solution. It has more than 100 Hadoop deployments under its belt, some of which run to petabytes of data.

“In addition, IBM has advanced analytics tools, a global presence and implementation services, so it can offer a complete big data solution that will be attractive to many customers,” Gualtieri writes. “IBM’s road map includes continuing to integrate the BigInsights Hadoop solution with related IBM assets like SPSS advanced analytics, workload management for high-performance computing, BI tools and data management and modeling tools.”

MapR Technologies Offers Support for NFS and Other Innovations

MapR Technologies is the third pure-play on the list, but lacks the market presence of Cloudera and Hortonworks. Early on, it began focusing on enterprise features while most enterprises were still evaluating Hadoop in the proof of concept stage.

“MapR Technologies has added some unique innovations to its Hadoop distribution, including support for Network File System (NFS), running arbitrary code in the cluster, performance enhancements for HBase, as well as high-availability and disaster recovery features,” Gualtieri writes. Gualtieri notes that now that MapR’s competitors are firmly focused on building out enterprise features as well, the company needs to focus on making noise in the market and building out its partnerships and distribution channels.

Pivotal Software Leverages Its Greenplum Engineers

Spun out of EMC and VMware, with former VMware CEO Paul Maritz at the helm, Pivotal Software has EMC technical consultants and data scientists behind it. In addition to the columnar Greenplum Database technology it brought from EMC, Pivotal’s Hadoop distribution includes a SQL engine called HAWQ that provides MPP-like SQL performance on Hadoop.

“Pivotal was the first EDW vendor to provide a full-featured enterprise-grade Hadoop appliance; it was also the first to roll out an appliance family that integrated its Hadoop, EDW and data management layers in a single rack,” Gualtieri writes. “Pivotal’s road map will make its Hadoop solution significantly more competitive; its innovations focus on improving the HAWQ SQL engine and integration with other Pivotal products.”

Teradata Is Leveraging Its Expertise into a Hadoop Appliance

Teradata is a specialist in enterprise data warehouse (EDW) appliances, and has built on that and a strong technical partnership with Hortonworks to offer Hadoop as an appliance.

“The Teradata distribution for Hadoop includes integration with Teradata’s management tool and SQL-H, a federated SQL engine that lets customers query data from its data warehouse and Hadoop,” Gualtieri writes. “It also has Aster for analytics against Hadoop.”

Teradata currently has fewer than 100 customers for its Hadoop appliance, but Gualtieri notes that its extensive financial, technical and management resources allow it to create a unique and high-performance appliance that will be difficult for other vendors to match.

Intel Delivers Hardware-Enhanced Performance, Security for Hadoop

Intel is a relative latecomer to the Hadoop distribution space, but is counting on the capabilities of its Intel Xeon chips to make it a contender.

“It is the first vendor to deliver hardware-enhanced performance and security capabilities for Hadoop,” Gualtieri writes. “Intel’s roadmap in the next year will bring it closer to and on par with other vendors in the Hadoop solutions market. In addition, Intel continues to focus on hardware-enhanced performance and security features, native task optimization, Lustre and graph analytics, which will differentiate its distribution to make it attractive to prospects.”

Microsoft Windows Azure HDInsight Has the Power of Cloud and Windows Behind It

Developed through an engineering partnership with Hortonworks, Microsoft Windows Azure HDInsight Service is designed specifically for the Windows Azure cloud. HDInsight and Hadoop for Windows (a version of Hortonworks Data Platform) are the only Hadoop distributions that run in a Windows environment.

“Microsoft also offers Polybase to allow SQL Server customers to execute queries that also include data stored in Hadoop,” Gualtieri writes. “Microsoft has significant engineering efforts on other open-source community Hadoop projects, including the next generation of Hive. Microsoft’s significant presence in the database, data warehouse, cloud, OLAP, BI, spreadsheet (PowerPivot), collaboration and development tools markets offers an advantage when it comes to delivering a growing Hadoop stack to Microsoft customers.”