All of the leading commercial Hadoop distributions are compatible with Apache Hadoop, so what sets them apart?

Big data and Hadoop are transforming enterprise data management architectures. It’s a gold-rush market in which pure-plays, enterprise software vendors and cloud vendors are all competing to stake a claim. The open source Apache Hadoop project includes the core modules, namely Hadoop Common, Hadoop Distributed File System (HDFS), Hadoop YARN and Hadoop MapReduce, but without the support or packaged solutions of a commercial vendor. Here’s how the nine leading commercial Hadoop distributions identified by Forrester Research stack up.

Amazon Web Services Elastic MapReduce Has the Most Market Share

Amazon may not be the first thing that springs to mind when you think of Hadoop, but AWS’ Elastic MapReduce (EMR) was one of the first commercial Hadoop offerings on the market and leads in global market presence, says Forrester Principal Analyst Mike Gualtieri. EMR is Hadoop in the cloud, leveraging Amazon EC2 for compute and Amazon S3 for storage, along with other services.

“AWS’ solution road map includes Amazon EMR integration with Amazon Kinesis for stream processing; stronger integration with Amazon Redshift data warehouse and other data sources; autoscaling that will resize clusters based on policies; support for additional NoSQL databases on top of Hadoop; and more BI integration with third-party vendors,” Gualtieri writes.

Cloudera Is Focused on Innovating on Hadoop Based on Enterprise Demands

AWS may lead in market presence, but pure-play Cloudera is No.
2, with more than 200 paying customers, some of whom boast deployments of more than 1,000 nodes supporting more than a petabyte of data.

“Enterprise customers wanted a management and monitoring tool for Hadoop, so Cloudera built Cloudera Manager,” Gualtieri writes. “Enterprise customers wanted a faster SQL engine for Hadoop, so Cloudera built Impala using a massively parallel processing (MPP) architecture, the same architecture that EDWs use. Cloudera’s approach to innovation is to be loyal to core Hadoop but to innovate quickly and aggressively to meet demands and differentiate from those of other vendors.”

Cloudera’s revenue model is primarily based on software subscriptions, though it also offers support.

Hortonworks Drives Open Source Hadoop Innovation

Of all the players, pure-play Hortonworks hews closest to the Apache Hadoop open source community with Hortonworks Data Platform (HDP), but also aggressively pursues deep engineering partnerships with the likes of Microsoft, Teradata, SAP, Red Hat and others.

“Hortonworks’ strategy is to drive all innovation through the open source community and create an ecosystem of partners that accelerate Hadoop adoption among enterprises,” Gualtieri writes. “Where the open source community isn’t moving fast enough, Hortonworks will start new projects and commit Hortonworks resources to get them off the ground.”

Apache Ambari, which provides a Hadoop cluster management console, is a key example.

IBM InfoSphere BigInsights Has the Enterprise Reach of IBM Behind It

IBM doesn’t have the depth in the Hadoop community that some of its competitors boast, but it has deep roots in distributed computing and data management that allow it to offer a comprehensive Hadoop solution.
It has more than 100 Hadoop deployments under its belt, some of which run to petabytes of data.

“In addition, IBM has advanced analytics tools, a global presence and implementation services, so it can offer a complete big data solution that will be attractive to many customers,” Gualtieri writes. “IBM’s road map includes continuing to integrate the BigInsights Hadoop solution with related IBM assets like SPSS advanced analytics, workload management for high-performance computing, BI tools and data management and modeling tools.”

MapR Technologies Offers Support for NFS and Other Innovations

MapR Technologies is the third pure-play on the list, but lacks the market presence of Cloudera and Hortonworks. Early on, it began focusing on enterprise features while most enterprises were still evaluating Hadoop in the proof-of-concept stage.

“MapR Technologies has added some unique innovations to its Hadoop distribution, including support for Network File System (NFS), running arbitrary code in the cluster, performance enhancements for HBase, as well as high-availability and disaster recovery features,” Gualtieri writes.

Gualtieri notes that now that MapR’s competitors are firmly focused on building out enterprise features as well, the company needs to focus on making noise in the market and building out its partnerships and distribution channels.

Pivotal Software Leverages Its Greenplum Engineers

Spun out of EMC and VMware, with former VMware CEO Paul Maritz at the helm, Pivotal Software has EMC technical consultants and data scientists behind it.
In addition to the columnar Greenplum Database technology it brought from EMC, Pivotal’s Hadoop distribution has a Hadoop SQL engine called HAWQ that provides MPP-like SQL performance on Hadoop.

“Pivotal was the first EDW vendor to provide a full-featured enterprise-grade Hadoop appliance; it was also the first to roll out an appliance family that integrated its Hadoop, EDW and data management layers in a single rack,” Gualtieri writes. “Pivotal’s road map will make its Hadoop solution significantly more competitive; its innovations focus on improving the HAWQ SQL engine and integration with other Pivotal products.”

Teradata Is Leveraging Its Expertise into a Hadoop Appliance

Teradata is a specialist in enterprise data warehouse (EDW) appliances, and has built on that and a strong technical partnership with Hortonworks to offer Hadoop as an appliance.

“The Teradata distribution for Hadoop includes integration with Teradata’s management tool and SQL-H, a federated SQL engine that lets customers query data from its data warehouse and Hadoop,” Gualtieri writes. “It also has Aster for analytics against Hadoop.”

Teradata currently has fewer than 100 customers for its Hadoop appliance, but Gualtieri notes that its extensive financial, technical and management resources allow it to create a unique and high-performance appliance that will be difficult for other vendors to match.

Intel Delivers Hardware-Enhanced Performance, Security for Hadoop

Intel is a relative latecomer to the Hadoop distribution space, but is counting on the capabilities of its Intel Xeon chips to make it a contender.

“It is the first vendor to deliver hardware-enhanced performance and security capabilities for Hadoop,” Gualtieri writes. “Intel’s road map in the next year will bring it closer to and on par with other vendors in the Hadoop solutions market.
In addition, Intel continues to focus on hardware-enhanced performance and security features, native task optimization, Lustre and graph analytics, which will differentiate its distribution to make it attractive to prospects.”

Microsoft Windows Azure HDInsight Has the Power of Cloud and Windows Behind It

Developed through an engineering partnership with Hortonworks, Microsoft Windows Azure HDInsight Service is designed specifically for the Windows Azure cloud. HDInsight and Hadoop for Windows (a version of Hortonworks Data Platform) are the only Hadoop distributions that run in a Windows environment.

“Microsoft also offers Polybase to allow SQL Server customers to execute queries that also include data stored in Hadoop,” Gualtieri writes. “Microsoft has significant engineering efforts on other open-source community Hadoop projects, including the next generation of Hive. Microsoft’s significant presence in the database, data warehouse, cloud, OLAP, BI, spreadsheet (PowerPivot), collaboration and development tools markets offers an advantage when it comes to delivering a growing Hadoop stack to Microsoft customers.”