Does VMware Move Signal That Big Data Is Ready for Prime Time?
Deploying, configuring and maintaining Hadoop clusters is challenging and time-intensive, but VMware aims to change that with a new open source project that virtualizes the Hadoop cluster and makes it ready for the cloud.
Wed, June 13, 2012
CIO — By now, most CIOs are aware of Big Data and its promise. But it is an inescapable fact that creating, maintaining and configuring Hadoop clusters is challenging, costly and time-consuming. Doing so with high availability has been next to impossible. Now VMware hopes to change all that by virtualizing the Hadoop cluster and making it ready for the cloud.
"Hadoop is a Big Data processing de facto standard," says Fausto Ibarra, senior director of product management, Cloud Application Platform, at VMware. "One of the biggest challenges in the adoption of Hadoop is the difficulty in deploying Hadoop and the cost associated with that. What we're basically doing is dramatically simplifying what it takes to deploy, configure and manage Hadoop clusters."
Open Source Serengeti Virtualizes Hadoop
VMware today took the wraps off a new open source project dubbed Serengeti that is designed to be a "one-click" toolkit for deploying highly available Hadoop clusters, along with common Hadoop components like Apache Pig and Apache Hive, on VMware's vSphere platform. VMware is leading the Serengeti project in collaboration with key Hadoop distribution vendors like Cloudera, Greenplum, Hortonworks, IBM and MapR.
Currently, Hadoop is primarily deployed on physical infrastructure. Such deployments can take days, weeks or even months depending on the scale, as IT obtains the necessary hardware, installs the distribution on the nodes and then configures the cluster and all the Hadoop components. And if the cluster is incorrectly sized for your needs, resizing it can involve doing much of that work over again.
"With Serengeti you can deploy a Hadoop cluster in as little as 10 minutes without having to learn anything new," Ibarra says. "You have your choice of Hadoop distribution, and you will be able to reuse your existing virtual infrastructure running on vSphere; all while using the same skills and operations requirements as other things on vSphere."
"Hadoop must become friendly with the technologies and practices of enterprise IT if it is to become a first-class citizen within enterprise IT infrastructure," says Tony Baer, principal analyst at research firm OVUM. "The resource-intensive nature of large Big Data clusters make virtualization an important piece that Hadoop must accommodate. VMware's involvement with the Apache Hadoop project and its new Serengeti Apache project are critical moves that could provide enterprises the flexibility that they will need when it comes to prototyping and deploying Hadoop."
Making Hadoop Virtualization Aware
In addition to Serengeti, Ibarra says VMware is working with the Apache Hadoop community to contribute changes to the Hadoop Distributed File System (HDFS) and Hadoop MapReduce projects to make them "virtualization aware." These changes will allow data and compute jobs to be optimally distributed across a virtual infrastructure, giving enterprises the ability to achieve more elastic, secure and highly available Hadoop clusters.