Deploying, configuring and maintaining Hadoop clusters is challenging and time-intensive, but VMware aims to change that with a new open source project that virtualizes the Hadoop cluster and makes it ready for the cloud.

By now, most CIOs are aware of Big Data and its promise. But it is an inescapable fact that creating, maintaining and configuring Hadoop clusters is challenging, costly and time-consuming. Doing so with high availability has been next to impossible. Now VMware hopes to change all that by virtualizing the Hadoop cluster and making it ready for the cloud.

“Hadoop is a Big Data processing de facto standard,” says Fausto Ibarra, senior director of product management, Cloud Application Platform, at VMware. “One of the biggest challenges in the adoption of Hadoop is the difficulty in deploying Hadoop and the cost associated with that. What we’re basically doing is dramatically simplifying what it takes to deploy, configure and manage Hadoop clusters.”

Open Source Serengeti Virtualizes Hadoop

VMware today took the wraps off a new open source project dubbed Serengeti that is designed to be a “one-click” toolkit for deploying highly available Hadoop clusters, along with common Hadoop components like Apache Pig and Apache Hive, on VMware’s vSphere platform. VMware is leading the Serengeti project in collaboration with key Hadoop distribution vendors like Cloudera, Greenplum, Hortonworks, IBM and MapR.

Currently, Hadoop is primarily deployed on physical infrastructure. Such deployments can take days, weeks or even months depending on the scale, as IT obtains the necessary hardware, installs the distribution on the nodes and then configures the cluster and all the Hadoop components.
And if the cluster is incorrectly sized for your needs, resizing it can involve doing much of that work over again.

“With Serengeti you can deploy a Hadoop cluster in as little as 10 minutes without having to learn anything new,” Ibarra says. “You have your choice of Hadoop distribution, and you will be able to reuse your existing virtual infrastructure running on vSphere; all while using the same skills and operations requirements as other things on vSphere.”

“Hadoop must become friendly with the technologies and practices of enterprise IT if it is to become a first-class citizen within enterprise IT infrastructure,” says Tony Baer, principal analyst at research firm Ovum. “The resource-intensive nature of large Big Data clusters make virtualization an important piece that Hadoop must accommodate. VMware’s involvement with the Apache Hadoop project and its new Serengeti Apache project are critical moves that could provide enterprises the flexibility that they will need when it comes to prototyping and deploying Hadoop.”

Making Hadoop Virtualization Aware

In addition to Serengeti, Ibarra says VMware is working with the Apache Hadoop community to contribute changes to the Hadoop Distributed File System (HDFS) and Hadoop MapReduce projects to make them “virtualization aware.” These changes will allow data and compute jobs to be optimally distributed across a virtual infrastructure, giving enterprises the ability to achieve more elastic, secure and highly available Hadoop clusters.

VMware is also making changes to Spring for Apache Hadoop, the open source project it launched in February. Built on the Spring Java application framework, Spring for Apache Hadoop is intended to make it easy for enterprise developers to build distributed processing solutions with Hadoop. Ibarra says the updates will give Spring developers the ability to build applications that integrate with the HBase database, the Cascading library and Hadoop security.
“Hadoop is now ready for prime time with these updates,” Ibarra says. “Provisioning a Hadoop cluster is going to be as simple as provisioning a new database or server.”

Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers for CIO.com. Follow Thor on Twitter @ThorOlavsrud. Follow everything from CIO.com on Twitter @CIOonline and on Facebook. Email Thor at tolavsrud@cio.com.