by Paul Rubens

What Open Source Hadoop Coming to Windows Means to IT

May 13, 2013 | 6 mins
Big Data | Data Management | Linux

Hadoop is nearly synonymous with the analysis of big data. The Hortonworks Data Platform on Windows is significant as it means that companies lacking Linux expertise will finally be able to benefit from the big data analysis platform, which has been out of the reach of Windows shops.

Big data analysis has exploded onto the business stage over the last 12 months or so, and one of the most important Big Data analysis platforms is the open source Apache Hadoop project. It’s generally run on Linux, and it’s used by some big-name companies, including Yahoo!, Facebook and Twitter.

Hortonworks Data Platform (HDP) for Windows

What’s about to change over the next few months is that Hadoop is coming to Windows in the form of Hortonworks Data Platform (HDP) for Windows, a fully supported open source Hadoop distribution that runs on Windows Server. (Hortonworks, a California-based company, is a sponsor of and contributor to the Apache Hadoop project, and it already offers its Linux-based HDP distribution on a commercial basis.)

This will open up Hadoop to a large number of organizations that have no in-house Linux skills. Shaun Connolly, vice president of Corporate Strategy at Hortonworks, explains the thinking behind moving HDP to Windows in this way: “Essentially it’s a market-driven decision,” he says. “Hadoop is built for the scaleout commodity hardware market, and the commodity hardware market is 70% Windows by install base and expertise.”

Employees in Windows-only companies will be able to make use of Hadoop easily because Excel can be used as a business intelligence tool to view the results of Hadoop Big Data analysis (whether Hadoop is running on Windows or Linux). “Ideally we want Microsoft users to be oblivious to the fact that everything is coming from Hadoop,” says Connolly. “If end users can consume data without any learning curve, thanks to tools like Excel, then they get more value.”

Windows shops will also be able to benefit from Hadoop on Windows because IT staff with Windows skills will be able to write Hadoop applications using Microsoft’s Visual Studio and .NET Framework, without the need for any Linux expertise. (As an aside, both Hortonworks’ and Microsoft’s Windows offerings are 100% Apache Hadoop, with no tweaks to the code, so any Linux Hadoop app could easily be ported to Windows, Connolly says.)

But HDP for Windows is not the only way that Hadoop is coming to Windows. Microsoft has been working behind the scenes with Hortonworks since late 2011, and the Redmond giant is about to release its own distribution of Hadoop, which it calls HDInsight. This will be available as a service running in the company’s Azure cloud, or as a product intended to be used as the basis of an on-premises private cloud Hadoop installation.

A decade or so ago Microsoft was resolutely opposed to open source software, and ironically its support for Hadoop may stem from this old animosity, according to Wes Miller, an analyst at Directions on Microsoft. “I think part of the reason that Microsoft wants Hadoop on Windows is out of concern about the competition Linux poses,” he says.

But there’s another reason too, he says. “The company also wants to ensure that if you do use Hadoop, you can also use SQL’s BI stack for the business intelligence part.”

3 Ways Businesses Will Buy and Use Hadoop Capabilities on Windows

  1. Windows Azure HDInsight Service (in public preview): A cloud-based Hadoop-as-a-Service offering available from Microsoft through Windows Azure.
  2. Hortonworks Data Platform (HDP) for Windows (currently in beta): An open source Hadoop distribution available from Hortonworks that runs directly on Windows Server, with support provided by the company on a subscription basis.
  3. Microsoft HDInsight Server for Windows (in public preview): A Windows-based Hadoop distribution that is designed to work in a virtualized, private cloud environment, using Microsoft’s Hyper-V hypervisor and System Center management system. It will be supported by Microsoft, with third-line support provided by Hortonworks.

The quick and easy option is to use the Azure service, according to Eron Kelly, general manager of product marketing for Microsoft’s data platform. “This is an ideal way to consume Hadoop technology as it can be complicated to run using open source projects,” he says. “With Azure this can be done in a couple of clicks and you then pay for what you use.”

Companies do have other options for accessing Hadoop in the cloud, such as running Hadoop in Amazon or Rackspace clouds. But for companies that already use Azure and hold large amounts of data (and logs) in that cloud, the HDInsight service would certainly appear to make sense.

For companies that want to manage and run their own Hadoop installation, HDP for Windows is probably the option to go for — as long as they have their own Windows servers and are prepared to install the software and get it going. Support is available from Hortonworks. “This could be appealing to companies that have generally had a no open source software (on Linux) policy. The sort that may have wanted Hadoop, but only wanted it on Windows,” Wes Miller points out.

HDInsight Server for Windows will allow larger enterprises to take advantage of their existing investment in Microsoft’s software stack — particularly the cloud management capabilities of System Center — to incorporate it into a private cloud. And it needn’t be expensive — the product is available as a free download, Kelly explains.

“There’s no incremental fee for using HDInsight Server for Windows. We will monetize it by customers having to buy Windows Server to use it, and maybe from them using our data warehousing or BI environments as well,” he says. “We may also monetize it by selling newer versions of Excel,” he adds. (Microsoft offers an Excel 2013 and 2010 add-in called Data Explorer which can connect to HDInsight instances in Azure or on-premises.)

The big question mark over HDInsight Server for Windows is whether any but the largest companies will actually want to use it; Miller fears it may be too complicated. “Will smaller sized businesses really want to run Hadoop in a private cloud? Maybe if it is all automated, but I don’t think that Microsoft is going to be doing that,” he says.

There’s no doubt that Hadoop’s arrival on Windows is significant, because it will bring Big Data analysis within easy reach of the vast Windows market. (Hortonworks’ Shaun Connolly estimates that by releasing HDP for Windows, the company will double the potential market for HDP immediately.) And by lowering the barriers to entry for making use of Hadoop, smaller organizations and even business departments will be able to benefit from insights gained from the analysis of Big Data.

Paul Rubens is a technology journalist based in England.