by Paul Rubens

Who cares about Hadoop on Linux? Microsoft (yes, really)

Oct 20, 2015

By offering big data services in its Azure cloud on Linux, Microsoft is sending several important messages to customers.

Last month Microsoft did something extraordinary – something which demonstrates how completely the company has changed since its third CEO, Satya Nadella, took over.

Microsoft announced the general availability of Azure HDInsight, a fully managed Apache Hadoop cluster service running on Linux in its Azure cloud.

At first glance that doesn’t seem like much of a big deal. But it is, and here’s why.

When Microsoft announced Azure back in 2008, what it actually announced was “Windows Azure.” This was to be “Windows in the cloud” – a new platform as a service (PaaS) offering for developers to design Windows applications running in Microsoft data centers.

Today we don’t hear talk of Windows Azure: it’s very much the Microsoft Azure cloud, and there’s plenty besides Windows running within it – in fact, 20 percent of the virtual machines running in Azure are Linux-based, according to Microsoft.

What’s happened since Nadella took over is not that Microsoft has abandoned its PaaS plans – it’s just that Azure Infrastructure as a Service (IaaS) has become incredibly important to Microsoft.

That’s the view of Wes Miller, a former Microsoft program manager who is now an analyst at Directions on Microsoft. “Because of this, Microsoft is allowing – and in fact embracing – Linux, Docker and so on. The company’s attitude is: ‘If you want cloud, buy it from Microsoft. Azure is a cloud service and it can do Windows – but it can also do non-Microsoft platforms.’”

This stance is echoed by T. K. “Ranga” Rengarajan, Microsoft’s corporate vice president, Data Platform. “We want Azure to be a place where all operating systems can run,” he says.

“At launch we had a position which was not consistent with that,” he admits. “Now we are more relaxed, and open to partnerships too.”

Microsoft has been saying for some time that it is committed to open source, and it’s backed these words up with some significant actions to prove that it’s serious. But offering Azure HDInsight on Linux shows that Microsoft is serious about another message.

No more sacred cows

To understand what the message is, let’s go back a couple of years, when Microsoft was all for doing big data analysis using Hadoop. To that end the company promised to offer Hadoop in three ways:

  • A cloud-based Hadoop as a service offering called Windows Azure HDInsight Service.
  • The Hortonworks Data Platform for Windows – an open source Hadoop distribution running on Windows Server.
  • HDInsight Server for Windows – a Windows-based Hadoop distribution designed to work in a virtualized, private cloud environment using Microsoft’s Hyper-V hypervisor and System Center management system.

There were, in other words, plenty of ways that Microsoft would let you run Hadoop. The only catch was that they had to run on Windows. At the time Wes Miller said: “I think part of the reason that Microsoft wants Hadoop on Windows is out of concern about the competition Linux poses. The company also wants to ensure that if you do use Hadoop, you can also use SQL’s BI stack for the business intelligence part.”

But the new message that Microsoft is putting out, and illustrating with Azure HDInsight on Linux, is that every Microsoft offering has to stand on its own two feet: nothing is sacred and nothing is unthinkable. If a new product threatens an existing business line, then so be it.

We’ve seen this before. For example, Microsoft Office was available on iOS and Android before it was available on the company’s own (struggling) mobile operating system. Now we are seeing this played out again in the cloud, in the context of Microsoft’s flagship server product.

“The reality at Microsoft now is that you can’t count on another division of the company to throw you a floatation device,” says Miller. “In this case, look at Windows Server. If it works great with HDInsight then fine, but the company is not going to lose out on being a customer’s cloud back end just because the customer doesn’t want to use Windows Server.”

That’s good for Azure because there are good reasons for customers to want to run HDInsight on an open source operating system rather than Windows, according to Rengarajan.

“Linux is actually where people are innovating: innovations appear first on Linux, and then these are ported to Windows. So there is customer demand for Linux,” he says. “This is part of a larger trend – not something specific to Hadoop. We realize that we can have a dramatic relevance to customers if we follow their needs,” he adds.

Enabling more hybrids

There are other reasons for customers wanting HDInsight on Linux too. The Linux-based ecosystem for big data tools is bigger, and many companies already run Hadoop on Linux in their own data centers. If they can use Linux in Microsoft’s (or anyone else’s) public cloud as well, it is easier for them to create a hybrid cloud environment for their big data activities.

Offering Azure HDInsight on Linux is also a good business move because there is ferocious competition between cloud providers – particularly between Amazon’s AWS, Azure and Google, Wes Miller says.

He points out that each of these three leviathans runs its cloud in a different way, so each has to compete for customers based on its individual strengths. Restricting Azure to Windows would make it almost impossible for Microsoft to compete.

(For the record, Miller says that Azure’s main strength lies in the fact that Microsoft “gets” developers better than Amazon and, to a lesser degree, Google.)

The new reality also means that Microsoft won’t be dogmatic about building its Azure infrastructure on Windows components out of loyalty to the Windows part of the business. A perfect illustration of this is the recent unveiling of Microsoft’s Azure Cloud Switch (ACS) – a cross-platform modular operating system for data center networking which, interestingly, is built on Linux.

Two years ago, the idea that Microsoft would build ACS on Linux would have been ludicrous, but now it makes perfect sense.

“Nadella runs the business by empowering his people to make the right decisions for the company as a whole and not by worrying if another division might get hurt,” says Miller. “If I had to make a cloud switch, my reaction would also be to build it from Linux.”

But Microsoft’s Rengarajan is keen to point out that Microsoft is not abandoning Windows in the cloud for Big Data (or for anything else) – far from it.

To back this up he says that HDInsight on Windows is currently one of Azure’s fastest-growing services.