Microsoft's Big Data Service Available After a Year in Preview
After a year of in-the-field testing, the Windows Azure HDInsight Service, which allows customers to spin up Hadoop clusters in the cloud, gets the green light for release into general availability.
Mon, October 28, 2013
CIO — NEW YORK--Microsoft today used the first day of O'Reilly Strata Conference + Hadoop World in New York City to announce that its Windows Azure HDInsight Service is now generally available after a year in preview.
The HDInsight Service, designed in partnership with Hadoop specialist Hortonworks, makes standard Apache Hadoop available as a service in Microsoft's Azure cloud, allowing you to deploy Hadoop clusters in minutes and shut them down just as easily.
Integration with the Microsoft data platform means that you can access and analyze your data with PowerPivot, Power View and other Microsoft BI tools, like Microsoft SQL Server Analysis Services (SSAS).
"Hadoop is a cornerstone of big data," says Quentin Clark, corporate vice president, Microsoft Data Platform. "The need for the insights and results and transformations from big data is really there. There are companies talking to us about how they don't feel they can even be competitive without embracing the big data phenomenon."
The goal, Clark says, is to combine Hadoop with the flexibility of cloud deployment and the security that enterprises require, giving customers the competitive edge they need.
DNA Sequencing with HDInsight Service
The use cases are many and varied. For instance, Virginia Polytechnic Institute and State University has been using the HDInsight Service to aid its life sciences research in DNA sequencing.
Leveraging a grant from the National Science Foundation, Virginia Tech computer scientists developed an on-demand, cloud computing model using Windows Azure HDInsight Service that helps locate undetected genes in a massive genome database.
"Of the estimated 2,000 DNA sequencers worldwide, they are generating 15 petabytes of genome data every year," says Wu Feng, professor of Computer Science at Virginia Tech. "Many life sciences institutions simply do not have access to the computational and storage resources required to work with data sets of this size. We're generating data faster than we can analyze it."
Feng and his team used the grant to develop two software artifacts: SeqInCloud, which is built on a popular genetic variant pipeline called the Genome Analysis Toolkit (GATK), and CloudFlow, a workflow management framework that uses both client and cloud resources.
SeqInCloud generalizes the GATK pipeline, allowing it to run in the cloud using HDInsight and Azure to maximize portability. Meanwhile, CloudFlow, installed on a researcher's PC, aids interactions with the Windows Azure HDInsight Service.