Hadoop is an open-source software framework that enables high-throughput processing of large data volumes across distributed clusters.
What started as a niche market several years ago is now entering the mainstream. With the rapid expansion of the digital universe, Hadoop supports a broad range of big data use cases on plain commodity hardware.
It's also highly scalable, from a single server to multiple server farms, with each cluster running its own compute and storage. Hadoop provides high availability at the application layer, so cluster hardware can be off-the-shelf, making the nodes easily interchangeable and cost efficient.
The cloudification trend
While early adopters typically deployed on-premises using one of the several Apache-based distributions, organizations are increasingly taking advantage of the cloud. A "do-it-yourself" (DIY) approach, by contrast, can be tedious and time consuming.
As demand outstrips supply, skilled engineers with in-depth Hadoop experience are rare and expensive. Buying hardware is one thing, but building an analytics platform by trial and error can be lengthy and quite costly, too.
As time-to-market matters a great deal in the digital age, a growing number of companies are turning to Hadoop-as-a-Service (HaaS) offerings, which are emerging quickly and enjoying high rates of adoption.
Using the cloud as the preferred destination can make a lot of sense from a user perspective. With lower unit costs due to economies of scale, organizations gain efficiencies, avoid capital expenditures, and achieve much greater flexibility.
Beyond the commercial benefits, the cloud most importantly opens up a completely new array of digital use cases, especially in the context of the IoT and other scenarios requiring real-time data processing.
Amazon's Elastic MapReduce (EMR) was one of the pioneering offerings in this space.
Not only have virtually all large service providers since added cloud-based Hadoop hosting to their portfolios, but the distribution vendors themselves are also working to "cloudify" their frameworks, with Cloudera's Altus being one recent example. Altus allows users to run data processing jobs on demand, leveraging either Hive on MapReduce or Spark. Cloudera has already publicly announced its intention to extend the service to other leading public clouds such as Microsoft Azure, and other vendors are likely to follow.
Market developments
In their push toward the cloud, more and more organizations are opting for Hadoop-as-a-Service. HaaS is essentially a sub-category of Platform-as-a-Service (PaaS), comprising virtual storage and compute resources as well as Hadoop-based processing and analytics frameworks. Service providers typically operate a multi-tenant HaaS environment, hosting multiple customers on a shared infrastructure.
As organizations increasingly embrace a "cloud-first" mindset, the HaaS market is projected to garner $16.1 billion in revenues by 2020, registering a stellar compound annual growth rate (CAGR) of 70.8 percent from 2014 to 2020, as reported by Allied Market Research. North America is still the leading region from a revenue perspective, followed by Europe and Asia Pacific.
The surge of HaaS is expected to outpace the growth of the on-premises Hadoop market through 2020. According to IDC's research, public cloud deployments already account for 12 percent of the overall worldwide business analytics software market and are expected to grow at a CAGR of 25 percent through 2020.
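To make the on-demand model concrete, here is a hedged sketch of launching and tearing down a small transient EMR cluster with the AWS CLI. The release label, instance type, and node count are illustrative placeholders, not recommendations, and the command assumes the default EMR IAM roles already exist in the account:

```shell
# Launch a small, transient EMR cluster running Hadoop and Spark.
aws emr create-cluster \
  --name "haas-demo" \
  --release-label emr-5.36.0 \
  --applications Name=Hadoop Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles

# create-cluster returns a cluster ID (e.g. j-XXXXXXXXXXXXX);
# decommission the cluster once the job is done:
aws emr terminate-clusters --cluster-ids j-XXXXXXXXXXXXX
```

The point is less the exact flags than the workflow: compute is commissioned per job and decommissioned afterwards, rather than sized up front as in an on-premises deployment.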
Besides large corporations, small and medium-sized firms are also increasingly opting for HaaS to derive actionable insights and create data-centric business models.
Things to contemplate when considering HaaS
While there are undoubtedly plenty of compelling use cases for HaaS, there are some drawbacks, too. Moving large volumes of data into the cloud can have latency implications and require additional bandwidth. And while a highly standardized HaaS environment can be deployed with just a few clicks, the design authority rests solely with the service provider. Moreover, data in the cloud develops gravity, which can lead to a lock-in effect. Here are some examples of what else to consider when evaluating a HaaS provider:
Elasticity
Hadoop supports elastic clusters for a wide range of workloads, which matters even more in a cloud-based deployment. What compute and storage options are available to support different use cases? For example, what additional compute blades are available for high I/O workloads? How scalable is the environment, and how easily can additional resources (compute, storage) be commissioned?
Persistent use of HDFS
Although using HDFS as a persistent data store isn't required, there are clear benefits to doing so. HDFS uses commodity direct-attached storage (DAS) and shares the cost of the underlying infrastructure. Furthermore, HDFS seamlessly supports YARN and MapReduce, enabling it to natively process queries and serve as a data warehouse.
Billing
What is the service provider's underlying price metric (billed as ordered, as consumed, etc.)? How flexibly can services be decommissioned if, for example, capacity is underutilized? Most importantly, given the rapid expansion of the data lake, how do prices scale over time?
High availability
Achieving "zero outage" is a delicate but very important matter. What are the provider's SLA and failover concepts?
How is redundancy accomplished? For example, is the provider able to isolate and restart a single machine without interrupting an entire job (aka "non-stop operations")?
Interoperability
Since use cases tend to gain sophistication over time, how easy is it to integrate other services you are already using or plan to use? Which data streams and APIs are supported, and how well are they documented?
Need for talent
While significantly less manpower is needed to set up a HaaS environment than with a DIY approach, Hadoop doesn't work entirely out of the box. The nodes will be running after just a few clicks, but that is when the actual work begins: customization will still require time and effort.
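As a small taste of that customization: YARN's memory settings, for instance, typically have to be tuned to the actual node profile rather than left at their defaults. Below is a hedged sketch of a `yarn-site.xml` fragment; the values are illustrative placeholders for a node with roughly 64 GB of RAM, not recommendations:

```xml
<!-- yarn-site.xml: illustrative memory tuning for a ~64 GB node -->
<configuration>
  <!-- Total memory YARN may hand out to containers on this node,
       leaving headroom for the OS and the HDFS daemons. -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>57344</value>
  </property>
  <!-- Upper bound for a single container request. -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>57344</value>
  </property>
</configuration>
```

Settings like these interact with scheduler choice, workload mix, and the instance types on offer, which is exactly where in-house Hadoop expertise remains necessary even in a HaaS setup.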