by Mike Lamble

Bringing enterprise analytics to the cloud

Opinion
Aug 12, 2015
Analytics, Big Data, Cloud Computing

Cloud solution partners are creating new solutions to long-standing problems such as parallelizing statistical processing and automating common data integration tasks. As we move into the era of self-service analytics, a number of vendors envision a future of self-service data integration. Are you ready for the ride?

We’re coming to the tipping point for public cloud adoption, and it’s going to have big consequences for data warehousing, BI, and analytics. It’s no longer a question of if we’ll move analytics to the cloud, but rather of when. There’s also a big question of what enterprise-class data integration will look like in a hybrid cloud.

In terms of the Innovation Curve, we’ve moved from “early adopters” to the “early majority.” According to a recent Gartner survey, this year saw a 50 percent jump in the portion of respondents who said they plan to run mission-critical applications on the cloud, from about 30 percent in each of the previous four years to 45 percent this year. Many companies – even in the Fortune 1000 – are mandating that all new infrastructure will be in the cloud.

Cloud benefits

Here’s a quick recap of cloud benefits. First, speed to market of solutions: cloud infrastructure can be provisioned in days rather than months and provides unlimited scalability to speed up jobs or handle volume increases. Gone are the multi-month projects that involve adding servers, disks, cages, racks, cores, routers, switches, cables, etc. Also gone are the time and hassle of adopting new best-of-breed infrastructure technology. Second, renting can cost a lot less than owning: with public cloud solutions you pay only for the resources you need when you need them. Price wars between cloud service partners (CSPs) are bringing down prices, and competition for enterprise business combined with economies of scale will continue for years. Third, data and analytics innovators are targeting products for the cloud. Lastly, CSPs provide uniform coverage globally and across time zones. The most common objection to cloud adoption is data security, but the evolving conventional wisdom is that cloud data is already safer than data in many, if not most, data centers.

Starter projects for enterprise analytics in the cloud

The CSP analytics tool stack is reasonably complete and growing. In fact, major CSPs support the big data tools Hadoop and MapReduce, SQL database management systems, and a variety of data visualization and dashboarding tools. Additionally, cloud solution partners are creating new solutions to long-standing problems such as parallelizing statistical processing and automating common data integration tasks (e.g., adapters for SAP).

For companies with an on-premise Enterprise Data Warehouse (EDW) looking to start taking advantage of the cloud, here are a few project profiles that could be a good way to get started.

  • One-time big data projects. Gigabytes to petabytes can be provisioned quickly and brought down when the project is complete, whether that takes weeks, months, or however long the job requires.
  • Anything where you might be considering a massively parallel processing (MPP) appliance. Projects that require terabytes and up – and target tens to hundreds of users rather than thousands – could be a great place to start. IT managers experienced with appliances will be delighted by the cloud’s flexibility. You won’t need to pre-pay for capacity, and adding resources is done on a config screen rather than by buying a new rack.
  • Machine learning projects. Hadoop’s architecture suits this class of solution, and leading CSPs offer robust Hadoop distributions and machine learning analytic software.
  • Departmental dashboard projects. These are ideal for getting acquainted with a CSP’s solution stack and development nuances. In fact, many of these are already up and running.
  • IoT Data Lakes. These include both structured and semi-structured data, with volumes that can be enormous. In on-premise data centers, data lake projects are often constrained because out-of-the-box Hadoop clusters tie storage and compute together. This isn’t an obstacle for CSPs, whose offerings decouple storage and compute resources in order to economize on storage of less often used data.

Analytic data integration on the hybrid cloud

A modern enterprise most likely includes on-premise data centers, public cloud, and core SaaS applications to run the business, including Salesforce, Workday, Marketo, LinkedIn, Google, IoT, Concur, and dozens more yet to come. This is the hybrid cloud. The EDW is the de facto solution for analytic data integration (versus ESB/EAI for operational applications). So what does the EDW look like in the hybrid cloud environment?

As we move into the era of self-service analytics, a number of vendors envision a future of self-service data integration. For example, here’s a quote from SnapLogic’s website: “Today’s ‘citizen integrators’ range from members of your enterprise IT organization, to people in sales operations, marketing, finance and HR, to analysts and administrators of SaaS applications … These people are increasingly finding themselves in need of a fast, multi-point and modern cloud integration platform.”

But what does this “multi-point and modern cloud integration” platform look like? It may include a number of things, but at a minimum it had better include data management practices that create a single version of the truth, minimize rework, document metadata, trace changes, ensure recoverability, and consolidate and certify heady data integration tasks; i.e., the unsung achievements of the EDW. It’s going to be an interesting journey and we look forward to the ride.
