Mike Feibus
Contributor

The rise of the cloud data platform

Opinion
May 25, 2021 | 7 mins
Cloud Computing, Data Architecture, Data Management

As impressive as the enterprise’s progress toward a data-driven economy was last year, the big bang we’ve been expecting from big data is still ahead.


The year 2020 is seared into CIOs’ collective memory as one of the most cataclysmic, consequential years this century. But while the pandemic helped drive digital transformation far beyond anyone’s expectations, make no mistake: This was not the perfect storm we’ve all envisioned.

If anything, the unprecedented wave of data-driven deployments last year shined a light on shortcomings in existing frameworks that are muting big data’s potential. Many organizations, for example, have been frustrated by how long it can take just to prepare data before analysts and scientists can even start a new project.

Likewise, the process for migrating the resulting models into business processes has been just as vexing. And without the proper security and governance controls across the entire data lifecycle, analysts and data scientists run the risk of exposing and permanently sullying critical data stores.

“It’s really about expanding the data management purview,” said Richard White, Chief Data Officer at New York Life Insurance Company. “What we’re seeing is a triangulation of the data, data protection and cloud strategies. And they’re all coming together, with more pervasive governance.”

The need for that triangulation isn’t new. But the requirements became vastly more complicated during the pandemic, primarily for two reasons. First, the number of data science projects inside many organizations mushroomed. And second, the constellation of data sources fueling those projects greatly expanded.

Which helps explain why more than 37 percent of organizations view security and governance as their top cloud challenge, according to the just-released Global Cloud Survey 2021 from data virtualization provider Denodo.

But perhaps the greatest data management lesson to come out of the pandemic is that many more organizations now understand that there’s a right way to do data, and a wrong way. And while there are tools to make the job easier, there are no shortcuts.

“There’s legitimate, difficult work to do to bring data together, and then to do science on that data,” Chris Wright, Senior Vice President and CTO of Red Hat, told me. “So you need to have the right tool set in place.”

Indeed. Which is why many CIOs and CDOs are making it a priority to build, rent or buy what some are calling a cloud data platform – that is, all the ground-level management stuff that must happen before, during and after the glitzy work of digging into the data lake in search of answers.

Stormy data

Perfect storms form when, at long last, all the requisite ingredients needed to drive a new market skyward are not only available, but also mature enough to scale. That sounds a lot cleaner than what happens in real life, of course. Because the individual sections of scaffolding rarely develop at the same pace. Worse, it’s sometimes difficult to see what’s ready and what’s not without deploying what’s available in the wild.

Which, as we all know, is exactly what happened during the pandemic.

“We were already headed this way,” JG Chirapurath, VP of Azure Data, AI and Edge at Microsoft, said. “But when the pandemic shut everything down, the evolution of using data to unlock outcomes became more of a revolution.”

To be sure, the shutdown effectively scrapped existing models and dashboards, rendering finely tuned supply, demand and cost forecasts worthless. Enterprises were suddenly flying blind. They urgently needed new systems to operate in a reality that had only just started taking shape in March 2020.

That sent businesses scurrying in search of new data sources, giving rise to the demand for hybrid and multi-cloud. Many companies, for example, folded infection rates, pandemic sentiment trends and other COVID-related information into their budding new models. At the same time, some large, established organizations found cost-cutting religion during the shutdown, and pulled stable, regular workloads back from the cloud.

“If 100 percent of workloads were really going into the public cloud, then there would be no Google Anthos, AWS Outposts or Azure Arc,” Mick Hollison, President of Cloudera, said of the three major cloud vendors’ hybrid and multi-cloud offerings. “What enterprise customers want is superior economics, and sometimes that’s still on-prem, as much as nobody in the cloud wants to hear that.”

The Denodo survey, in fact, supports that. While the number of workloads exploded everywhere, private cloud actually took share not only from public cloud but also from hybrid and multi-cloud. More than 24 percent of workloads were deployed in private cloud, up markedly from 16.6 percent a year ago, according to the survey. Hybrid cloud still dominates, hosting 35.8 percent of workloads.

Taken to the limit

All of this stretched many teams far beyond their comfort zones. They were accustomed, for example, to models built around a single companion dataset. So the added complexity of new resources with unfamiliar structures stationed in faraway exchanges ratcheted the challenge to dizzying heights.

Add to that the challenge of managing access for a new wave of employees across disparate business units and corporate departments who never before cared about what was in that data silo, and you have the makings of a digital transformation nightmare.

“I had one CEO tell me, ‘You know, we’ve had to modernize just about everything because we never planned for a flash flood like this,’” Microsoft’s Chirapurath said.

With that seemingly overwhelming task ahead, it’s not hard to see why high-flying, get-insight-quick point products proved too compelling for some to ignore. Unfortunately, AI without the requisite cloud data platform in place can create more problems than it solves.

“There’s an expectation that you can just apply technology and create an outcome,” said Red Hat’s Wright. “I call it ‘magic happens here.’ The reality is that it’s much harder than that. I worry that customers who start out with the assumption that it’s going to be easy will be disappointed.”

A platform by any other name

Many companies have their own names for what I’ve termed the cloud data platform. Oracle, for example, labels it the “enterprise data management cloud.” Nutanix uses the term “enterprise cloud.” And Cloudera, which offers a platform called the Cloudera Data Platform, actually calls the category the “enterprise data cloud.”

“The enterprise data cloud is incredibly important to regulated verticals like banking, telcos, life sciences and government,” Cloudera’s Hollison said. “And they don’t want, for example, to have a bespoke security and governance model for each individual analytic function.”

The structure imposed on regulated organizations by, well, regulations benefited them last year, when they needed to grow their universe of data sources. But those without a common structure to help engineers prepare and manage data from two related but separate silos found themselves wholly unprepared for the task.

For them, part of the obstacle was that, almost by default, an enclosed model with its own dedicated dataset comes with all the data preparation and engineering, security, governance and MLOps it needs. So they had nothing in place – and, until a year ago, no motivation to install structure to ensure two internal datasets are compatible.

As a result, there’s not a common structure to help engineers prepare and manage the data from two silos to serve a new, broader exploration effort. Without that structure, the arduous data preparation work by default benefits only that one project. So engineers are doomed to repeat the same Herculean feat ahead of the next project.

A model for building models

All of this is to say that there is a crying need for a cloud data platform to be in place to ensure robust, standard, repeatable and reusable efforts. A model for building models, if you will. It may not be glamorous. But it’s an essential piece of the scaffolding that needs to be in place before the coming perfect storm can roll into the datacenter.

“That base has been really critical, not just within functional groups but across the company,” said New York Life’s White. “Many companies have jumped on the bandwagon, implementing flashy things, but have not built that foundation. Making the investment without first getting your foundation in order is like building a new kitchen when you’ve got water in the basement. Sooner or later, your shiny new kitchen is going to crumble.”

Mike Feibus
Contributor

Mike Feibus is President and Principal Analyst at FeibusTech, covering enterprise client technology, corporate health and wellness tech, augmented and virtual reality (AR/VR), connected car, and privacy and security. He is a longtime columnist for IT and general-interest technology publications, including USA Today, Fortune, InformationWeek and EE Times. Feibus earned an MBA from Stanford and a BA in Economics from Tufts.

The opinions expressed in this blog are those of Mike Feibus and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
