In the CIO.com article, Take Your Analytics into Hyperdrive in 2021, I predicted the four big data analytics trends of the year and what enterprises must do to stay competitive in a modern, hybrid world that spans from the edge to the cloud. I explained how Apache Spark would continue to dominate the big data world, as analytic users like data scientists and data engineers want to tap into existing data stores without having to move to the cloud or re-platform the data. I also stated that since the edge is here, companies would have to solve for both data and Zero Trust security to ensure that they are using the edge effectively and securely. Over the past year, all these predictions have come true as 2021 marked an accelerated move toward the edge and a continued prioritization around data.
In 2022 and beyond, I see a continued focus on data-first modernization: companies will focus first on what happens to the data rather than on legacy technology and how they used to process data. With a data-first approach, companies will pull all their data together from silos. This will allow them to access data, tap into untapped value, and take advantage of the edge while minimizing migration impact.
As the data analytics landscape continues to evolve over 2022, here are my four data-first predictions for the new year.
1. The edge will take a dominant place in data architecture
Moving forward, the most important journey of data is not from the cloud to the edge, but rather in the other direction — from the edge to the cloud, or what I call edge-in. A true edge solution must encompass every aspect of the edge, from data collection to analytics, that can then be brought to the data center and/or cloud, rather than vice versa. Over the coming year, the edge will continue to grow up and take its rightful place as the epicenter of data creation, analytics, and insights. With over 50% of data existing at the edge by 2023, the importance of doing work at the edge will become commonplace as organizations continue to tap into immediate insights.
As always, you need to follow your data for your own unique solutions, but I’m already seeing rapid change in analytics architectures to an edge-first distributed mindset. Many organizations are no longer following the prevailing data center or cloud-out design principles where data is first brought into a centralized location and insights are then pushed back out.
Look to leverage the latest cloud-native technologies, like Kubernetes, that are available and deployable in tiny footprints at the edge. Furthermore, the ability to do this work at the edge is now supported as never before, with more compute power in smaller form factors. We can now effectively put the equivalent of a supercomputer (a single GPU can provide more than 6,000 cores) in the tiniest of edge solutions, allowing you to push more analytics to the data. This shift toward smaller footprints for all technology will only continue.
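The edge-in idea of pushing analytics to the data can be sketched in a few lines. This is a minimal, hypothetical illustration with made-up sensor readings: rather than shipping raw data to a central store, an edge node aggregates locally and forwards only a compact summary. The function and field names are my own, not part of any product.

```python
from statistics import mean

def summarize_readings(readings):
    """Aggregate raw sensor readings at the edge so only a
    compact summary travels edge-in to the data center or cloud."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }

# Raw samples stay at the edge; only the summary moves inward.
raw = [21.4, 21.9, 22.1, 35.7, 21.6]  # e.g. temperature samples
summary = summarize_readings(raw)
print(summary)
```

The payload shrinks from one record per sample to a handful of fields, which is the whole point of doing the work where the data is created.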
2. The data fabric will unify data from edge to multi-cloud
Data-first modernization will also change the way priorities are set for building distributed data platforms. Just as most large companies have enterprise-grade CRM, ERP, and HR platforms, in 2022 an enterprise-grade data fabric will become standard. A true data fabric should provide secure and cohesive data access across your primary data estates, spanning edges, core data centers, and multiple clouds. It should be performant for analytics, scalable, and flexible, supporting a multitude of data types and APIs. It should also provide the necessary enterprise-grade data governance and observability to simplify operations and instill confidence.
Data fabric solutions have matured and been proven by the innovators and early adopters. Specifically, look for trusted data fabric technologies that leverage a global namespace and support files, blocks, streams, objects, and all the common data access APIs for multi-use. Note: This is not to be confused with a data virtualization layer, which is a nice complement, as you will always want to bring in rogue data sources. A production-grade data fabric should be at the heart of your data estate.
3. The data lakehouse will become the default analytics platform
With the unification of your data, it’s a natural progression to see the unification of users and analytic techniques. Hats off to our friends at Databricks and the analyst community who led the way in marketing this trend as a lakehouse or data lakehouse.
As data lake and data warehouse techniques have started to blend, I’m continuing to see more offerings that can deliver across the entire spectrum for code-first data scientists, citizen data scientists, and business analysts. With a fully functional data lakehouse, companies will no longer need separate systems for business analysts using SQL and data scientists using Apache Spark or Python.
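The value of that unification is easy to show in miniature. The toy sketch below uses Python's built-in sqlite3 purely as a stand-in for a lakehouse engine (in practice this would be, say, Spark SQL and the DataFrame API over the same tables): the analyst's SQL path and the data scientist's Python path hit the same store and agree, with no copies between systems. Table and column names are invented for illustration.

```python
import sqlite3

# One shared store, two access styles (sqlite3 as a lakehouse stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

# Business-analyst path: plain SQL.
sql_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'east'"
).fetchone()[0]

# Data-scientist path: pull rows into Python and compute there.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
py_total = sum(amount for region, amount in rows if region == "east")

assert sql_total == py_total  # same data, same answer, no copies
print(sql_total)  # 150.0
```

When both audiences work against one system, there is no second pipeline to keep the two answers in sync, which is exactly the maintenance burden the lakehouse model removes.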
Flexibility and agility are paramount when investing in your modern data lakehouse. Look for open architectures that play nicely with a variety of ISVs and cloud vendors, so you won’t be locked into any proprietary stacks. Native integrations of both Spark and SQL are a plus, as treating both as first-class operations helps ensure analytic performance. And look for solutions that either have a data fabric embedded or can easily sit on top of one.
4. Data-first modernization trends will change the cloud experience dynamic
The cloud has become quite popular, as it gets you out of the IT business, provides self-service and the agility to quickly start, stop, and scale, and comes with low upfront costs. But the traditional cloud model is flawed: it is a cloud-out approach that leaves you to fend for yourself when it comes to roughly 75% of your data estate. The data-first modernization trends I predicted above, unifying analytic capabilities and data with an edge-in mentality, will most certainly impact the as-a-service model as well.
In 2022, companies will wake up and recognize they are not getting an end-to-end cloud experience if they still do most of the work themselves to make data useful between the edge, their data centers, and the cloud. We’re going to see more and more companies ask for a consistent cloud experience for the entirety of their solution — not just for the portion they have in any one public cloud.
When looking for a cloud experience, find a provider that can deliver a comprehensive, end-to-end experience that comes to your data, wherever it resides. The ideal vendor will provide one IT operating model to orchestrate across edges, colocations, data centers, and multi-cloud. And they will still deliver the benefits of self-service, pay-as-you-go models, and the ability to scale up and down, ensuring price/performance as your needs change.
Check out the following reports to learn more about data analytics, application modernization, and the business value of HPE Ezmeral Data Fabric.
About Matthew Hausmann
Matt’s passion is figuring out how to leverage data, analytics, and technology to deliver transformative solutions that improve business outcomes. Over the past decades, he has worked for innovative start-ups and information technology giants with roles spanning business analytics consulting, product marketing, and application engineering. Matt has been privileged to collaborate with hundreds of companies and experts on ways to constantly improve how we turn data into insights.