In this two-part series, I explore the two phases of digital transformation that many organizations are undergoing. In part one, I dig into what organizations have done in the first phase of transformation and why they must think differently as they embark on the second phase. In part two, I describe how organizations should approach the second phase of transformation in order to successfully transform their data and analytics estates – with Spark as the foundation of those changes.

Conquering the last frontier of digital transformation

As I talk to my clients in organizations of every size and industry, I sense a generational shift in both their technology and business strategy in the area of advanced analytics. I define advanced analytics as the exploitation of an organization's data assets through sophisticated data science tools and techniques performed by data scientists. Digging further, we can see that this isn't traditional business intelligence and reporting using legacy and modern reporting tools (such as QlikView, Tableau, and Power BI). No, this sort of analytics is often ad hoc, using bespoke combinations of tools, libraries, and analytical techniques against many data types and sources.

Many organizations are using advanced analytics now because they have completed the first few phases of their digital transformation projects and are moving on to the last frontier – tackling the data and analytics systems and processes to fully transform.

What exactly does that mean, though? It means that the analog-to-digital transformations are complete. It also means that the traditional IT environments have been transformed to be more efficient and services-driven, and applications are now using cloud technologies and operating models. That leaves the data and analytics components, where value can still be extracted and exploited.
The question we must ask is, "How do we bring those learnings from application modernization, tooling from the DevOps processes, and operating models from the cloud to the data and analytics estate?" The answer lies in lessons learned from existing application modernization efforts.

Lessons learned from the first phase of transformation

Application modernization
Application modernization includes new software development methodologies, tools, and processes coupled with a change in organizational structures and processes to be software driven. New programming languages have emerged to make writing, testing, and deploying code more accessible to software teams. That has allowed the lines of business within organizations to better understand software development and align more closely with it; this enables better integration with traditional IT, letting them become technology-driven business units. Those changes didn't happen overnight – but when completed, I have seen improvements that are orders of magnitude more efficient and impactful than previous technology deployments.

DevOps processes
Writing better code using public cloud tooling is only part of what has made the recent digital transformations effective. DevOps has accelerated these transformations and has been instrumental in breaking down the barriers between application development and IT operations. With that problem solved, organizations were able to truly start using IT as a force multiplier for their digital transformation. Organizations that have a "DevOps mentality" are poised for success in the next phase of their transformation.

Operating models
The public cloud has transformed the operating models of many organizations in many ways.
From the way IT departments extend their own capabilities through hybrid-cloud initiatives to the way application developers use cloud-native services and functions – organizations have continued to increase their business velocity by embracing cloud principles and operating models. OpEx vs. CapEx, self-service, on-demand provisioning, elastic scaling, micro-charging, and bespoke provisioning of resources are all game-changing practices that have transformed the way organizations treat technology.

Data and analytics require a different way of thinking

The second phase of digital transformation for most organizations will be focused on data and analytics. Best-of-breed organizations will apply the best practices from their application modernization transformations to this phase. Data and analytics is different enough, however, that it requires slightly different thinking, tooling, and approaches – while keeping those patterns in mind – to be truly successful. Let's look at why the data and analytics space is different so that we can understand what must be done differently than what was done for application modernization.

In the data and analytics space, software development generally falls into two categories: data engineering and data analytics/data science. Until recently, these developers worked on tooling and systems that are 10+ years old, using languages and environments that are sometimes even older. That is because these systems are part of critical business reporting and intelligence functions that are slow to change because the business doesn't need them to change. Therefore, these systems are treated with a light touch and are changed only with the utmost care.
Writing software within the organization against these systems is almost always done using highly controlled development and operational processes that are slow to change, with few iterations.

Looking past those traditional reporting systems, big data systems have evolved to integrate with more modern software development and languages, but the deployment of code and applications against them has still been rigid because these systems are often deployed as monoliths (i.e., the individual components of the system are tightly coupled and have to be updated and deployed at the same time). That means DevOps tooling and processes are incompatible with these systems, so they are unable to benefit from agile techniques and continuous integration and delivery tooling.

In most organizations I speak to, these data and analytics systems range anywhere from hundreds of terabytes to exabyte scale. And in these systems, the data is usually tightly coupled to the applications, forming enormous monoliths that make cloud deployments impractical at best and, in most cases, impossible – ruled out by cost concerns, network latency issues, and legal/regulatory policies that forbid those deployments. Being unable to deploy these systems into the public cloud means that the benefits mentioned above are out of reach, and therefore we can't just apply those principles and operating models without rethinking our approach.

Apply patterns learned – along with new tools, techniques, and processes

In reading this article, my hope is that you've gained insight into the challenges that organizations face as they embark on the next phase of digital transformation. As you have read, there are lessons we have learned from the recent application modernization transformations.
Although we can't take all of the tooling, principles, and processes and apply them to the data and analytics estate, we can take those patterns, apply new software development tools, techniques, and processes, and then apply cloud principles and operating models to drive that transformation.

In the next article, I will go into the best practices for the digital transformation of data and analytics systems and organizations and provide advice on how to accelerate that journey. The focus will be on how open-source Spark is the common technical component enabling and accelerating the data and analytics digital transformation.

____________________________________

About Matt Maccaux

As Global Field CTO for HPE Ezmeral software, Matt brings deep subject-matter expertise in big data analytics and data science, machine learning, application development and modernization, and IoT, as well as cloud, virtualization, and containerization technologies.