In this two-part series, I explore the two phases of digital transformation that many organizations are undergoing. In part one, I dig into what organizations have done in the first phase of transformation and why they must think differently as they embark on the second. In part two, I describe how organizations should approach the second phase in order to successfully transform their data and analytics estates, with Spark as the foundation of those changes.

Conquering the last frontier of digital transformation

As I talk to my clients in organizations of every size and industry, I sense a generational shift in both their technology and business strategy in the area of advanced analytics. I define advanced analytics as the exploitation of an organization’s data assets through sophisticated data science tools and techniques performed by data scientists. Digging further, we can see that this isn’t traditional business intelligence and reporting using legacy and modern reporting tools (such as QlikView, Tableau, and Power BI). No, this sort of analytics is often ad hoc, using bespoke combinations of tools, libraries, and analytical techniques against many data types and sources.

Many organizations are turning to advanced analytics now because they have completed the first few phases of their digital transformation projects and are moving on to the last frontier – tackling the data and analytics systems and processes to fully transform.

What exactly does that mean, though? It means that the analog-to-digital transformations are complete. It also means that traditional IT environments have been transformed to be more efficient and services-driven, and that applications now use cloud technologies and operating models. That leaves the data and analytics components, where value can still be extracted and exploited. The question we must ask is, “How do we bring the learnings from application modernization, the tooling from DevOps processes, and the operating models from the cloud to the data and analytics estate?” The answer lies in lessons learned from existing application modernization efforts.

Lessons learned from the first phase of transformation

Application modernization

Application modernization encompasses new software development methodologies, tools, and processes, coupled with a change in organizational structures and processes to become software-driven. New programming languages have emerged to make writing, testing, and deploying code more accessible to software teams. That has allowed the lines of business within organizations to better understand software development and align more closely with it; this enables better integration with traditional IT, letting them become technology-driven business units. Those changes didn’t happen overnight, but when completed, I have seen improvements that are orders of magnitude more efficient and impactful than previous technology deployments.

DevOps processes

Writing better code using public cloud tooling is only part of what has made the recent digital transformations effective. DevOps has accelerated these transformations and has been instrumental in breaking down the barriers between application development and IT operations.
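To make the DevOps point concrete, below is a minimal, hypothetical sketch of the kind of automated check a delivery pipeline runs on every commit. The function, test, and values are illustrative only and assume a pytest-style runner; nothing here comes from any particular codebase.

# Hypothetical illustration of the DevOps practice described above:
# application code ships with automated tests, and a CI server runs
# them on every commit, so changes reach operations only after passing
# the same checks the developers ran locally.

def normalized_order_total(amount_cents: int, tax_rate: float) -> float:
    """Example business logic: convert cents to dollars and apply tax."""
    return round((amount_cents / 100) * (1 + tax_rate), 2)

def test_normalized_order_total() -> None:
    # The CI pipeline executes this check automatically; a failure blocks
    # the release, replacing the manual handoff between development and
    # operations teams.
    assert normalized_order_total(10_000, 0.08) == 108.00

The test itself is trivial; the point is that the same gate applies in every environment, which is what dissolves the boundary between development and operations.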
With that problem solved, organizations were able to truly start using IT as a force multiplier for their digital transformation. Organizations that have a “DevOps mentality” are poised for success in the next phase of their transformation.

Operating models

The public cloud has transformed the operating models of many organizations. From the way IT departments extend their own capabilities through hybrid-cloud initiatives to the way application developers use cloud-native services and functions, organizations have continued to increase their business velocity by embracing cloud principles and operating models. OpEx vs. CapEx, self-service, on-demand provisioning, elastic scaling, micro-charging, and bespoke provisioning of resources are all game-changing practices that have transformed the way organizations treat technology.

Data and analytics require a different way of thinking

The second phase of digital transformation for most organizations will be focused on data and analytics. Best-of-breed organizations will apply the best practices from their application modernization efforts to this phase. Data and analytics is different enough, however, that it requires somewhat different thinking, tooling, and approaches, even while keeping those patterns in mind, to be truly successful. Let’s look at why the data and analytics space is different so that we can understand what must be done differently than was done for application modernization.

In the data and analytics space, software development generally falls into two categories: data engineering and data analytics/data science. Until recently, these developers worked on tooling and systems that are 10+ years old, using languages and environments that are sometimes even older. That is because these systems are part of critical business reporting and intelligence functions, which are slow to change because the business doesn’t need them to change. Therefore, these systems are treated with a light touch and are changed only with the utmost care. Writing software against these systems is almost always done through highly controlled development and operational processes that are slow to change and allow few iterations.

Looking past those traditional reporting systems, big data systems have evolved to integrate with more modern software development and languages, but deploying code and applications against them has remained rigid because these systems are often deployed as monoliths (i.e., the individual components of the system are tightly coupled and have to be updated and deployed at the same time). That makes DevOps tooling and processes incompatible with these systems, so they are unable to benefit from agile techniques and continuous integration and delivery tooling.

In most organizations I speak to, these data and analytics systems range from hundreds of terabytes to exabyte scale. The data in them is usually tightly coupled to the applications, forming enormous monoliths that make public cloud deployments impracticable at best and, in most cases, impossible, owing to cost concerns, network latency issues, and legal and regulatory policies that forbid such deployments. Being unable to deploy these systems into the public cloud means the benefits mentioned above are out of reach, so we can’t simply apply those principles and operating models without rethinking our approach.
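Part two will cover that rethinking in depth, but a small preview helps show where it leads. The PySpark sketch below is hypothetical: the bucket, paths, and column names are invented, and it assumes an environment where Spark can reach shared object storage. It illustrates the decoupled alternative to the monoliths described above, with on-demand compute reading data in place rather than data welded to a cluster.

from pyspark.sql import SparkSession, functions as F

# On-demand compute: this Spark session can be provisioned for a single
# analysis and torn down afterward, cloud-operating-model style.
spark = SparkSession.builder.appName("ad-hoc-analysis").getOrCreate()

# The data never moves: Spark reads Parquet files directly from shared
# object storage instead of from storage fused to the application.
orders = spark.read.parquet("s3a://example-data-lake/orders/")

# The kind of ad hoc exploration a data scientist might run, independent
# of the reporting systems that serve the business.
daily_totals = (
    orders.groupBy(F.to_date("order_ts").alias("order_date"))
          .agg(F.sum("amount").alias("total_amount"))
          .orderBy("order_date")
)
daily_totals.show()
spark.stop()

Because storage and compute are decoupled here, the DevOps tooling and elastic scaling discussed earlier become applicable to analytics workloads, too.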
Apply patterns learned – along with new tools, techniques, and processes

In reading this article, my hope is that you’ve gained insight into the challenges organizations face as they embark on the next phase of digital transformation. As you have read, there are lessons to be taken from the recent application modernization transformations. Although we can’t carry all of the tooling, principles, and processes over to the data and analytics estate, we can take those patterns, apply new software development tools, techniques, and processes, and then apply cloud principles and operating models to drive that transformation.

In the next article, I will go into the best practices for the digital transformation of data and analytics systems and organizations, and provide advice on how to accelerate that journey. The focus will be on how open-source Spark is the common technical component enabling and accelerating the data and analytics digital transformation.

____________________________________

About Matt Maccaux

As Global Field CTO for HPE Ezmeral software, Matt brings deep subject-matter expertise in big data analytics and data science, machine learning, application development and modernization, and IoT, as well as cloud, virtualization, and containerization technologies.