In the technology and data analytics space, I’m continually reminded that the only constant is change. This industry loves to innovate. Time and again we innovate to overcome immediate and future challenges, with solutions that address the need for more data, faster analytics, and better architecture. The innovation typically follows a trajectory: something ground-breaking, followed by years of incremental improvements that mature the offering and make it applicable to the masses. While these incremental changes are usually easy to incorporate, the problem is that we have to implement the ground-breaking innovation first. That transition usually requires process changes, training, re-architecture, and a long, painful migration. Ultimately, this leads to technology hype cycles, where each business assesses when, or even if, the risk and struggle of making a change is worth the reward.

Looking back…a little perspective

Hadoop is a great example of both sides of this phenomenon. Several years ago, Hadoop was the new innovation on the block. In the early 2010s, it came in fast and furious as the enterprise data warehouse (EDW) killer. Although Hadoop’s messaging and immature technology created confusion for many enterprises, some early adopters cut their teeth on it and made it work. Over the years, the technology matured to the point that (nearly) everyone had a Hadoop-based data lake running in their data centers.

Fast forward to 2020, and your business-critical analytical applications depend on Hadoop, but now it is on the other end of the technology cycle. The Hadoop ecosystem chugged along and evolved over the past decade, but several new technology innovations emerged in the meantime. The time has come to embrace these innovations and modernize your big data estate.

4 major technology developments

Four must-have technology developments impact the big data information estate for enterprises today:

Containerization and Kubernetes are game changers

Containers (and Kubernetes orchestration) deliver real benefits for big data environments. With containers, you can separate compute from storage. That separation lets you right-size your solution, drive greater efficiency, and optimize the utilization of your compute. Containers also let you embrace the constantly evolving ecosystem of open-source tools, enabling data analysts and data scientists to spin up their tools of choice in minutes while getting access to the data they need. Plus, you get application portability, flexibility, and agility, meaning you can quickly and easily deploy data-intensive apps on premises or in any cloud; a short sketch of this pattern appears after the next section.

Data is everywhere – on prem, hybrid cloud, multi-cloud, and at the edge

Originally, the big data estate for most enterprises was planted firmly on premises. But more apps are being deployed in the public cloud, and often on multiple public clouds. And with the ever-increasing volume of data generated at the edge (together with network improvements), you need to think about your data globally, from edge to cloud. Your next big data platform should adapt to the needs of your business and your data everywhere, and it must be flexible enough to accommodate on-premises, hybrid cloud, multi-cloud, and edge computing deployments.
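To make the first two points a bit more concrete, here is a minimal sketch of a containerized analytics job that separates compute from storage. It assumes a reachable Kubernetes cluster, a published Spark container image, and data in S3-compatible object storage with the appropriate Hadoop s3a connector on the classpath; the API endpoint, image name, executor count, and bucket path are illustrative placeholders, not details from the article.

```python
from pyspark.sql import SparkSession

# Submit Spark work to a Kubernetes cluster: executors run as containers (pods),
# so compute can be scaled up, down, or torn away independently of the data.
# The master URL, container image, and bucket below are hypothetical examples.
spark = (
    SparkSession.builder
    .appName("containerized-analytics-sketch")
    .master("k8s://https://k8s-api.example.internal:6443")
    .config("spark.kubernetes.container.image", "registry.example.internal/spark:3.3.0")
    .config("spark.executor.instances", "4")
    .getOrCreate()
)

# Storage is addressed by URI rather than by which cluster node holds the blocks,
# so the same job can point at on-prem object storage or a public cloud bucket.
events = spark.read.parquet("s3a://analytics-bucket/events/")
events.groupBy("event_type").count().show()

spark.stop()
```

The same pattern applies whether the Kubernetes cluster runs in your data center, at the edge, or in a public cloud; only the cluster endpoint and the storage URI change, which is the portability these first two developments describe.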
The open-source ecosystem continues to evolve

Enterprises need to future-proof their big data investments. Over time, some vendors have focused on the pure open-source model, while others have provided value-added commercial software built on open-source technology. It turns out both approaches are right. You’ll want optimized tools from your solution provider where they make sense, but your future big data estate also needs to evolve at the speed of open-source innovation. By implementing a solution that can deploy any open-source framework, you are prepared for this constant evolution while giving your data scientists access to the latest open-source toolkits.

Make the infrastructure invisible – while ensuring performance, resiliency, security, and high availability

I remember a comment a CTO made to me several years ago. While I was arguing a point about how to improve the performance of data lakes, he said, “You’re all about infrastructure; we don’t care about the infrastructure.” I’ve since embraced this mantra (after all, data science teams don’t want to worry about the underlying storage, compute, and networking), but infrastructure is still important. We can hide the complexity of the infrastructure, making app deployment as easy and seamless as possible. But if you don’t architect your solution to ensure security, performance, and other enterprise-grade requirements, it won’t make it into production. And ultimately, it won’t deliver business value.

Is the risk worth the reward?

Hadoop distributions are fighting to stay relevant, but data platform and deployment alternatives have emerged. Many enterprise organizations are re-evaluating their path forward and embarking on a new strategy to modernize their big data estate. So now is the time to ask the difficult questions:

- Am I currently getting the value I expected from my data lake? What extra value do I get when I upgrade?
- What will the integrated solution look like? What features and apps will still be there?
- What is the roadmap? Will it change if my distribution is acquired?
- Do I have to upgrade? How do I do it? How long will it take? How much will it cost? When do I lose support on my current version?
- Will I be locked into my distribution’s proprietary apps? How easy is it to bring in the latest open-source tools that my data science teams want?
- Is Apache Ozone ready for prime time? Should I trust it with my data?
- Is the risk worth the reward, or should I consider another strategic solution (and another strategic partner) to modernize my big data estate?

Hewlett Packard Enterprise can help

Hewlett Packard Enterprise (HPE) knows first-hand that enterprise organizations, and their business-critical, data-intensive applications, are caught in this storm of uncertainty and change. We’ve recently gone on our own modernization journey to fulfill our vision of a data-driven business. Our new elastic data analytics solution leverages containers, the latest hardware, and open-source toolkits to bring speed and agility to our decision making and to empower our worldwide users from edge to cloud.

Unfortunately, there’s no easy button, since each organization has its own requirements. But HPE can help customers navigate this process. HPE has a complete portfolio of solutions, expertise, and support to help modernize your big data estate. To de-risk the modernization process, we created the HPE AMP Assessment Program to help clients answer the difficult questions about their big data information estate.
With this offering, HPE will:

- Analyze your current-state platform,
- provide a detailed Map to modernize your current platform in a way that meets the business needs of your organization, and
- finally, Prescribe a systematic plan to get you there.

Based on the output of the AMP Assessment, HPE can bring its entire arsenal of HPE Ezmeral software, world-class hardware, and proven services to deliver the right solution for your specific needs.

To learn more

If you’re looking for straight talk on this topic, I recently hosted the Modern Big Data Solutions Roundtable with three of our Big Data CTOs. Please listen in as we address relevant topics such as upgrade/migration paths, Kubernetes and containerization, open-source options and tools, and hybrid cloud/multi-cloud deployments.

____________________________________

About Matthew Hausmann

Matt’s passion is figuring out how to leverage data, analytics, and technology to deliver transformative solutions that improve business outcomes. Over the past decades, he has worked for innovative start-ups and information technology giants, with roles spanning business analytics consulting, product marketing, and application engineering. Matt has been privileged to collaborate with hundreds of companies and experts on ways to constantly improve how we turn data into insights.