By Per Kroll, Senior Director, R&D for DevOps, AIOps and Automation, Broadcom
In order to compete, enterprises must deliver digital experiences that perform flawlessly in seconds and delight their customers. The challenge is that behind each digital app is massive complexity – hundreds of interwoven services, developed by a mix of proprietary and open source contributors, connecting mainframe and multi-cloud infrastructures with billions of end user devices. These digital demands are driving exponential growth of data and transaction volumes, making it increasingly difficult for IT operations to keep up. There is too much data, complexity, and change for a finite number of operators to manage, even with the assistance of leading monitoring and diagnostic technologies.
Artificial intelligence for IT operations (AIOps) is helping leading organizations conquer these challenges. AIOps combines big data and machine learning (ML) algorithms to augment and automate day-to-day IT ops tasks ranging from performance monitoring and reporting to data correlation and analysis. Applying AI to IT ops is needed to keep up with the ever-increasing volume of data, which is doubling year over year. AIOps enables you to quickly analyze this data, present it in a meaningful way, and use it to proactively anticipate and resolve common IT issues.
It can seem daunting at first, but here are five key elements of a pragmatic approach to implementing AIOps:
Transform Siloed Data into Contextual Insights for Faster Decision Making
Data is the key to AIOps. A broad and rich set of data collectors is needed to feed the ML algorithms. The difficulty is that data resides in multiple domains across hybrid IT environments, all of which are typically siloed. To be useful, data needs to be visualized and analyzed in context – enabling you to act upon insights derived from the data more efficiently and effectively. We call this contextual insights.
These meaningful and actionable insights are based on information gathered and analyzed across your entire IT environment, helping you understand how one piece of data is connected to the next and how important the insight is to the business. Contextual insights form the foundation for AIOps, making it possible to achieve robust collaboration, anticipate problems sooner, enable operational automation, and identify strategic priorities.
This raises the question: how can siloed data be transformed into contextual insights? The answer is not found in a single tool, or even in a selection of tools. Modern operational tools tend to go either broad or deep – not both. There are tools that provide a broad but thin layer of visibility into what is going on across IT domains, such as infrastructures, networks, storage devices, databases, and applications. There are also tools that deliver deep but narrow visibility into specific domains and platforms. But while these tools provide great diagnostic value, they are not sufficient in themselves to break down data silos.
The reality is there is no one tool that does it all. Silos can only be broken down by embracing an open architecture that enables you to integrate curated data from across the entire hybrid technology stack – from mobile to mainframe to multi-cloud. This does not mean taking all your raw data and throwing it into one gigantic data lake. If you do, you will end up with a data swamp: a stagnant pool clogged with untold amounts of useless data … useless because of the difficulty involved in putting random and disparate data in context to create insight and drive action. Instead, using an open architecture approach, you can easily augment and further curate the data with meaning behind the data relationships found across the tooling landscape.
Analyze Across Domains to Increase Operational Efficiencies
When your IT ops tools embrace the use of open APIs to gather analytics, you gain the ability to view your curated data from different perspectives and share hidden insights across teams, thereby achieving greater efficiencies. You don’t need to rip and replace all your IT ops tools to begin gaining the benefits of AIOps. Instead, you can build on your current product investments by using open APIs to integrate the data you are already collecting.
For example, suppose a network is running too slowly. A single tool may limit your visibility to what is happening within the network environment. If the slowdown is actually being caused by activity in a storage device, the root cause remains unclear and just out of view. The ability to visualize in-depth data in context across domains reveals what is taking place and why it is happening, enabling teams to work together cross-functionally to resolve issues swiftly.
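To make the network and storage scenario concrete, here is a minimal sketch of cross-domain analysis in Python. It assumes that metrics from several domains have already been collected on a shared timeline; the metric names, the sample values, and the `rank_suspects` helper are all hypothetical and exist only for illustration. It ranks candidate metrics by how strongly they move with the symptom. Correlation only points to where to look first; it does not prove a root cause.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal
    return cov / sqrt(var_x * var_y)

def rank_suspects(symptom, candidates):
    """Sort candidate metrics by absolute correlation with the symptom."""
    scores = {name: pearson(symptom, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical samples taken at the same eight points in time.
network_latency_ms = [20, 22, 21, 48, 95, 90, 30, 21]
candidates = {
    "storage_queue_depth": [2, 2, 3, 9, 20, 18, 4, 2],
    "db_connections": [41, 39, 40, 40, 40, 41, 39, 40],
    "cpu_utilization_pct": [35, 50, 30, 45, 38, 52, 33, 47],
}

ranking = rank_suspects(network_latency_ms, candidates)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

With this sample data, the storage metric ranks first, which is the signal a network-only view would have missed.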
The greatest benefits of contextual insights come when IT teams make collaborative analysis the new normal, maintaining constant awareness of activities that span mainframe, distributed, and cloud infrastructures. Data sharing and faster analysis across IT domains increase productivity and efficiency for everyone, enhancing business outcomes.
Leverage Proactive Insights to Move from Recovery to Avoidance
In addition to supporting cross-functional analysis, contextual insights allow data to be mined for patterns via machine learning. These patterns enable IT to anticipate potential issues sooner and shift from a reactive recovery model to a proactive avoidance model.
Siloed data makes proactive operations nearly impossible. Too much data is missing from the equation. But when data is synthesized and analyzed from multiple environments, and is combined with human expertise, the proactive analysis takes on higher levels of accuracy. Potential problems or abnormal trends can be identified early enough to remediate issues before they impact the business.
Using AIOps to generate proactive insights can also help address IT skills gaps. For example, mainframe operations have the benefit of people with decades of experience. However, these people are retiring and taking their tribal knowledge with them. AI and ML can be used to collect and codify this knowledge so that it is not lost. That knowledge will then contribute to proactive insights that the next generation of operators can use to keep business critical operations running smoothly.
Use Automation to Advance Towards Self-healing Systems
Once AIOps is providing accurate insights in context, it is just one more step to have AIOps act upon those insights, remediating issues automatically before they impact the business. This is the ideal state: AIOps sends an alert as soon as an abnormal trend or possible issue is identified, quickly isolates the problem and diagnoses the source, and automates an appropriate response. No human intervention required.
Such automation does not happen all at once. It is best to automate slowly and methodically as you build your AIOps structure, starting with simple tasks and working up to more complex actions. For instance, you might want to automate reallocation of storage on demand to improve performance, optimize capacity to save on costs, or temporarily expand an MQ queue based on a workload spike.
If even “simple” automation sounds daunting or time-consuming, remember that machine learning can help dramatically. For example, instead of writing and maintaining hundreds or even thousands of lines of code to detect whether your system is trending out of the norm, ML algorithms, trained using your curated data, can detect these patterns with just a few lines of code. Initially, the automations you put in place may only save you five minutes here or ten minutes there. But minutes quickly add up to hours and days that can be spent on other value-added tasks. Automation can also help improve the stability of your system by enabling rapid corrections from an undesired to a desired state, preventing issues from happening in the first place.
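As a rough illustration of how little code a basic out-of-the-norm check can take, here is a minimal Python sketch. It uses a trailing-window z-score, which is a simple statistical stand-in rather than a trained ML model, and the function name, window size, threshold, and sample data are all assumptions made for this example.

```python
from statistics import mean, stdev

def out_of_norm(history, window=5, threshold=3.0):
    """Return indices whose value sits more than `threshold` standard
    deviations away from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(history)):
        baseline = history[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip flat baselines, where a z-score is undefined.
        if sigma > 0 and abs(history[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Hypothetical response times in milliseconds.
response_ms = [101, 99, 100, 102, 98, 100, 101, 180, 99, 100]
anomalies = out_of_norm(response_ms)
print(anomalies)  # [7], the 180 ms spike
```

A production AIOps platform would learn seasonality and relationships between metrics from curated data; the point here is only that the detection logic itself can stay short.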
By using an incremental AIOps approach that incorporates built-in feedback loops and guardrails, you can establish “trusted automation” over time. Your IT operations can be shifted so that your personnel no longer have to handle routine matters, manage policies, or resolve the majority of issues that arise: all that will run on autopilot under the auspices of AIOps.
Prioritize Investments to Optimize ROI
As we have seen, contextual insights are the foundation for analyzing, anticipating, and automating your operations. They also serve as the foundation for prioritizing.
Managing enterprise IT operations requires a constant balancing act. You have to juggle the competing objectives of keeping the lights on, saving costs, and driving innovation … all with limited people, skills, time, and money. How do you identify what is most important? How do you determine how to allocate your resources?
In the absence of contextual insights, it is easy to fall into two opposing traps. The first trap is to debate matters endlessly but actually do nothing. The second is to make poor decisions based on opinion or emotion rather than fact. Fortunately, AIOps can support strategic decision-making, helping you prioritize your investments to optimize ROI.
Contextual insights allow you to compare opportunities, weigh trade-offs, assess vulnerabilities, and predict outcomes. For example, capacity data enables you to detect whether you are optimizing usage, and system data lets you calculate and compare opportunities for cost savings and performance improvement. Ultimately, AIOps helps you prioritize by enabling you to use your data to determine the best path forward to achieve your desired business outcomes.
Building an AIOps Powerhouse
AIOps is not as complex as it may first appear. You begin with a foundation of contextual insights, set in place by opening up your architecture and enabling contextual insights to be derived from previously siloed data. On that foundation, you raise four pillars. First, you begin to analyze across domains, sharing data to enhance organizational efficiency. Second, you leverage ML-driven proactive insights to anticipate potential problems sooner, making the shift from reactive to proactive operations, preemptively addressing issues instead of responding to business impacts. Third, you continuously augment your operations and proactive abilities with automation you can trust as you drive towards implementing self-healing systems. Fourth, you use data-driven prioritization to generate a sustainable flow of value to the enterprise.
An AIOps powerhouse is not built overnight. By beginning today and continuing to make incremental changes, you will advance steadily toward your goal of an optimized, self-sustaining, self-healing hybrid IT infrastructure that supports your business flawlessly every day.
Our team at Broadcom would welcome the opportunity to help you get started today.
About the Author:
Per manages global R&D for Open Mainframe, DevOps, AIOps and Automation product portfolios for Broadcom, Mainframe Division. He is passionate about helping development and IT operation organizations increase innovation and transform to support the digital enterprise.