In five short years, artificial intelligence for IT operations (AIOps) has evolved from a futuristic concept to a standard practice for enterprises that place a high value on getting ahead of the break-fix model of IT support.
AIOps proposes a solution for several sources of stress that IT operations (ITOps) faces today. IT environments are becoming too complex to operate manually. The breadth of technology ITOps needs to embrace is expanding rapidly. Computing power is moving outside the data center to the edges of the network, and infrastructure problems must be addressed at ever-increasing speeds. Rather than try to outrun these trends, enterprises are throwing automation at the problem, using big data analytics, machine learning, and other AI technologies to help identify and resolve IT issues. Enter AIOps.
AIOps typically has four basic steps: monitoring, analysis, recommendation, and remediation. While monitoring and remediation are important bookends, the middle two steps, analysis and recommendation, form the key components IT support providers need to master to execute a successful AIOps strategy.
The objective is to identify emerging issues and apply corrective action across the customer base.
This reduces the time from fault identification to resolution, not just for the customers who have the problem, but for all other customers who may be at risk or have yet to identify the issue themselves.
However, given the pace of change in a typical ITOps environment, IT support providers need a continual improvement cycle that can adapt in real time, based on factual experience, to create a successful approach to addressing customer challenges.
To achieve this, service providers must identify best practices through recommendation adoption, deviation identification, and ultimately the definition of “known good.” Offering recommendations across a broad customer base makes it possible to identify potential issues along with operational behaviors that improve IT efficiency. These elements then become the basis for preventative recommendations.
Starting the process
This continual improvement cycle starts at the crossroads of product engineering and support, focused through the lens of elevated case management. To be successful, it needs to identify, prioritize, and eliminate issues that require human intervention.
To do this, service providers need effective telemetry monitoring, dashboarding, and data analytics capabilities to track these trends. Strong product engineering, support engineering, and data science teams are required to analyze telemetry at scale to identify new threats, prioritize them, refine rapid diagnostic capabilities, and isolate causation. AI tools help handle the volume of data and drive accuracy, ultimately allowing the predictive identification of problems before they can cause disruption to the customer. Customers can then be given remediation steps to solve the problems before they significantly disrupt their environment.
This starts to outline the components of a continual improvement cycle.
Successful service providers need to constantly do three things: monitor the health and performance of their installed base, develop new detection models, and provide recommendations to customers. They need to be able to solve the problems of “patient zero,” the initial customer who had the problem. Because all customers are sending telemetry, providers can use pattern recognition to proactively identify and help customers who share the same risk profile before these problems affect their IT operations.
Simple, common IT problems may occur 80% of the time and cause only 20% of the pain because IT knows how to deal with them. These issues are best served by good analytics and automation alone. The benefit of AI lies in identifying and solving complex issues that occur more rarely (say, 20% of the time) but that, without AI to quickly identify and remediate them, cause 80% of the pain.
Turnaround time is a key consideration. What used to take months to diagnose and fix on a large scale can now be done in days, or even hours. For example, if a customer in Germany has an issue caused by a specific configuration, how long does it take for the organization to identify the issue? How long to confirm it’s a unique problem, and reactively quantify and identify that issue in other environments? How long to apply fixes proactively, or make recommendations to those environments to mitigate that impact? Finally, and crucially, what is required to avoid the risk in the first place? Using broad-based telemetry, AIOps provides a method to accelerate identification and improve recommendation accuracy.
Thinking global, acting local
Using telemetry in this way is a good example of thinking globally and acting locally. You can take all those experiences from customers using their hardware and the provider’s services, and create a broader picture of what’s going on.
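
As a rough illustration of the “patient zero” idea, the sketch below matches a risk signature diagnosed on one customer against telemetry from the rest of the installed base. It is a minimal sketch under stated assumptions, not a description of any vendor’s actual pipeline: the record fields, the `RiskSignature` type, the exact-match rule, and all sample data are hypothetical, and a real system would more likely rely on learned detection models than on a fixed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemTelemetry:
    """One system's reported configuration (hypothetical fields)."""
    customer: str
    region: str
    model: str
    firmware: str
    settings: frozenset  # names of enabled options


@dataclass(frozen=True)
class RiskSignature:
    """Configuration traits diagnosed on patient zero (hypothetical)."""
    issue_id: str
    model: str
    firmware: str
    required_settings: frozenset
    recommendation: str


def matches(system: SystemTelemetry, sig: RiskSignature) -> bool:
    """A system is treated as at risk when it shares every trait in the signature."""
    return (
        system.model == sig.model
        and system.firmware == sig.firmware
        and sig.required_settings <= system.settings  # subset test
    )


def at_risk_customers(fleet, sig, patient_zero):
    """Group matching systems by customer, excluding patient zero itself."""
    exposed = {}
    for system in fleet:
        if system.customer != patient_zero and matches(system, sig):
            exposed.setdefault(system.customer, []).append(system)
    return exposed


# Made-up sample data, for illustration only.
fleet = [
    SystemTelemetry("customer-de-01", "Germany", "array-x", "1.2.0",
                    frozenset({"dedupe", "replication"})),
    SystemTelemetry("customer-au-07", "Australia", "array-x", "1.2.0",
                    frozenset({"dedupe", "replication", "encryption"})),
    SystemTelemetry("customer-us-03", "United States", "array-x", "1.3.1",
                    frozenset({"dedupe", "replication"})),
    SystemTelemetry("customer-jp-02", "Japan", "array-x", "1.2.0",
                    frozenset({"dedupe"})),
]
sig = RiskSignature(
    issue_id="ISSUE-0001",
    model="array-x",
    firmware="1.2.0",
    required_settings=frozenset({"dedupe", "replication"}),
    recommendation="Review this configuration before enabling replication.",
)

exposed = at_risk_customers(fleet, sig, patient_zero="customer-de-01")
for customer, systems in exposed.items():
    print(f"{customer}: {len(systems)} system(s) at risk. {sig.recommendation}")
```

In this made-up fleet, only the Australian customer shares patient zero’s full profile, so only that customer would receive the proactive recommendation. The other two differ in firmware or enabled settings and are left alone.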
You can look at what customers are doing and what issues are happening, and then use that data to drive decisions. The provider can then prioritize the risk in its customer base and take targeted action.
The objective of the approach is to get ahead of problems and give customers insight into potential issues within their environments, along with options to avoid these risks. If problems can be preemptively identified, customers can make informed decisions and control risk.
Much of the information unearthed through an AIOps process can help customers address the problems directly. Where issues can be avoided through changes in usage, preemptive recommendations backed by factual reasoning provide IT with the mandate required to drive change. If resolution requires product enhancements, these can be entered into the product development lifecycle to address the issue or, at a minimum, enable better identification and prevention.
What does tomorrow look like?
Most of what we discussed here can be considered on a discrete system-by-system basis. You have a server, it sends its telemetry, and it receives recommendations in return. However, business success is no longer tied to monolithic systems. Interoperability between multiple systems, virtualization, applications, and user experience now define IT. To increase agility across the board, analytics need to happen not only at a discrete system level, but also at an IT-environment level. Right across the stack, telemetry is required not only to identify new threats, but also to determine best practice. This is where AIOps is increasingly important, as it can operate with a speed and scope that a team of engineers could never match.
Applying AIOps to groups of machines, by extension to groups of systems, and ultimately to an entire customer base offers multiple points of perspective.
Smart organizations can correlate this data and apply it to the whole concept of interoperability. Separately, moving up the stack and into the application provides insight into how the application is actually engaging and interfacing with all of the products. This will enable new ways to optimize applications based not only on how one customer is using them, but on how entire customer bases are using technology globally.
Conclusion
The path to best practice will become better defined. Fact-based analytics, enabled at scale through AI, will create an opportunity to build resilience into IT environments. As AIOps continues to mature, the scope of perspective will create reliable “known-good” paths for vendors and customers alike. As for today, improvements in tools and data security mean that the benefits AIOps can provide weigh heavily in favor of streaming machine telemetry data, as many IT issues are becoming “optional.”
Service providers will abstract complexity from the customer and make better recommendations to increase predictability and ease of use. Development of successful AI-based solutions often relies on the collection of data. Service providers that have both access to telemetry data from a wide installed base of products and the reach of a strong support services organization will have a significant advantage.
Customers can already benefit from being part of a large-scale connected community through predictive AIOps. AIOps has come a long way in five years. Expect it to continue to develop in the years to come.
For more information please visit www.hpe.com/services/operational
____________________________________
About Duncan Goode

Duncan Goode is a worldwide services product manager for HPE Pointnext Services. His goal is to ensure a quality support experience that drives better business outcomes for customers.
Duncan has worked in technology and support services for 30 years, providing leadership and innovation in a variety of roles across global support, mission-critical, and retail environments. Based in Australia, he enjoys spending time playing and coaching cricket.

About Jordan Lewy

Jordan Lewy is a Worldwide Manager for HPE Pointnext Support Services. In this role, his goal is to transform HPE’s customer support experience using HPE InfoSight, which in turn drives customers’ business outcomes and enables their digital transformation journey. Jordan brings to his position a well-established background in information technology and professional services, where he has worked for over 20 years. Prior to taking on his current role, he held other positions in HPE, including leadership for HPE’s Storage Support services, Installation and Technical services, and HPE’s Customer Technical Training business.