Artificial intelligence was once a magical concept, the stuff of science fiction. Now, after decades of research and commercialization, it’s just another foundational tool to keep the enterprise stack running.

Nowhere is this more evident than in the world of DevOps, a data-rich, back-office practice that presents a perfect sandbox for exploring the power of artificial intelligence. The teams in charge of operations now have a burgeoning collection of labor-saving and efficiency-boosting tools and platforms on offer under the AIOps banner, all of which promise to apply the best artificial intelligence algorithms to the work of maintaining IT infrastructure.

AIOps is among the better use cases for artificial intelligence. Servers and networks generate petabytes upon petabytes of data. We know when processes start and stop, surge and ebb, often down to the millisecond. RAM and CPU demands are usually well understood, and so are the prices for renting hardware in the cloud. All are routinely calculated down to six or seven significant digits. Creating an autonomous car may mean struggling with a world filled with pedestrians, livestock, and shadows, but when it comes to IT infrastructure, everything is already digitized and ready for analysis.

Some of the simplest tasks for AIOps involve speeding up the way software is deployed to cloud instances. All the work that DevOps teams do can be enhanced with smarter automation capable of watching loads, predicting demand, and even starting up new instances when the hordes descend.

Good AIOps tools generate forward-looking estimates of machine load and then watch to see if anything deviates from them.
Anomalies might be turned into alerts that generate emails, Slack posts, or, if the deviation is large enough, pager messages. A good part of the AIOps stack is devoted to managing alerts and ensuring that only the most significant problems turn into something that interrupts a meeting or a good night’s sleep.

These methods of watching for unusual levels of activity are sometimes deployed to bolster security, a more challenging task, making some AIOps tools the purview of both security watchdogs and the DevOps team.

Sophisticated AIOps tools also offer “root cause analysis,” which creates flowcharts to track how problems can ripple through the various machines in a modern enterprise application. A database that’s overloaded will slow down an API gateway that, in turn, freezes a web service. These automated catalogs of the workflow can often help teams spot the real problem faster by documenting and tracking the chains of troublemaking.

Many of the tools in this survey are built on monitoring systems with a long history. They began as tools that tracked events in complex enterprise stacks and have now been extended with artificial intelligence. A few of the tools began in AI labs and grew outwards. In either case, anyone evaluating these platforms will want to look at the range of connectors that gather data. All offer a basic set of pathways to collect raw data, but some connectors are better than others, so check how well each AIOps offering integrates with your particular databases and services.

Here are 10 of the leading AIOps tools simplifying the job of keeping enterprise IT infrastructure humming.

AppDynamics

AppDynamics is a division of Cisco that specializes in performance monitoring.
It has added machine learning to its flagship platform to watch for metrics that diverge from the historical baseline. The system can build a flowchart and learn how events can cascade until system failure, thereby helping identify root causes. AppDynamics emphasizes correlating these metrics with hard “business outcomes” such as sales numbers, and it promotes a “self-healing mentality” for its platform by providing links that can automate the resolution of common failures.

BigPanda

BigPanda focuses on both detecting strange behavior and orchestrating the teams assigned to solve it. Its eponymous platform offers root cause analysis and event detection that integrates with the major cloud providers. Its “Level-0 Automation” handles the workload that comes after a problem appears. BigPanda simplifies the workflow by creating tickets, sending out alerts, and even starting up virtual “war rooms” for serious issues.

Datadog

Datadog recently added the Watchdog module to its performance management tool so DevOps teams can ask for automated warnings when performance begins to fail. The tool builds performance forecasts based on historical records adjusted for season and time of day. Changes in metrics such as latency, RAM consumption, or network bandwidth can trigger alerts if they depart from norms. The tool is integrated with Datadog’s security detection system, and it can work with virtual machines, cloud instances, and serverless functions.

Dynatrace

Dynatrace is a broad, full-featured monitoring tool for tracking cloud-based VMs, containers, and serverless solutions. It sucks up log files, event reports, and other triggers to deliver what it calls “precise, AI-powered answers.” The core is called Davis, a deterministic AI that constructs flowcharts and trees so that it can pinpoint the root cause of an anomaly or failure.
If it’s properly configured, it can run autonomously by triggering changes that should fix the cause. The fix could be as simple as rebooting an instance, and it can happen without waiting for a human to get in the loop.

GitHub Copilot

Most AIOps tools are designed to help software that’s already up and running. GitHub Copilot starts earlier in the process, helping when code is first being written. The tool watches what a programmer types and makes suggestions for how to complete it. It was trained on an enormous corpus of open source code, so these ideas are grounded in some form of reality. There are still somewhat philosophical questions about who is the ultimate author of the new code, whether the AI can be trusted, and whether the millions of open source coders out there deserve some kind of credit or hat tip for assistance. The answer may be “perhaps.” A bigger question is how well Copilot understands your code and whether it really does much better than autocomplete. The answer probably varies.

IBM Watson Cloud Pak for AIOps

IBM created the “Watson Cloud Pak for AIOps” by integrating its general Watson-brand AI with its larger cloud presence. The tool brings automated root cause analysis to the data collected from the cloud monitoring software. When events reach a configurable level of severity, they can trigger either basic alerts or more automated responses from the toolchain. IBM has integrated the results with its other Cloud Paks, which provide Network, Business, and some Robotic Process Automation.

LogicMonitor

LogicMonitor calls its AI “LM Intelligence.” It bundles a root cause detector with an alert system based on dynamic thresholds adjusted from historical data. Its early warning system depends on a forecasting module that extends this historical data to compute thresholds on latency, bandwidth, and other metrics.
LogicMonitor prioritizes reducing “alert fatigue” to help teams focus their efforts on truly anomalous behavior. The data collectors tap into the major clouds and watch compute resources (Kubernetes, containers, etc.), network traffic, and storage systems (databases, buckets, etc.).

Moogsoft

Moogsoft is a specialized AI engine that integrates with major performance monitoring tools such as New Relic, Datadog, AWS CloudWatch, and AppDynamics. If your stack is running something different, such as open source or in-house solutions, Moogsoft professes the desire to integrate with “anything, anywhere and anytime.” The product moves the data through a pipeline that de-duplicates events, enriches them with contextual data from other sources, and then correlates the data before raising an alarm. The clustering algorithms and historical records help reduce the noise and produce more useful reports of problems.

New Relic One

New Relic added an AI engine to its performance monitoring tool, One, and it tracks all ingested events, including those from other tools such as Splunk, Grafana, and AWS’s CloudWatch. The tool can be configured with flexible levels of sensitivity for events of varying severity. You can tell New Relic that, for instance, a low-priority error should raise an alarm only if it occurs several times over fifteen minutes. A high-priority event like a crashed server, by contrast, will generate a pager alert immediately. The issue log tracks all events and includes a Correlation Decision report that lays out the logical steps taken by the AI en route to raising an alarm.

Splunk

Splunk began as a tool for gathering log files and turning them into comprehensive reports for tracking performance, identifying anomalies, and helping the team diagnose problems. The product integrates informational graphics with a deep indexing tool to catalog the events.
Artificial intelligence and machine learning algorithms within Splunk can anticipate problems and trace their source. These algorithms track all of the services integrated with Splunk to find the root causes. The machine learning features are deeply integrated with the platform so that service engineers skilled at tracking performance can use them without much additional training. They can track historical performance and watch for divergence through the main dashboard.
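
Two ideas recur across the tools in this survey: comparing a new metric against a historical baseline, and holding back low-priority alerts until they repeat within a time window. The Python sketch below illustrates both in their simplest form. It is not any vendor's algorithm or API. The names `is_anomalous` and `RepeatedErrorRule`, the three-standard-deviation cutoff, and the sample numbers are all assumptions chosen for illustration, and real products layer seasonality adjustments, correlation, and learned thresholds on top.

```python
from collections import deque
from statistics import mean, stdev


def is_anomalous(history, value, k=3.0):
    """Return True if value is more than k standard deviations from the
    mean of history. A hypothetical, deliberately naive baseline check."""
    if len(history) < 2:
        return False  # too little data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all counts as a deviation.
        return value != baseline
    return abs(value - baseline) > k * spread


class RepeatedErrorRule:
    """Fire only when an event repeats `threshold` times inside a sliding
    window (hypothetical helper; defaults mirror a fifteen-minute window)."""

    def __init__(self, threshold=3, window_seconds=15 * 60):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Record an event time in seconds; return True if the alarm fires."""
        self._timestamps.append(timestamp)
        # Discard events that have aged out of the window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


if __name__ == "__main__":
    latencies_ms = [101, 99, 100, 102, 98, 100, 101, 99]  # made-up samples
    print(is_anomalous(latencies_ms, 100))
    print(is_anomalous(latencies_ms, 250))

    rule = RepeatedErrorRule(threshold=3, window_seconds=900)
    print([rule.record(t) for t in (0, 300, 1500, 1600, 1700)])
```

A fixed multiple of the standard deviation is the bluntest possible threshold. The dynamic thresholds and seasonal forecasts described above exist precisely because real workloads have daily and weekly rhythms that a single mean and spread cannot capture.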