Overcoming IT Operations Alarm Fatigue with ‘Pragmatic Observability’

Dell Technologies

In modern data centers, IT teams are bombarded with a constant stream of operational alarms. From alerts on server health issues to spikes in storage latency, from possible security threats to overloaded networks, these alarms beg for human attention — and overwhelm IT administrators.

This constant barrage of system alarms leads to a condition known as alarm fatigue or alert fatigue, in which busy admins tune out alarms. This is a serious issue for CIOs and other IT leaders, who know that missed alerts could lead to unchecked security breaches, service-quality problems and business impacts.

All of this points to the need for automated IT monitoring solutions that can sift through thousands or even millions of notifications and help IT management teams understand the issues they need to focus on and the root causes of those issues. These solutions, which combine artificial intelligence in IT operations (AIOps) and human intelligence, offer new levels of insight into IT services through the lens of business goals. This concept is known as “pragmatic observability.”

Pragmatic observability

In the Gartner view, pragmatic observability encompasses the full data stack, from infrastructure to applications. This view includes the digital experience, business-oriented key performance indicators (KPIs) and social sentiment, as well as the relationships and dependencies among those elements. Gartner notes, “To understand applications and provide insight into the status of the digital business, I&O leaders must use this pragmatic observability, leveraging AIOps to detect patterns and make connections.”[1]

Pragmatic observability narrows the alarm landscape to show the IT administrator what is most relevant to the issues at hand: the things that IT needs to focus on right now. It gives IT teams detailed views into system and component performance across the complex automation in their infrastructure, along with the transparency they need to trust that automated environment.

With these needs in mind, CIOs should push for adoption of IT monitoring technology that supplies this transparency, and the data needed to establish trust, to protect their businesses. In particular, CIOs need to look for intelligent monitoring solutions that leverage machine learning techniques to help IT teams anticipate emerging problems and turn predictive analytics into actionable insights. Working quietly in the background, these monitoring solutions narrow the IT operators’ view to the issues they need to concentrate on, while pulling back the curtain to unveil the root causes of performance impacts.

Intelligent monitoring in action

Let’s consider an example of an intelligent monitoring solution in action. In the Figure 1 screen capture, the intelligent monitoring solution sheds light on issues within a virtual machine (VM) stack that are impacting system performance and the user experience.

In the left-hand column, the screen shows key performance indicators for the VM and a timeline of configuration changes. Performance impacts are highlighted in pink on the Storage Latency chart; these periods are also anomalies when compared with the historic seasonality for this VM.

On the right side, the screen shows the end-to-end relationships from the VM to the storage objects, highlighting the related objects that are experiencing performance impacts.


Figure 1. What within the VM stack is experiencing performance impacts?
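To make the idea of an anomaly "compared with historic seasonality" concrete, here is a minimal Python sketch. It is not how CloudIQ or any particular product works; it simply assumes that past latency samples are grouped by hour of day and flags a reading that sits far above the historic norm for that same hour. The function name, the data layout and the three-standard-deviation threshold are all illustrative assumptions.

```python
from statistics import mean, stdev


def seasonal_anomalies(history, current, threshold=3.0):
    """Return the hours whose current latency is unusually high for that hour.

    history: dict mapping hour of day -> list of past latency samples (ms)
    current: dict mapping hour of day -> latest latency sample (ms)
    threshold: how many standard deviations above the historic mean
               counts as an anomaly (an assumed, tunable value)
    """
    flagged = []
    for hour, value in sorted(current.items()):
        past = history.get(hour, [])
        if len(past) < 2:
            continue  # not enough history to judge this hour
        baseline = mean(past)
        spread = max(stdev(past), 1e-9)  # avoid dividing by zero
        if (value - baseline) / spread > threshold:
            flagged.append(hour)
    return flagged


# Hypothetical data: this VM is normally quiet at 9:00 and busy at 14:00.
history = {
    9: [2.0, 2.2, 1.8, 2.1],
    14: [5.0, 5.5, 4.8, 5.2],
}

# The same 5.0 ms reading is flagged at 9:00 but not at 14:00.
print(seasonal_anomalies(history, {9: 5.0, 14: 5.0}))
```

The point of the seasonal baseline is that the same latency value can be routine at one time of day and a warning sign at another, which is why a fixed threshold alone tends to generate noise.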

The Figure 2 screen capture shows a drilldown view into the most recent performance impact, including details on related impacts and possible causes, in this case three storage objects.


Figure 2. What could be causing the issue?

The screen capture in Figure 3 illustrates how pragmatic observability broadens the view. It shows other VMs in the downstream IT environment that may be impacted by the current storage latency problems.


Figure 3. What other VMs may be impacted?
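The downstream view in Figure 3 can be thought of as a walk over a dependency map. The Python sketch below is a simplified illustration under stated assumptions, not a description of any product's internals: it assumes the relationships between storage objects and VMs are available as a dictionary, and every object name in it is hypothetical.

```python
from collections import deque


def impacted_objects(dependents, start):
    """Return everything downstream of `start`, nearest objects first.

    dependents: dict mapping an object -> list of objects that depend on it
    start: the object currently showing a performance impact
    """
    seen = {start}
    queue = deque([start])
    downstream = []
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:  # visit each object only once
                seen.add(child)
                downstream.append(child)
                queue.append(child)
    return downstream


# Hypothetical topology: storage pool -> LUNs -> datastores -> VMs.
dependents = {
    "pool-1": ["lun-a", "lun-b"],
    "lun-a": ["datastore-1"],
    "lun-b": ["datastore-1", "datastore-2"],
    "datastore-1": ["vm-01", "vm-02"],
    "datastore-2": ["vm-03"],
}

# Which VMs could feel a latency problem on lun-b?
affected = impacted_objects(dependents, "lun-b")
print([name for name in affected if name.startswith("vm-")])
```

A breadth-first walk is used here so that the objects closest to the problem are listed first, which roughly matches how an operator would want to triage.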

The big picture

As these examples suggest, pragmatic observability goes beyond simple alarm consolidation. While consolidation is great, control is increasingly being ripped from the hands of IT teams through complex automation. Pragmatic observability puts the control back into the hands of the IT operators, while helping them understand the consequences of both upstream and downstream alerts.

This brings us to an important overarching point. Pragmatic observability recognizes that IT operators require both transparency for trust and abstraction for observability. When these are coupled with pragmatic views, CIOs can be assured that they are not missing downstream impacts of issues in their infrastructure.

Key takeaways

To manage a complex IT environment from edge to cloud to core, data center operators need to combine two trends: pragmatic observability for simplicity of use plus targeted transparency for trust. A fully featured intelligent monitoring solution delivers both of these benefits.

To learn more

To gain a closer understanding of the features of a robust intelligent monitoring solution, explore the capabilities of Dell EMC CloudIQ. And to see a series of examples of observability for performance impacts, watch the Dell Technologies CloudIQ Overview Video.

Explore AI and data analytics solutions from Dell Technologies and Intel.

JoAnne Hubbard is a Senior Principal User Experience Designer for Dell Technologies.

[1] Gartner, “Innovation Insight for Observability,” September 28, 2020.


Copyright © 2021 IDG Communications, Inc.