by Thor Olavsrud

Anomaly Detection Lets You Find Patterns in Log Data

Sep 10, 2013 | 4 mins
Big Data, Business Intelligence, Data Management

Generating insight from log data traditionally requires writing a search. But that means you need to know which questions to ask. How do you get insight from data you know nothing about? Sumo Logic's answer combines machine learning and pattern recognition to detect anomalous events in your data.

Organizations typically generate tremendous volumes of data from their infrastructure on a regular basis, much of it machine data in the form of logs. Turning those logs into insight is a difficult challenge and represents one of the more intriguing promises of big data analytics.

Traditional security and log-management tools attempt to provide insight into the chaos, but they typically require users to write rules to detect anomalies. Writing those rules requires pre-existing understanding of the data—you need to know what you’re looking for before you can perform a search to find it.

It’s Humanly Impossible to Know Everything About Your Data

“The first challenge is not just that there’s a vast amount of data, but the fact that typical analysis of machine data typically relies on search as the fundamental mechanism to investigate what’s going on,” says Sanjay Sarathy, chief marketing officer (CMO) of machine data analytics specialist Sumo Logic.

“The challenge with search is that you fundamentally need to know what you’re searching on. Given the explosion of data, it’s humanly impossible to know everything about your data,” says Sarathy.

“CIOs don’t care about the logs. CIOs care about the events those logs represent,” he adds. “They care about anomalies. The traditional way of getting to those anomalies and events is writing rules. But the challenge you have is actually to write those rules. Given the amount of data, it’s impossible to write rules for every event.”

Anomaly Detection Uses Machine Learning, Statistical Analysis to Detect Events

Sumo Logic’s answer is Anomaly Detection, a major architectural enhancement to its Log Management and Analytics service based on its LogReduce technology.

Anomaly Detection combines machine learning, statistical analysis and human knowledge from your domain experts to analyze streams of machine data, detect events in the stream and provide alerts on those events, allowing you to remediate issues before they affect business services.
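To make the statistical-analysis part of that description concrete, here is a minimal sketch, not Sumo Logic's actual method. It assumes events have already been counted per fixed time window and flags any window whose count sits more than a chosen number of standard deviations from the mean of the earlier windows. The function name, threshold and sample counts are all hypothetical.

```python
from statistics import mean, stdev

def find_anomalous_windows(counts, threshold=3.0, min_history=5):
    """Return indices of windows whose count deviates from the mean of
    all earlier windows by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(min_history, len(counts)):
        history = counts[:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if counts[i] != mu:
                anomalies.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical per-minute error counts with one obvious spike.
error_counts = [4, 5, 3, 4, 6, 5, 4, 48, 5]
print(find_anomalous_windows(error_counts))
```

A production system would likely use a sliding window and a more robust baseline, since a single spike inflates the mean and standard deviation for every window that follows it.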

“Basically, we reduce log lines into a set of patterns,” Sarathy says. “That allows us to figure out the root cause of issues. We don’t need to know anything about that data in advance to be able to come up with any of those patterns. You, as the domain expert, help us understand which patterns are relevant and which aren’t. We’re building on that pattern recognition technology to provide an automated way to do anomaly detection.”
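The idea of reducing log lines into patterns can be illustrated with a simple sketch. This is not the LogReduce algorithm, whose internals the article does not describe. It is a basic masking approach that replaces variable-looking tokens (IP addresses, hex values, numbers) with placeholders and counts how many lines collapse into each resulting pattern. The masking rules and sample log lines are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical masking rules, applied in order: the IP rule must run
# before the generic number rule so addresses are not split apart.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def to_pattern(line):
    """Replace variable tokens in a log line with placeholders."""
    for regex, placeholder in MASKS:
        line = regex.sub(placeholder, line)
    return line

def reduce_logs(lines):
    """Group log lines by pattern and count the lines in each group."""
    return Counter(to_pattern(line) for line in lines)

# Made-up sample log lines.
logs = [
    "Connection from 10.0.0.5 closed after 120 ms",
    "Connection from 10.0.0.9 closed after 98 ms",
    "User 4412 failed login from 192.168.1.20",
    "Connection from 10.0.0.7 closed after 101 ms",
]
for pattern, count in reduce_logs(logs).most_common():
    print(count, pattern)
```

Here four raw lines collapse into two patterns without any prior knowledge of the log format beyond the generic masking rules, which is the property Sarathy describes.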

“When you get an alert, you as a human go in and see what contributes to that anomaly,” he adds. “We don’t know how relevant that is to you as a human. You can go in and characterize the severity of the event we’ve identified. You classify it and add annotations. If Sumo Logic sees that pattern again, we will immediately bring it up with all of the classifications and annotations you’ve given it. All of a sudden, you’ve gone from being reactive to being proactive and predictive. Let the data tell you what’s going on. Don’t ask the data all the time, because you’re limited by what you know about the data.”
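The feedback loop Sarathy describes, in which a user classifies a pattern once and sees that classification resurface on later matches, could be sketched roughly as follows. The class names, severity labels and example pattern are hypothetical and do not reflect Sumo Logic's actual data model or API.

```python
from dataclasses import dataclass, field

@dataclass
class KnownEvent:
    """A user's classification of a log pattern (hypothetical structure)."""
    severity: str
    notes: list = field(default_factory=list)

class PatternRegistry:
    """Remembers how users have classified patterns seen in the past."""

    def __init__(self):
        self._known = {}

    def classify(self, pattern, severity, note=None):
        # Create the entry on first classification; update it afterwards.
        event = self._known.setdefault(pattern, KnownEvent(severity))
        event.severity = severity
        if note:
            event.notes.append(note)
        return event

    def lookup(self, pattern):
        # Returns None for a pattern nobody has classified yet.
        return self._known.get(pattern)

registry = PatternRegistry()
pattern = "User <NUM> failed login from <IP>"

# First sighting: nothing is known, so the anomaly goes to a human.
print(registry.lookup(pattern))

# The user classifies and annotates it.
registry.classify(pattern, "high", "Possible brute-force attempt")

# Later sighting: the stored severity and notes come back immediately.
event = registry.lookup(pattern)
print(event.severity, event.notes)
```

Keying the registry on the pattern rather than on raw log lines is what lets one annotation cover every future line that reduces to the same pattern.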

Sumo Logic notes that the Anomaly Detection service gives customers the capability to do the following:

  • Identify imminent security threats
  • Detect anomalies across the entire application and operations infrastructure
  • Provide user feedback to turn anomalies into known events and classify events with the appropriate severity levels
  • Detect any future events that match the patterns associated with past anomalies
  • Visually identify and track anomalies, corresponding events and underlying log patterns through an Anomaly Dashboard
  • Use LogReduce to rapidly investigate and identify the root cause of these events
  • Set alerts for users whenever an important event appears
  • Scale anomaly detection to the scope of users’ IT infrastructure

Anomaly Detection is currently in beta release and expected to be generally available by November 2013.

Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers. Follow Thor on Twitter @ThorOlavsrud.