Fighting emerging threats with data analytics

Dell EMC

By John Nicholas and James Birmingham

When it comes to today’s rapidly evolving cybersecurity landscape, you’re likely to hear a great deal of talk about intelligence-driven security solutions. The idea is that the security solution delivers insights and deploys countermeasures based on data analytics and applied intelligence.

While people sometimes characterize intelligence-driven security as a next-generation approach, and although it continues to evolve and mature, it is actually not new at all. Our approach to identifying and countering threats was intelligence-driven even before the inception of SecureWorks in 1999. Intelligence-driven security uses advanced data analytics capabilities to help organizations identify cyber threats as they occur and protect their businesses in near real time.

It takes a platform with ingrained analytics capabilities to find the threats in enormous amounts of data. For example, SecureWorks processes more than 230 billion cyber-events in the course of a day to help protect 4,300 clients in 58 countries. Using data analytics, the capabilities delivered in our Counter Threat Platform provide organizations with an early warning system for ever-evolving cyber threats, helping them prevent, detect, predict and rapidly respond to cyberattacks.

Some large enterprises build on the capabilities of early warning systems with the addition of automated remediation of threats. This is very much the future for the fight against cyber-crime. With automation and analytics, organizations can not only detect but also resolve problems, greatly reducing threat impact. For example, when the early warning system detects a malicious attachment, the automated remediation solution can automatically delete all unread copies of the email and quarantine any machine where the attachment was opened.
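The remediation example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EmailCopy` record, its fields, and the `remediate` function are hypothetical names invented here, and a real solution would call the mail server and endpoint tooling rather than return a summary.

```python
from dataclasses import dataclass, field


@dataclass
class EmailCopy:
    """One delivered copy of an email (hypothetical record layout)."""
    mailbox: str             # mailbox holding this copy
    machine: str             # machine the mailbox is read on
    attachment_hash: str     # fingerprint of the attachment
    read: bool               # True once the email has been read
    attachment_opened: bool  # True once the attachment has been opened


@dataclass
class RemediationPlan:
    delete_from: list = field(default_factory=list)  # mailboxes to purge
    quarantine: set = field(default_factory=set)     # machines to isolate


def remediate(copies, malicious_hash):
    """Plan the two actions from the text for one malicious attachment."""
    plan = RemediationPlan()
    for copy in copies:
        if copy.attachment_hash != malicious_hash:
            continue  # unrelated email, leave it alone
        if not copy.read:
            plan.delete_from.append(copy.mailbox)  # delete unread copies
        if copy.attachment_opened:
            plan.quarantine.add(copy.machine)      # isolate exposed machines
    return plan
```

Returning a plan instead of acting directly keeps the decision logic easy to test and leaves room for a human approval step before anything is deleted or quarantined.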

Data analytics plays into virtually everything we do. The amount of data we collect is growing at an astounding annual rate of 66 percent, and we have processed as many as 300 billion cyber-events in a single day. The only way to gain insights from these massive amounts of data is to use powerful analytics tools in conjunction with a highly scalable data environment like Apache™ Hadoop®.
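To put a 66 percent annual growth rate in perspective, the short calculation below applies ordinary compound growth. It is a back-of-the-envelope sketch, not a forecast: it assumes the rate stays constant and, purely for illustration, applies it to the 300 billion peak figure. The function names are made up for this example.

```python
import math

ANNUAL_GROWTH = 0.66         # 66 percent per year, from the text
PEAK_EVENTS_PER_DAY = 300e9  # peak single-day volume, from the text


def projected_volume(current, years, growth=ANNUAL_GROWTH):
    """Compound growth: current * (1 + growth) ** years."""
    return current * (1 + growth) ** years


def doubling_time_years(growth=ANNUAL_GROWTH):
    """Years needed for volume to double at a constant growth rate."""
    return math.log(2) / math.log(1 + growth)


if __name__ == "__main__":
    print(f"Doubling time: {doubling_time_years():.2f} years")
    print(f"Volume in 3 years: {projected_volume(PEAK_EVENTS_PER_DAY, 3):.3e}")
```

At this rate, volume doubles roughly every 1.37 years and grows about 4.6 times over three years, which is why capacity that looks generous today can be exhausted quickly.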

With that understanding in mind, and when the platform was still rather young, we adopted Hadoop for processing these very large datasets. Knowing we had to stay ahead of ongoing 66 percent growth, we also chose early on to adopt complementary technologies like Apache Spark™, a fast engine for large-scale data processing in real time, and Apache Impala, a distributed SQL query engine for data stored in Hadoop. These tools, which we use in conjunction with homegrown analytics software called Very Large Database (VLDB), help us extract insights from data in near real time, as millions of events stream into our data centers.

Our ever-increasing data volumes caused us to outgrow our traditional data warehouse solution. We chose Apache Impala as the new solution and are in the process of moving all of our data warehouse queries over to it. Impala provides the performance we require at any data volume and operates at a fraction of the cost of our traditional data warehouse solution. Impala is also easy to use, since it provides a familiar SQL interface, and is used every day by our researchers to perform forensic analysis and gather intelligence.

Today we use Apache Spark to gain insight into what systems exist within our clients’ environments and how those systems are configured. We passively extract this information from the billions of client logs we receive each day. The information extracted includes IP addresses, hostnames, MAC addresses, operating systems, and installed applications. Our technology processes more than 500 samples per second and is responsible for keeping our database of over 50 million assets current. The asset database is used to decide when to escalate and when not to, to inform the severity of a threat, and to provide context to our analysts and our clients.
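The idea of passively building an asset database from logs can be illustrated with a small single-process sketch. It is a plain-Python stand-in rather than Spark code, and it rests on assumptions that are not from our production system: the `key=value` log format, the field names, and the choice to key assets by MAC address are all hypothetical.

```python
import re

# Hypothetical key=value log format; real client logs vary widely.
FIELD_RE = re.compile(r"(\w+)=(\S+)")
ASSET_FIELDS = ("ip", "hostname", "mac", "os")


def extract_observation(log_line):
    """Pull asset attributes out of one log line, ignoring other fields."""
    fields = dict(FIELD_RE.findall(log_line))
    return {k: fields[k] for k in ASSET_FIELDS if k in fields}


def update_assets(assets, log_line):
    """Merge one observation into the asset database, keyed by MAC address."""
    obs = extract_observation(log_line)
    mac = obs.get("mac")
    if mac is None:
        return assets  # nothing to key the asset on, skip this line
    # Newer values overwrite older ones, which keeps each record current.
    assets.setdefault(mac, {}).update(obs)
    return assets
```

For instance, feeding two lines for the same MAC address, the second with a new IP, leaves one asset record whose IP reflects the later line while the operating system learned from the first line is retained.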

For example, if we detect an attempt to exploit a Windows vulnerability, and we know the asset is running Linux, we know the attempt will not be successful and can avoid escalating the incident. However, if we know that the asset has the specific vulnerability that is being targeted, we know the situation is critical and can respond accordingly.
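That decision can be written down as a small triage function. The sketch below is a simplified illustration, assuming each exploit attempt carries a target operating system and a vulnerability identifier, and each asset record lists its operating system and known vulnerabilities. The names `Decision`, `triage`, and the dictionary keys are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    SUPPRESS = "suppress"  # exploit cannot succeed on this asset
    ESCALATE = "escalate"  # no evidence either way, handle normally
    CRITICAL = "critical"  # asset is known to have the targeted flaw


def triage(exploit, asset):
    """Decide how to handle an exploit attempt using asset context."""
    # Known-vulnerable assets take priority over every other signal.
    if exploit["vuln_id"] in asset.get("vulnerabilities", set()):
        return Decision.CRITICAL
    asset_os = asset.get("os")
    # An operating system mismatch means the attempt cannot succeed.
    if asset_os is not None and asset_os != exploit["target_os"]:
        return Decision.SUPPRESS
    # Unknown or matching operating system: escalate as usual.
    return Decision.ESCALATE
```

Defaulting to `ESCALATE` when the asset's operating system is unknown is a deliberate choice in this sketch: missing context should never silence an alert.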

SecureWorks’ focus on streaming data has the built-in benefit of decreasing the time to action in response to threat incidents. By taking action in close to real time, our notification capabilities drive client playbooks with greater speed and accuracy. Over time, as the threat landscape has matured, it has become more important to account for data over a longer time period in our correlation. To that end, SecureWorks developed the Long Term Correlation Engine (LTCE), which allows streaming threat event data to be correlated with data points from the last 24 hours, without delaying the delivery of incidents to our clients.
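The general technique of correlating a stream against a rolling look-back window can be sketched as follows. This is not the LTCE implementation, whose internals are not described here. It is a minimal in-memory illustration that assumes events arrive in timestamp order and are correlated by a single key such as a source IP address. The class and method names are hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60  # 24-hour look-back, per the text


class WindowCorrelator:
    """Correlate each new event with earlier events for the same key."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.history = defaultdict(deque)  # key -> deque of (timestamp, event)

    def observe(self, key, timestamp, event):
        """Return prior in-window events for key, then record the new event."""
        events = self.history[key]
        # Evict events older than the window; assumes in-order timestamps.
        while events and events[0][0] < timestamp - self.window:
            events.popleft()
        related = [e for _, e in events]
        events.append((timestamp, event))
        return related
```

Because each call only evicts expired entries from the front of a queue and returns what remains, a new event is correlated as it arrives rather than waiting for a batch job, which is the property the text describes.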

As the technology matures, we are targeting a correlation window of 192 hours (eight days), along with improvements that move the tool from simply adding correlated contextual data to incident notifications to using kill chain analysis to drive new detections on otherwise benign-looking log data. Bringing massive compute and storage capabilities together to help protect our clients, powering these “sense making” activities with true machine learning (which we call Foresee), and delivering all of this to security teams in record time allow SecureWorks to stay ahead of threats.

To look at the bigger picture, there’s no rest in the business of cybersecurity. Cyber-threats are evolving rapidly, and cybersecurity must evolve at an even faster rate. To defend an organization’s networks, computers and data from unauthorized digital access, attack or damage, cybersecurity companies must continually come up with new approaches and new technologies.

That’s the reality of life in a world in which attackers are becoming ever more sophisticated and the threat actors themselves are changing. For example, some attacks are now launched without malicious software, which historically has been the tool of choice for hackers and therefore primarily what security systems are designed to look for. The attackers have learned to use stolen credentials to gain remote access to systems or may even be insiders who have access to systems and the knowledge to wage their attacks without drawing the attention of systems that detect abnormal or malicious behavior. In this new world, we must make greater use of data analytics and machine learning techniques that help systems learn on their own, so they can recognize attackers from wherever they find a way into systems.

In cybersecurity, the name of the game is to always stay a few steps ahead of the attackers. As attackers become more sophisticated, we all have to become even smarter in our approaches to countering threats. Data analytics helps us greatly in this effort.

For a closer look at the power of intelligence-driven security solutions, including lessons learned from Advanced Persistent Threat (APT) victims, visit secureworks.com.

John Nichols is the Director of Software Engineering at SecureWorks. James Birmingham is VP of Engineering at SecureWorks.

Copyright © 2017 IDG Communications, Inc.