Shining a light on dark data: Securing information across the enterprise

How do you address the risks of breach and disclosure associated with redundant, obsolete or trivial data?


Countless concerns keep CIOs awake at night. Some of them, such as uptime, new technologies and budget pressures, are nothing new. Others, such as Big Data and an increased emphasis on Data Security, are more recent. A new danger lurks at the intersection of the two. As the cost per gigabyte of enterprise storage continues to fall, it has become cheaper and easier for enterprises to simply add more disk space than to police their existing usage, and in the process they risk losing control of their data. While most businesses know where their important data is supposed to live, they don’t always know where else copies of it have ended up. This lack of control has introduced a new risk into the enterprise: Dark Data.

Traditionally, the risk profiling of primary data sources has focused on reducing the expense of discovery during litigation or after a data breach. Because of this, significant effort is put into protecting these primary data stores. However, “Redundant, Obsolete or Trivial” data (often called ROT or simply Dark Data) presents new challenges to data security and governance. Redundant data arises when copies of data exist across multiple locations. These copies aren’t always kept up to date, so they become obsolete from a business perspective, yet they still carry significant risk.

What happens when this data isn’t deleted, or migrates to other servers or data stores that don’t have the same level of protection? For instance, what risks exist when a database containing protected health information or personally identifiable information is exported from a production SQL server onto an unsanctioned temporary server with minimal security controls? An attacker who finds such a poorly defended server can compromise it and reap the same rewards as if they had compromised the primary location of that data. These copies present the very same risks of breach and disclosure as the primary data sources, and attackers have a greater incentive to find them than the information owners do.

When attackers infiltrate your network, they quickly establish a foothold and a backdoor to ensure they can return undetected. Then they start mapping out your network, searching for systems they can breach and data they can monetize themselves or sell to other criminals. Any attacker worth their salt is going to be brutally efficient at this, and will likely end up with a unique perspective on what your network looks like. If the attackers are criminals, their motive is purely financial; what they can steal is what they can sell. If they are hacktivists with an axe to grind, they are going after anything they can use to tarnish your company’s reputation. Either way, they waste no time locating and exfiltrating this data.

So, how do you deal with the risks associated with Dark Data? Fight D’s with more D’s! Defensible Data Deletion, the process of identifying and disposing of ROT data across the enterprise, is specifically designed to reduce the risks associated with this type of data. While this approach was created to reduce the operational cost and eDiscovery risks associated with litigation, it works equally well to help defend your organization against the risks of breach and disclosure. First, enterprise data stores are identified and scanned for data files such as documents and spreadsheets, and a database of the locations of these files is compiled. Metadata about each file is stored as well, along with a digital fingerprint of the file's contents computed with MD5 or a similar hash algorithm. The fingerprint makes it possible to identify duplicate files even when their filenames or other metadata differ. Once this is completed, redundant copies that serve no business purpose can be deleted from the environment. Additional analysis can show that some files are older, outdated versions or smaller subsets of existing data, and these can be removed as well if they serve no reasonable business purpose.
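The scanning and fingerprinting step described above can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the way any particular Defensible Data Deletion product works: the function names (`fingerprint`, `build_inventory`, `find_duplicates`) and the record fields are hypothetical, the "data store" is assumed to be an ordinary directory tree, and SHA-256 is used as the "similar hash algorithm" by default (passing `algorithm="md5"` gives the MD5 fingerprint mentioned in the text).

```python
import hashlib
import os
from collections import defaultdict


def fingerprint(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file's contents, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root, algorithm="sha256"):
    """Walk `root` and record location, metadata and fingerprint per file."""
    inventory = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            try:
                info = os.stat(path)
                record = {
                    "path": path,                # where the file lives
                    "size": info.st_size,        # basic metadata
                    "modified": info.st_mtime,
                    "hash": fingerprint(path, algorithm),
                }
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            inventory.append(record)
    return inventory


def find_duplicates(inventory):
    """Group paths by content hash; keep only hashes seen more than once."""
    by_hash = defaultdict(list)
    for record in inventory:
        by_hash[record["hash"]].append(record["path"])
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Calling `find_duplicates(build_inventory("/some/share"))` (the path is a placeholder) would return a mapping from each repeated fingerprint to the files that share it. The sketch deliberately only reports candidates and deletes nothing, since deciding which copies serve no business purpose is a review step rather than something a hash comparison can settle.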

Not knowing where your organization’s data resides doesn’t absolve you of the responsibility to protect it. Since the risks associated with ROT data are at least on par with those of normal production data, it is extremely important for an organization not only to identify this data, but also to curate, prune and dispose of it. Because it is so likely to be lost, forgotten and detached from its original owner, Dark Data can eventually become a cancer in your systems that requires heroic efforts to eradicate.

This article is published as part of the IDG Contributor Network.
