by Andy Berry

Shed light on your dark data before GDPR comes into force

Apr 17, 2018
Big Data, Legal, Privacy

Somewhere, hidden in the depths of your business, lies a threat you may never have considered. Lurking in the shadows, it silently multiplies beyond your control. It threatens your business’s security and could cost you dearly under new regulations. The threat? Dark data.

Also referred to as unstructured data, dark data is growing at a rate of 62% per year, according to IDG, which predicts that by 2022, 93% of all data will be unstructured.

Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes”. It consists of data from a huge variety of sources – emails, documents, instant messages, digital media posts, partly developed applications – or simply information which isn’t being used or analyzed, and its name makes it sound foreboding. With new regulations such as the GDPR coming into force, businesses must gain a clear understanding of the data they hold. For structured data, this is straightforward. Dark data is much harder to manage, because it is stored across a distributed IT environment with no single owner.

A ‘bottomless lake of data’

Dark data tends to be text-based data, as well as video, audio files and images. It’s generated by a wide range of sources and gathered from mobile devices, social platforms, apps and internal systems, to name but a few. Much of the data generated by the Industrial Internet and the Internet of Things is unstructured, so this also falls under the dark data shadow.

In the workplace, employees are responsible for generating a lot of dark data. In fact, says Sony Shetty from Gartner, “Across the enterprise, employees are blindly building a bottomless lake of data and, in many cases, a corporate mantra of ‘save everything, just in case’ is encouraging the behavior”. Think about the amount of data you, personally, generate, filter and store each working day – did you record your last conference call in case anyone missed it? Did you make it available as a podcast and save that, too? What about your customer calls – do you record them ‘for training purposes’ and store them as audio files? Do you have a chat function on your website and keep a record of the interactions, or use an instant message function on your desktop? One study found enterprises to be using almost 500 business applications, each generating data. 

All the data generated by this activity falls under the definition of dark data, and is stored across different devices, drives, desktops and SaaS platforms. Most of it will never see the light of day again. Employees leave – taking their passwords with them – customers move on, business priorities change, and no one has the remit, the ability or the time to remove the data. The information quickly becomes out of date and inaccessible.

The need to understand data

Prior to the GDPR, dark data would have been an accepted part of legacy business. In the UK, the 1998 Data Protection Act didn’t provide any minimum or maximum period for data to be stored, so it would have been a case of ‘out of sight, out of mind’. Now, though, the GDPR requires businesses to gain an in-depth understanding of how data flows across their organization, along with stringent data governance. The new Data Protection Bill coming into force will implement the GDPR into UK law. From May 25th, if a ‘data subject’ – a client, employee or other stakeholder – asks what data a company holds on them, the company must know and share this. If they ask to see a record of when and how they gave their consent for their data to be used, the company must provide this too, and only information necessary for its original purpose should be processed. “Inaccurate or outdated data should be deleted or amended and data controllers are required to take ‘every reasonable step’ to comply with this principle”, says Debbie Heywood from Taylor Wessing.

This is extremely hard to fulfil if data is held in silos across an organization. “Because unstructured data is text heavy and irregular, making sense of what is being said and how it’s being said – positively or negatively – is not for the faint of heart,” says a report from the Medallia Institute.

Tapping into uncharted territory

The time has come for businesses to bring their dark data into the light. Doing so helps drive GDPR compliance, but the benefits of understanding dark data stretch far beyond compliance. Think of it as discovering uncharted territory: analyzing this unstructured data offers the opportunity to extract invaluable business insight which would otherwise lie dormant. It transforms information from data into strategic intelligence. As Gartner puts it, “Some examples of data that is often left dark include server log files that can give clues to website visitor behavior, customer call detail records that can indicate consumer sentiment and mobile geolocation data that can reveal traffic patterns to aid in business planning.”
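To make the server log example concrete, here is a minimal sketch of how a web server's access log might be turned into a simple picture of visitor behavior. It assumes log lines in the widely used Common Log Format; the sample lines, the `SAMPLE_LOG` name and the `count_page_requests` function are all invented for illustration and do not come from any particular product.

```python
import re
from collections import Counter

# Matches the request part of a Common Log Format line,
# for example: "GET /pricing HTTP/1.1"
REQUEST_PATTERN = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')

# Hypothetical sample data; a real log would be read from a file instead.
SAMPLE_LOG = """\
203.0.113.5 - - [17/Apr/2018:10:01:02 +0000] "GET /pricing HTTP/1.1" 200 5120
203.0.113.5 - - [17/Apr/2018:10:01:40 +0000] "GET /contact HTTP/1.1" 200 2048
198.51.100.7 - - [17/Apr/2018:10:02:11 +0000] "GET /pricing HTTP/1.1" 200 5120
198.51.100.7 - - [17/Apr/2018:10:03:00 +0000] "GET /missing HTTP/1.1" 404 312
"""


def count_page_requests(log_text):
    """Count how many times each page path was requested."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REQUEST_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Print pages from most to least requested.
    for page, hits in count_page_requests(SAMPLE_LOG).most_common():
        print(page, hits)
```

Even a tally this basic shows which pages draw attention and which links are broken; real analysis would build on it with sessions, referrers and time of day.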

For example, most of us know that retailers are experts at using psychology to drive product placements. They understand our thought process and how we tend to move around a store, and place products accordingly. Studying video footage of how consumers move around stores helps retailers refine their product placement strategies even further. As Deloitte says, “A retailer may be able to gain a more nuanced understanding of customer mood or intent by analyzing video images of shoppers’ posture, facial expressions, or gestures”. This intelligence, extracted by analyzing dark data, can translate directly into revenue as retailers apply it to their store layout.

By analyzing dark data businesses can:

  • Create a truly 360-degree single customer view, to drive engagement and boost interactions
  • Anticipate, understand and respond to changes in market and consumer demand
  • Develop an in-depth understanding of consumer sentiment on their brands, gleaned from social platforms and multichannel interactions
  • Lock down and secure vulnerable data points, and give personal data the protection it requires
  • Refine the accuracy of risk management models
  • Address recurring pain points for customers and direct customer support to those areas most affected
  • Identify any links and connections between data sets
  • Generate a strong foundation for accurate forecasting
  • Gain a deeper understanding of website performance from web analytics
  • Identify new revenue streams. According to IDC, by the end of this year “50% of Large Enterprises Will Be Generating Data-as-a-Service (DaaS) Revenue from the Sale of Raw Data, Derived Metrics, Insights, and Recommendations”.

Now, analyzing unstructured dark data is simpler than ever before. Advanced, high-performance Customer Information Management tools automate and accelerate processes, connecting data sets for clarity and insight. Software scans both structured and unstructured data, using different data profiling techniques. The results of the scan are used to automatically generate a library of documentation, which describes a company’s assets and creates a metadata repository. You can then start to explore the opportunities and possibilities which lie within the data – and that’s when it starts to get really exciting.
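As a rough illustration of the scan-and-catalogue idea described above, the sketch below walks a folder, records basic metadata for each file and counts anything that looks like an email address. It is a toy under stated assumptions, not how any commercial Customer Information Management tool works: the function names, the catalogue fields and the deliberately simple email pattern are all hypothetical, and real personal-data discovery needs far more than one regular expression.

```python
import os
import re
import tempfile
from datetime import datetime, timezone

# Deliberately simple pattern, used here only as a stand-in for real profiling rules.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def profile_file(path):
    """Return one catalogue entry (a dict of metadata) for a single file."""
    stats = os.stat(path)
    try:
        with open(path, "r", encoding="utf-8", errors="ignore") as handle:
            text = handle.read(1_000_000)  # cap the read so huge files stay cheap
    except OSError:
        text = ""  # unreadable files are still listed, just without content hits
    modified = datetime.fromtimestamp(stats.st_mtime, tz=timezone.utc)
    return {
        "path": path,
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": stats.st_size,
        "last_modified": modified.isoformat(),
        "email_addresses_found": len(EMAIL_PATTERN.findall(text)),
    }


def build_catalogue(root):
    """Walk a directory tree and profile every file found beneath it."""
    catalogue = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in sorted(filenames):
            catalogue.append(profile_file(os.path.join(folder, name)))
    return catalogue


def build_demo_catalogue():
    """Create two throwaway files and catalogue them, so the sketch runs anywhere."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "call_notes.txt"), "w", encoding="utf-8") as f:
            f.write("Follow up with jane.doe@example.com and sam@example.org")
        with open(os.path.join(root, "readme.md"), "w", encoding="utf-8") as f:
            f.write("No personal data here.")
        return build_catalogue(root)


if __name__ == "__main__":
    for entry in build_demo_catalogue():
        print(entry)
```

Pointing `build_catalogue` at a shared drive would give a first inventory of what is stored, how old it is and which files may hold personal data, which is the kind of starting point a metadata repository provides.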