by Saurabh Sharma

Shedding light on dark data

May 27, 2015 | 5 mins

Dark data refers to data that companies have access to but are not using effectively. We explore why dark data challenges persist and how companies can best address them.

In day-to-day life, there are numerous things all around us that have value but go unused. Consider that in 2014, $750 million in gift cards went unused, American workers forfeited 169 million vacation days and an estimated $165 billion in food went to waste. A popular, though scientifically unsupported, claim even holds that 90% of the average human brain goes unused.

These measures offer a nice segue into a challenge that many organizations face today in their analytics efforts – uncovering, leveraging and making positive use of “dark data” – that is, data that resides in the organization but is not used effectively or, in some cases, at all.

Examples of dark data can include audio files of conversations with customers, complaint emails and chat messages, data captured in bank transactions, digital marketing campaign information, email click data, and so on. Additionally, an organization will often have data on the same individual from multiple relationships with the customer, but will lack the ability to map, cross-reference and arrive at a holistic and more complete picture of the customer.
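
The cross-referencing described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that two systems both store a customer email address that can serve as a shared key; the record layouts, field names and sample values are hypothetical, and real-world matching usually needs more than a single normalized field.

```python
from collections import defaultdict

# Hypothetical records from two separate systems; field names are assumptions.
card_applications = [
    {"email": "Ana.Silva@example.com ", "product": "credit card"},
    {"email": "raj@example.com", "product": "credit card"},
]
support_chats = [
    {"email": "ana.silva@example.com", "topic": "fee complaint"},
]


def normalize_email(email):
    """Lowercase and trim so the same address matches across systems."""
    return email.strip().lower()


def build_customer_view(**sources):
    """Group records from every source under one normalized customer key."""
    view = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            key = normalize_email(record["email"])
            view[key][source_name].append(record)
    return view


customer_view = build_customer_view(
    applications=card_applications, chats=support_chats
)

# Show which systems hold data on each customer.
for customer, by_source in customer_view.items():
    print(customer, sorted(by_source))
```

Here the first customer appears in both systems once the email is normalized, while the second is known only to the applications system. Grouping by a normalized key is the simplest possible matching rule and is used here only to make the idea concrete.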

How do organizations wind up in a dark data abyss?

Lack of awareness:

In many cases, the functioning members of the organization are simply not aware of the existence of this data. In the case of a bank, for example, an underwriting team may see information provided by a customer on an online credit card application form and collect it as valuable data. But they may not know that data on the customer’s journey to get to that point – how they ultimately arrived at the application page – is available as well. Either as a result of communication deficiencies or absence of training, dark data issues can arise when the availability of data is simply unknown.

Disconnect among teams:

Within larger organizations, data is often collected separately by different business units or teams. And once collected, that same data is often owned and managed by separate teams. There is usually no natural mechanism for data sharing between teams and it becomes an uphill task for one team to get hold of and understand data from another. A pool of data that may not have use for one team may be of great value to another, but the necessary sharing just does not happen.

Technology and tool constraints:

The manner in which data is collected and the disparate nature of the technology systems deployed within an organization can also lead to a dark data problem. Because collected data sits in separate silos, it is often difficult to systematically bring it together to produce a clear, cohesive picture. This is especially true for companies with legacy IT systems and for those whose systems and data formats differ (think audio files from call center interactions or click data from Web platforms). And it’s a common problem faced by those in the early stages of adopting a data analytics program.

Creativity gap:

Even if teams are aware of the availability of different types of data, have access to it, and have the tools to use it, they may still not end up using this incremental data simply because it is new to them. Great data scientists are creative and hungry to learn; they come up with new and interesting ways of using all the data assets available to them. If this drive towards innovation is missing in a team, the organization might still have patches of dark data.

However ominous and spooky sounding the term “dark data” may be, there are steps organizations can take to extract this valuable data and put it to good, business-enhancing use. Here are some ways of achieving this:

A new function:

Establish a central data analytics function to be headed by a Chief Data Analytics Officer. It would be this person’s responsibility to obtain a complete view of the organization’s data assets and determine how best to share across teams and serve multiple functional areas. While executive teams may be leery of creating another cost center inside the company, the benefits to the business will far outweigh the costs in short order.

Audit of IT tools:

As explained above, an organization’s existing technology infrastructure may be the primary gating factor to data sharing and usage. Do your legacy systems prevent you from combining disparate data elements? Do you have speech analytics tools to use the information hidden in audio files? These are just a few sample questions. You need to determine whether your current systems are compatible for use with your analytics programs and aspirations. If not, make a business case for investment in IT tools by sizing up the opportunity through a pilot.

Codifying data models & dictionaries:

Determine whether appropriate data model documents and data dictionaries are created and maintained in the organization. Also, check whether everyone in the organization is speaking the same data language, and whether that language is easily transmitted and understood across teams, can be picked up by new team members, and is not lost through employee turnover or attrition. Data descriptions and dictionaries need to reside in clear documents, not in people’s heads.
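
As one possible illustration of keeping data descriptions in documents rather than in people’s heads, the sketch below stores a small data dictionary in code and flags dataset columns that nobody has documented. The field names, descriptions and owning teams are hypothetical examples, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    """One data dictionary entry: what a field means and who owns it."""
    name: str
    description: str
    owner_team: str
    unit: str = ""


# Hypothetical entries; a real dictionary would be maintained by the data owners.
DATA_DICTIONARY = {
    entry.name: entry
    for entry in [
        FieldDefinition("cust_id", "Unique customer identifier", "CRM"),
        FieldDefinition("txn_amt", "Transaction amount", "Payments", unit="USD"),
    ]
}


def undocumented_fields(columns, dictionary=DATA_DICTIONARY):
    """Return the column names that have no entry in the data dictionary."""
    return sorted(set(columns) - set(dictionary))


# A dataset arrives with one column that has never been described.
missing = undocumented_fields(["cust_id", "txn_amt", "clk_src"])
print(missing)
```

A check like this could run whenever a new dataset is registered, so that undocumented fields are caught before the knowledge of what they mean leaves with the person who created them.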

Extracting value from dark data can be complicated, but it is not an insurmountable task. For those organizations in the early stages of their analytics efforts, it does not need to be an immediate priority. But for those organizations that have already completed some of the basic steps of analytics and have begun initial level predictive modeling, the dark data challenge can be a useful next frontier that can help the company derive more value from its analytics program.