by Sanjay Srivastava

Three ways to make sense out of dark data

Opinion
Jul 25, 2017
Analytics, Artificial Intelligence, Data Management

To address the challenge of dark data, use artificial intelligence to unlock unstructured data, deploy modular and interoperable digital technologies, and build traceability into core design principles.

As the parent of a sixth grader, I am inundated with emails detailing upcoming school activities and tons of related information. For instance, if I want to organize an event for my son’s birthday, I have to sort through emails, text messages, school calendars, last year’s invitations, a Google list of local caterers, and commitments on my work calendar in order to plan well. Part of the problem is that the data is strewn across different places and formats, and is generally difficult to correlate and use automatically.

Enterprises face a similar challenge. Most organizations sit on a mountain of “dark” data – information in emails and texts, in contracts and invoices, and in PDFs and Word documents – which is hard to automatically access and use for descriptive, diagnostic, predictive, or prescriptive automations. It is estimated that some 80 percent of enterprise data is dark. There are three ways companies can address this challenge: use artificial intelligence (AI) to unlock unstructured data, deploy modular and interoperable digital technologies, and build traceability into core design principles.

Data extraction powered by AI technologies

The first key to success is transforming this dark data into structured data, like that in a database or spreadsheet. This data extraction and classification method uses natural language processing, ontology detection capabilities, and other AI techniques to “light up” data that would otherwise stay locked in documents and messages. When information is structured, enterprises can make faster decisions, derive smarter insights, and drive better business outcomes.
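To make the idea concrete, here is a minimal sketch of turning a free-text email into a structured record. It is only an illustration: simple regular expressions stand in for the natural language processing a real system would use, and the InvoiceRecord fields, the "INV-" numbering scheme, and the sample message are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class InvoiceRecord:
    """A structured row, like one line in a spreadsheet (hypothetical schema)."""
    invoice_number: Optional[str]
    amount: Optional[float]
    due_date: Optional[str]


# Hypothetical patterns; a production system would rely on NLP models instead.
INVOICE_RE = re.compile(r"\b(INV-\d+)\b")
AMOUNT_RE = re.compile(r"\$\s?([\d,]+\.\d{2})")
DUE_RE = re.compile(r"due\s+(?:on\s+)?(\d{4}-\d{2}-\d{2})", re.IGNORECASE)


def extract_invoice(text: str) -> InvoiceRecord:
    """Pull the fields we can find out of free text; leave the rest as None."""
    invoice = INVOICE_RE.search(text)
    amount = AMOUNT_RE.search(text)
    due = DUE_RE.search(text)
    return InvoiceRecord(
        invoice_number=invoice.group(1) if invoice else None,
        amount=float(amount.group(1).replace(",", "")) if amount else None,
        due_date=due.group(1) if due else None,
    )


email_body = (
    "Hi, please find attached invoice INV-2041 for $1,250.00, "
    "due 2017-08-15. Thanks!"
)
record = extract_invoice(email_body)
print(record)
```

Once the fields sit in a record like this, they can be loaded into a database or spreadsheet and queried like any other structured data. Fields the patterns cannot find are left empty rather than guessed.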

Modularity and interoperability put AI to work

Data powers artificial intelligence, which discovers insights and uses intelligent automation (IA) to act on those insights. But to get the most benefit across this IA to AI spectrum, data analytics, AI, and a set of core digital technologies need to work seamlessly together and easily interconnect with the enterprise infrastructure.

This interoperability is the second key to success, as it allows companies to easily add data analytics and artificial intelligence to the technology foundation they have already built. Interoperability is critical for enterprises to harness all the data that already exists.

Traceability drives governance, key to enterprise applicability

In a business environment and in mission-critical situations, artificial intelligence is really only useful when decisions can be traced back to the underlying drivers, and this traceability is critical to ensuring effective governance. The human workforce, through its evolution over centuries, has many governance mechanisms already built in – for instance, it is easy to see if a hundred colleagues do not show up for work on a given day. 

In the world of automated software robots, however, it may not be as simple to detect when robots do not show up for work. If someone changes an application’s password, it may be days before we can spot that the data for an entire continent is not getting processed by the hundred robots working on it. An integrated command and control center that can effectively manage errant robots or biased machine learning algorithms, and allow for traceability in AI decisions, is the third catalyst to light up dark data.
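One simple way a command center might notice robots that "do not show up for work" is a heartbeat check. The sketch below is an assumption-laden illustration, not a description of any particular product: the robot names, the idea that each robot records the time of its last successful run, and the one-hour silence threshold are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def find_silent_robots(
    last_heartbeat: Dict[str, datetime],
    now: datetime,
    max_silence: timedelta,
) -> List[str]:
    """Return the robots whose last heartbeat is older than max_silence."""
    return sorted(
        name
        for name, seen in last_heartbeat.items()
        if now - seen > max_silence
    )


# Hypothetical snapshot of when each robot last reported in.
now = datetime(2017, 7, 25, 12, 0)
heartbeats = {
    "invoice-bot-01": now - timedelta(minutes=5),
    "invoice-bot-02": now - timedelta(days=3),  # e.g. locked out by a password change
}

silent = find_silent_robots(heartbeats, now, max_silence=timedelta(hours=1))
print(silent)
```

A check like this, run on a schedule, would surface a stalled robot within the hour instead of days later. Tracing individual AI decisions back to their drivers needs more than this, such as logging the inputs and model version behind each decision.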

Lighting up data transforms business outcomes

To understand the potential when enterprises light up dark data, take as an example a life sciences company’s pharmacovigilance operations, which oversee adverse event monitoring. The Food and Drug Administration (FDA) defines an adverse event as any undesirable experience associated with the use of a medical product in a patient, and the FDA has related regulations to ensure companies develop drugs safely. Every time a patient complains of an adverse event, the doctor must report it; however, that information is embedded in doctors’ notes, voice mails, or emails, and often has to be interpreted with deep contextual knowledge of medicine. Therefore, pharmacovigilance is a challenging, complex, and resource-intensive affair – and, at the same time, a core, critical, life-saving activity.

Digital technologies such as computer vision, computational linguistics, feature engineering, text classification, machine learning, and predictive modeling can help automate this process. Working together, these digital technologies enable pharmaceutical and life sciences companies to move from simply tracking issues to predicting and solving potential problems with less human error. Interoperable digital technologies with a reliable built-in governance model drive higher drug quality, better patient outcomes, and easier regulatory compliance.

The opportunity ahead for leaders in AI deployment

Forward-thinking companies already bank on artificial intelligence and other digital technologies to solve business problems and transform customer value. But those that get it right have three things in common: They use AI to unlock unstructured data, have modular and interoperable digital technologies, and build traceability into their core design principles.