Solving the Unstructured Data Challenge

BrandPost By Jaikumar Vijayan
Jun 25, 2015
Big Data

A lot has been made of the enormous value that businesses can derive from gathering and analyzing unstructured data from mobile devices, clickstreams, emails, web logs, social networks, call centers, sensors and other sources. But most data-driven initiatives remain firmly focused on traditional structured data rather than unstructured data.

In a 2015 IDG Enterprise study on big data and analytics, 83 percent of IT professionals said structured data initiatives were a high priority at their organizations, compared with 43 percent who viewed projects involving unstructured data as a top priority.

The results suggest that many organizations are missing out on what data experts agree is an opportunity to derive significant business value from properly harnessing unstructured data. IDC estimates that unstructured content already accounts for a staggering 90 percent of all digital data, much of which is locked away in a variety of data stores, in different locations and in varying formats.

Potential Benefits

Unstructured data can help companies gain a better understanding of their customers, products, services and business in general. For example, data from Twitter streams, social media networks and web logs can help a company gauge customer sentiment toward a product or service, or help identify and address a potential service or quality issue before it becomes a full-fledged problem. Combining existing data about customers from transactional systems with data gathered about them from other sources can help an organization get closer to a 360-degree view of its customers.
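As a rough illustration of that last point, the sketch below joins structured customer records with a naive sentiment score computed from free-text mentions. Everything in it is hypothetical: the customer records, the mentions, the word lists and the function names are invented for the example, and a real project would use a proper text analytics tool rather than word counting.

```python
from collections import defaultdict

# Hypothetical structured records from a transactional system.
customers = {
    101: {"name": "A. Rivera", "lifetime_spend": 1240.50},
    102: {"name": "B. Chen", "lifetime_spend": 310.00},
}

# Hypothetical unstructured mentions, already matched to a customer ID.
mentions = [
    {"customer_id": 101, "text": "Love the new release, great support"},
    {"customer_id": 101, "text": "Checkout was slow today"},
    {"customer_id": 102, "text": "Terrible experience, the device arrived broken"},
]

# Assumed word lists; a stand-in for real sentiment analysis.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"slow", "terrible", "broken"}


def score(text):
    """Naive sentiment: +1 per positive word, -1 per negative word."""
    words = text.lower().replace(",", " ").split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def customer_view(customers, mentions):
    """Attach a summed sentiment score to each structured customer record."""
    totals = defaultdict(int)
    for m in mentions:
        totals[m["customer_id"]] += score(m["text"])
    return {cid: {**rec, "sentiment": totals[cid]} for cid, rec in customers.items()}


view = customer_view(customers, mentions)
for cid, rec in view.items():
    print(cid, rec)
```

The hard part in practice is usually the step this sketch assumes away: reliably matching a tweet, email or call transcript to the right customer record.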

As IDC notes, companies that can figure out a way to properly collect, synthesize and use unstructured data can improve their bottom line, reduce costs and respond more quickly to changing market conditions and customer sentiment.

“There are all kinds of ways we can take these different kinds of data, pull them together, and learn things about what’s effective much faster,” said Stuart Madnick, professor of information technologies at MIT’s Sloan School of Management.

Making It Work

Getting there can be challenging, but it will be worth the effort. The first step is to have a strategy for integrating structured and unstructured data in a way that makes sense for your organization. Having a clear business use case is vital to any data-driven initiative, especially one involving the collection and management of unstructured data sets. IDC notes that organizations embarking on data-driven projects need to develop and promote a culture that not only understands but also embraces the collection, use and sharing of structured and unstructured content as a key asset.

You also need to decide what data to collect, analyze, and keep. Just because you can capture data from virtually any device or third-party resource doesn’t mean you should. Collecting terabytes of sensor data from manufacturing floors or systems in the field is of little use if your marketing team wants to track customer sentiment. So make sure you align the proper data sources with your business goals before embarking on an unstructured data initiative.

As with structured data, simply collecting unstructured data won’t get you the insights you need. Even unstructured data has to be properly organized, and issues like data quality, data provenance and context are important. As IDC notes, text analytics, auto-taxonomy generation, auto-categorization, auto-tagging and other formal information-handling techniques are vital to extracting additional value from unstructured data.
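To make the idea of auto-tagging and provenance concrete, here is a minimal sketch that tags a piece of text against a small keyword taxonomy while keeping a record of where and when it was collected. The taxonomy, the field names and the sample message are assumptions made up for illustration; commercial text analytics tools use far richer techniques than keyword matching.

```python
import re
from dataclasses import dataclass, field

# Hypothetical taxonomy: tag -> trigger keywords. A real deployment would
# build this through text analytics or a curated taxonomy.
TAXONOMY = {
    "billing": {"invoice", "charge", "refund"},
    "quality": {"broken", "defect", "crash"},
    "sentiment_negative": {"terrible", "disappointed", "worst"},
}


@dataclass
class Document:
    text: str
    source: str        # provenance: where the content came from
    collected_at: str  # provenance: when it was captured (ISO 8601 string)
    tags: list = field(default_factory=list)


def auto_tag(doc):
    """Assign every taxonomy tag whose keywords appear in the text."""
    words = set(re.findall(r"[a-z']+", doc.text.lower()))
    doc.tags = sorted(tag for tag, keywords in TAXONOMY.items() if words & keywords)
    return doc


doc = auto_tag(Document(
    text="Worst update ever: the app will crash on launch and I want a refund.",
    source="support_email",
    collected_at="2015-06-25T09:30:00Z",
))
print(doc.source, doc.tags)
```

Exact keyword matching misses variants such as "crashing" or "refunded", which is one reason the more formal techniques IDC lists matter.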

Numerous search, text analytics, visualization and ETL tools are available to help companies mine value from unstructured data and integrate it with more traditional sources. Investing in such tools can be useful for organizations that are serious about taking advantage of unstructured data.