Many organizations run data science teams as separate silos of activity. These teams focus on gathering, cleaning and querying unstructured or “big” data, but they rarely touch data from transaction processing systems and corporate business processes, and might not even be members of the IT group. These “siloed” data scientists and analysts in analytics labs could soon be a thing of the past thanks to digital transformation.
Companies are digitizing virtually everything, from digital copies of closets full of paper documents and photos to videos, CAD documents, social media feeds and voice recordings, and in the process they are creating vast troves of unstructured data that has yet to be exploited.
As organizations invest in converting and storing all of this data in digital formats, they also expect a return on that investment. At a minimum, they want to mine this data for information and insights that can help their businesses.
Let’s say that you’re looking at the buying patterns of major customer A. You might take a look at the CRM system records of how many times your salespeople have contacted customer A and what the results were. Your marketing department might want to compare when customer A made purchases with the timing of product campaigns that the company promoted on social media. If there is an interruption in customer A’s buying pattern, your sales and customer service departments might also want to look at sentiment analytics from the customer’s last call about a product warranty or service issue.
The takeaway for CIOs and IT leaders is clear: unstructured data from sources like newly digitized voice recordings and social media content has to be used together with transactional data from systems like CRM if you’re going to get a full picture of a particular customer’s situation that you can act on.
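As a rough illustration of that point, the Python sketch below pulls one customer's CRM contacts, order dates, campaign timing and call-sentiment scores into a single view. Every field name, the 14-day campaign window, and the assumption that sentiment scores were already produced upstream by some analytics step are hypothetical choices for this example, not features of any particular CRM or analytics product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class CustomerView:
    """Combined view of one customer (all fields are hypothetical)."""
    customer_id: str
    sales_contacts: int = 0                               # from CRM contact records
    purchases: list = field(default_factory=list)         # order dates from the order system
    campaign_overlap: list = field(default_factory=list)  # (purchase date, campaign name) pairs
    last_call_sentiment: Optional[float] = None           # score from analytics on call recordings

def build_customer_view(customer_id, crm_contacts, orders, campaigns,
                        call_sentiment, window_days=14):
    view = CustomerView(customer_id)
    # Transactional data: count CRM contacts and collect purchase dates.
    view.sales_contacts = sum(1 for c in crm_contacts if c["customer_id"] == customer_id)
    view.purchases = sorted(o["date"] for o in orders if o["customer_id"] == customer_id)
    # Marketing data: flag purchases made within window_days after a campaign started.
    for purchase in view.purchases:
        for campaign in campaigns:
            if 0 <= (purchase - campaign["start"]).days <= window_days:
                view.campaign_overlap.append((purchase, campaign["name"]))
    # Unstructured data (assumed already scored upstream): keep the latest call's sentiment.
    calls = [s for s in call_sentiment if s["customer_id"] == customer_id]
    if calls:
        view.last_call_sentiment = max(calls, key=lambda s: s["date"])["score"]
    return view

if __name__ == "__main__":
    print(build_customer_view(
        "A",
        crm_contacts=[{"customer_id": "A"}, {"customer_id": "A"}],
        orders=[{"customer_id": "A", "date": date(2018, 3, 5)}],
        campaigns=[{"name": "spring", "start": date(2018, 3, 1)}],
        call_sentiment=[{"customer_id": "A", "date": date(2018, 4, 1), "score": -0.6}],
    ))
```

The value is not in any single source but in the join: a drop in purchases next to a negative last-call score points sales and service at the same problem.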
The twin challenges of systems integration and data sharing between disparate systems have long been on CIOs' plates. But with digital transformation in full force, new market pressures demand that these integrations be performed faster and more accurately.
Following are four minimum requirements for building the kind of full customer picture that digital transformation demands.
System integration and data exchange
System integration, with its associated cost, time and complexity, continues to challenge companies. The process has only grown more complex with digitization and the adoption of hybrid IT architectures, which require IT to integrate multiple cloud platforms with internal data center systems. Unstructured data from the web and from other sources such as CAD systems does not come in a fixed record format, which adds complexity because there are now many more types and sources of data to bring into the mix. Moreover, not all of this incoming digital data is easy to access or integrate. IoT equipment providers, for example, may use proprietary communications protocols that make it difficult to exchange data streams and files.
Companies will never get on top of the data piling up from digitization if they expect IT to manually perform data integration. While there is a place for doing some systems integration “by hand,” there are also tools in the market that can work with many different system interfaces, and even interface with the unusual communications protocols that are found in IoT.
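The core idea behind such tools can be sketched as a set of adapters, each translating one source's format into a shared internal record. In this minimal Python example the content types, field names and the 8-byte IoT frame layout are all invented for illustration; decoding a real proprietary protocol would require the vendor's specification.

```python
import csv
import io
import json
import struct

# Every adapter returns the same hypothetical internal record shape:
# {"source": <origin system>, "id": <device or account ID>, "value": <float>}

def parse_cloud_json(payload: bytes) -> list:
    data = json.loads(payload)
    return [{"source": "cloud", "id": item["id"], "value": float(item["value"])}
            for item in data["items"]]

def parse_datacenter_csv(payload: bytes) -> list:
    rows = csv.DictReader(io.StringIO(payload.decode("utf-8")))
    return [{"source": "datacenter", "id": row["account"], "value": float(row["amount"])}
            for row in rows]

# Invented "proprietary" IoT frame: 4-byte big-endian device ID + 4-byte big-endian float.
IOT_FRAME = struct.Struct(">If")

def parse_iot_frames(payload: bytes) -> list:
    return [{"source": "iot", "id": f"dev-{device_id}", "value": reading}
            for device_id, reading in IOT_FRAME.iter_unpack(payload)]

ADAPTERS = {
    "application/json": parse_cloud_json,
    "text/csv": parse_datacenter_csv,
    "application/x-iot-frame": parse_iot_frames,  # hypothetical content type
}

def ingest(content_type: str, payload: bytes) -> list:
    """Route a payload to the adapter registered for its content type."""
    adapter = ADAPTERS.get(content_type)
    if adapter is None:
        raise ValueError(f"no adapter registered for {content_type!r}")
    return adapter(payload)
```

With this design, adding a new source means registering one more adapter rather than changing the code downstream that consumes the common records.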
The next step is finding a tool that meets your particular needs. “We needed an internal workflow system that could work with backend IT systems,” the head of IT planning at a medical equipment manufacturer told CIO. “The first tool we used did part of the integration, but we still needed to process XML, and doing the XML programming is a demanding task. We decided to research more tools and found one that not only supports XML file mapping but also email and HTML output. The tool also had a user interface that allowed us to use it in a wide range of applications. The software enabled us to integrate our manufacturing workflow with our quality assurance team, and improved performance.”
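The kind of XML mapping the manufacturer describes can be approximated with Python's standard library alone. The sketch below assumes a hypothetical work-order XML schema and maps it to an HTML table and an HTML email; it is an illustration of the technique, not the tool the company actually used.

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage
from html import escape

# Hypothetical work-order feed from a manufacturing workflow system.
SAMPLE_XML = """<workOrders>
  <order id="WO-1"><product>Infusion pump</product><status>QA hold</status></order>
  <order id="WO-2"><product>Monitor</product><status>Released</status></order>
</workOrders>"""

def orders_to_html(xml_text: str) -> str:
    """Map each <order> element to one row of an HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for order in root.findall("order"):
        cells = (order.get("id", ""), order.findtext("product", ""), order.findtext("status", ""))
        rows.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    header = "<tr><th>Order</th><th>Product</th><th>Status</th></tr>"
    return "<table>" + header + "".join(rows) + "</table>"

def orders_to_email(xml_text: str, sender: str, recipient: str) -> EmailMessage:
    """Wrap the HTML table in an email, e.g. to notify a QA team."""
    msg = EmailMessage()
    msg["Subject"] = "Work order status"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Work order status is included as HTML.")  # plain-text fallback
    msg.add_alternative(orders_to_html(xml_text), subtype="html")
    return msg
```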
Central data repositories
To present a full customer picture, unstructured digital content and system-of-record data must be brought together in a data warehouse or a larger data repository on which analytics can operate.
For example, a wholesaler that wanted to better understand and serve its customers found that the key to that objective was getting diverse systems, including a cloud-based CRM system, an ERP system, and web services passing unstructured data, to work together and contribute data to a central data warehouse. “In order to maintain all systems keeping up with the pace of business we needed to minimize the impact on other systems due to functional changes by loosely coupled systems. So, we decided to eliminate the peer-to-peer systems structure and integrate data and systems into a centralized structure using an EAI/ETL (enterprise application integration/extract, transform, load) tool,” said the senior manager of the company’s Information Strategy Office.
There are two takeaways for IT managers from this approach. The first is to seek out software-driven automation to extract, transform and transfer unstructured data accumulated from digitization. The second is to revisit IT architecture and job flows to determine the most efficient way to transport data from different storage locations before it finds its final resting place in a data warehouse or central data repository that users access for analytics.
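A minimal extract-transform-load flow into a central repository might look like the following, with an in-memory SQLite database standing in for the data warehouse. The source records, table names and normalization rules (trimming and upper-casing customer IDs, storing amounts as integer cents) are assumptions made for this sketch.

```python
import sqlite3
from decimal import Decimal

def extract():
    # Hypothetical extracts: contacts from a cloud CRM, orders from an ERP export.
    crm_contacts = [{"cust": " a-100 ", "channel": "phone", "outcome": "quote sent"}]
    erp_orders = [("A-100", "2018-03-05", "1250.00")]
    return crm_contacts, erp_orders

def transform(crm_contacts, erp_orders):
    # Normalize customer IDs so both sources join on the same key; store money as integer cents.
    contacts = [(c["cust"].strip().upper(), c["channel"], c["outcome"]) for c in crm_contacts]
    orders = [(cust.strip().upper(), day, int(Decimal(amount) * 100))
              for cust, day, amount in erp_orders]
    return contacts, orders

def load(conn, contacts, orders):
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS contacts (customer_id TEXT, channel TEXT, outcome TEXT);
        CREATE TABLE IF NOT EXISTS orders (customer_id TEXT, order_date TEXT, amount_cents INTEGER);
    """)
    with conn:  # commit both inserts together, or roll back on error
        conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", contacts)
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for the central data warehouse
    load(warehouse, *transform(*extract()))
    print(warehouse.execute(
        "SELECT customer_id, outcome, amount_cents "
        "FROM contacts JOIN orders USING (customer_id)").fetchall())
```

The transform step is where most of the work lives: without normalizing " a-100 " and "A-100" to one key, the CRM and ERP rows would never meet in the warehouse.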
Data quality and cleaning
Poor data quality costs companies dearly. Katie Horvath, CEO of Naveego, a provider of data accuracy solutions, cited a recent IBM survey of Fortune 1000 companies. The survey concluded that it cost $10 per record to fix bad data, but up to $100 per record in missed or erroneous decisions based on poor data. “Organizations don’t understand how big the data health problem is,” said Horvath.
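To put the survey's per-record figures in perspective, assume a hypothetical backlog of 50,000 bad records:

```latex
\begin{aligned}
\text{cost to fix the records} &= \$10 \times 50{,}000 = \$500{,}000 \\
\text{cost of decisions based on them} &\le \$100 \times 50{,}000 = \$5{,}000{,}000
\end{aligned}
```

On these figures, acting on bad data can cost up to ten times as much as fixing it first.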
Too many organizations ingest huge volumes of data without cleaning it, and then get garbage out because they put garbage in (GIGO). The proliferation of new types of unstructured data only adds fuel to the fire. The problem can be avoided by developing a data quality plan and methodology.
Data retention should be addressed or revisited, not only for transactional data but for the volumes of unstructured data that come from the internet, the IoT, and other data sources. Which data stays—and which should be jettisoned or moved to cold storage because it is never or seldom accessed—should be addressed in corporate information policies.
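A retention policy ultimately boils down to rules like the ones below. This Python sketch assigns each data set to a hot, cold or discard tier; the 90-day and three-year thresholds and the legal-hold flag are hypothetical placeholders for whatever the corporate information policy actually specifies.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional

class Tier(Enum):
    HOT = "hot"          # keep on primary storage
    COLD = "cold"        # move to cheaper archival storage
    DISCARD = "discard"  # eligible for deletion under the policy

# Hypothetical thresholds; the real values belong in corporate information policy.
HOT_WINDOW = timedelta(days=90)
RETENTION_LIMIT = timedelta(days=3 * 365)

def classify(created: date, last_accessed: Optional[date], today: date,
             legal_hold: bool = False) -> Tier:
    recently_used = last_accessed is not None and today - last_accessed <= HOT_WINDOW
    if recently_used:
        return Tier.HOT
    if legal_hold or today - created <= RETENTION_LIMIT:
        return Tier.COLD  # seldom or never accessed, but must (or may) be kept
    return Tier.DISCARD
```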
The remaining data should be cleaned and error-corrected as soon as it comes in. Tools that can be used to identify and fix broken, incomplete or inconsistent data are often packaged in ETL (extract-transform-load) software that you can buy off the shelf.
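The sort of checks such tools apply can be sketched in a few lines of Python. The required fields, the country alias table and the deliberately loose email pattern below are illustrative assumptions, not the rules of any particular ETL product.

```python
import re

REQUIRED = ("customer_id", "email", "country")
# Hypothetical mapping of inconsistent country spellings to ISO 3166-1 alpha-2 codes.
COUNTRY_ALIASES = {"us": "US", "usa": "US", "u.s.": "US", "united states": "US",
                   "gb": "GB", "uk": "GB", "united kingdom": "GB"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose sanity check only

def clean_record(record: dict) -> tuple:
    """Return a normalized copy of the record plus a list of problems found."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    # Incomplete data: required fields that are missing or blank.
    problems = [f"missing {f}" for f in REQUIRED if not cleaned.get(f)]
    # Broken data: values that fail a format check.
    if cleaned.get("email"):
        cleaned["email"] = cleaned["email"].lower()
        if not EMAIL_RE.match(cleaned["email"]):
            problems.append("malformed email")
    # Inconsistent data: several spellings of the same value mapped to one code.
    if cleaned.get("country"):
        code = COUNTRY_ALIASES.get(cleaned["country"].lower())
        if code is None:
            problems.append(f"unrecognized country {cleaned['country']!r}")
        else:
            cleaned["country"] = code
    return cleaned, problems
```

Records with a non-empty problem list can be routed to a review queue instead of flowing straight into the warehouse.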
“Data cleaning is a major focus for many companies,” said Horvath. “In the oil and gas industry, we see companies cleaning their data on their wells so the data can be normalized for use in a central database that is used for decision making. What they want to do is to maintain data quality and achieve a single ‘golden record’ of data that appears consistently across their organization—and eliminate the cost of erroneous decision making that was based on poor data.”
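One common way to arrive at a golden record is to merge duplicate records for the same entity using a survivorship rule. In this hypothetical Python sketch, records about the same well (keyed by an invented `well_id` field) are merged so that the most recently updated non-empty value for each field wins; master-data tools typically support other survivorship rules as well.

```python
from collections import defaultdict

def golden_records(records):
    """Merge duplicate records into one 'golden record' per well_id.

    Survivorship rule (an assumption for this sketch): for each field, the
    non-empty value from the most recently updated source record wins.
    """
    by_key = defaultdict(list)
    for record in records:
        by_key[record["well_id"]].append(record)
    result = {}
    for key, group in by_key.items():
        merged = {}
        # ISO-format date strings sort chronologically; newer values overwrite older ones.
        for record in sorted(group, key=lambda r: r["updated"]):
            for name, value in record.items():
                if value not in (None, ""):
                    merged[name] = value
        result[key] = merged
    return result
```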
Collaborative data science and IT teams
A Gartner survey conducted in late 2017 reported that “half of CDOs (chief data officers) now report directly to a top business leader such as the CEO, COO, CFO, president/owner or board/shareholders. By 2021, the office of the CDO will be seen as a mission-critical function comparable to IT, business operations, HR and finance in 75 percent of large enterprises.”
This is good news as companies march onward in their data digitization efforts.
However, it’s not such good news if data science and IT functions operate in separate silos, because a majority of unstructured “big data” will need to be handled by the same constructs that manage transactional data. Companies risk missing out on the “insight return” from their data unless they can perform analytics on a mix of transactional and non-transactional data coming in from a wide range of systems and sources. For this to happen, corporate IT (in charge of transactional data) and data science (in charge of unstructured digitized data) must work together.
Nick Elprin, CEO and co-founder of data science platform provider Domino Data Lab, explained this need for collaboration in an article on KDnuggets.
“A major insurance company had dozens of scientists working in uncoordinated ways on the same business problems, leading to lost investment and missed opportunities,” said Elprin. “There’s a difference between having a collection of individuals who create models, and having a dynamic team capable of leveraging its collective knowledge, skills and past work to collaboratively build better and better models with faster time to value.”
The takeaway for CIOs and IT decision makers is that data science and IT groups, along with end users, have to work closely together to get the best out of all of their data. “Having data scientists all on a separate team makes it nearly impossible for their work to be appropriately integrated with the rest of the company,” wrote Rachel Thomas, co-founder of fast.ai, an artificial intelligence firm. “Vertical product teams need to know what is possible and how to best utilize data science.”