“Data, Data, everywhere, Nor any information to think”
-Paraphrasing Samuel Taylor Coleridge’s famous lines from the Rime of the Ancient Mariner.
It often feels like we are in a “paradox of plenty” situation, somewhat akin to a resource curse: long-established corporations with an abundance of data are losing the race for market competitiveness to newer players that have much less data.
My initial thought was that, until recently, most corporations had focused on mining their “historical” data.
However, with the world of today generating a steady and ever-growing stream of “real-time” or “near real-time” data, corporations need to wake up to the new reality that much of their historical data is not as relevant or valuable as they think it is.
In the absence of real-time data, historical data is often used as a proxy to make some predictions. But with real-time data being available now, that proxy is no longer needed or is no longer as relevant.
This has a big benefit: corporations that feel they have fallen behind in the race to mine historical data do not necessarily need to play catch-up. They can make up for the lost opportunity by creating a framework to leverage real-time data streams.
Essentially, corporations can leapfrog other players, catching up with or even moving beyond them, without getting caught in what I’ll call the legacy data trap. Ditch the legacy data, since most of it may not be as relevant as you think. Food for thought?
What does the Data Doc think?
I bounced this idea off Tom Redman, “the Data Doc.” He was skeptical. While he agreed that companies need to wake up, he had two reasons for his skepticism.
First, real-time and historical data support different sorts of analyses and opportunities. He did not see one as a surrogate for the other.
Second, the biggest “gap” is the ability to analyze data and sort out what to do with those analyses. Real-time data does not address that gap.
Tom made some great points.
Until now, most of the energy and resources of corporations were devoted to “historical” data, since the capabilities to harness real-time or near real-time data did not exist. Now there has been an explosion in both the volume of real-time data and the tools to manage it.
As a result, there will be a shift of attention and resources from historical to real-time since both attention and resources are fixed and limited. Also, for many areas, an effective handle on real-time data is all that may be needed.
For example, we drive on the roads just using real-time data presented on the dashboard (speed, rpm, engine temperature) with no need of any historical data to meet the immediate need of going from point A to point B.
What do you think?
This could be an interesting survey question to ask CIOs and CDOs:
Of your total data management spend, how much will you allocate to mining historical data vs. managing real-time data, and why?
This may offer some interesting insights on how this entire area is evolving.
What implications does all this have for data strategy?
- Exact vs. Roughly Right: For historical data, the emphasis on getting all data in the right formats, with the right definitions, and in common data stores needs to go. Such an approach has led to a mental and execution block: the belief that no meaningful insights are possible until considerable time and resources are spent on getting it all “right.”
- Consolidation vs. Federation: Approaches where data is pulled from various data sources into a single repository need to be replaced by approaches where data stays in its parent repositories but gets “pulled” as needed. A federated data application framework? IBM Watson Discovery Service does something like that, but it seems to do so only for unstructured data. Fraxses seems to do it for both structured and unstructured data. With the kind of capabilities available now, physically moving data into a distinct data store (lake) may not be required. The lake may be virtual. This may be a quicker approach too.
- Internal vs. External: In most corporations, data strategies have been inward looking. That is, they have focused on internal data. In today’s world, any meaningful data strategy has to focus on internal as well as external data. How you can combine internally available data with publicly available or acquired external data to deliver business-focused insights is a question the strategy needs to answer.
- Defense vs. Offense: Data strategy should support both “exact” reporting (e.g., for finance and accounting purposes) and “directional” reporting (e.g., for strategy and business development purposes). Until now the focus has been on exact reporting, which has meant that not all available data has been effectively utilized. There is always a significant amount of data that is not “exact” but can still provide meaningful insights when weighted appropriately (e.g., Watson, when playing Jeopardy, did not come up with just one correct answer but with several, each weighted appropriately). A recent Harvard Business Review article, “What’s Your Data Strategy?”, described it as defense vs. offense: Companies make considered trade-offs between defensive and offensive uses of data and between control and flexibility in its use. Leandro DalleMule and Thomas H. Davenport summed it up well in that article:
There is no avoiding the implications: Companies that have not yet built a data strategy and a strong data-management function need to catch up very fast or start planning for their exit.
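To make the consolidation vs. federation point above a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the two in-memory SQLite databases stand in for separate "parent repositories," and the table names, sample rows, and the `revenue_by_customer` function are all hypothetical. It is not how IBM Watson Discovery Service or Fraxses work; it just shows the idea of pulling what a question needs from each source at query time instead of first copying everything into one store.

```python
import sqlite3


def make_source(create_sql, insert_sql, rows):
    """Build a stand-in 'parent repository' as an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical source 1: a CRM system that owns customer records.
crm = make_source(
    "CREATE TABLE customers (id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# Hypothetical source 2: a billing system that owns invoices.
billing = make_source(
    "CREATE TABLE invoices (customer_id INTEGER, amount REAL)",
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)


def revenue_by_customer(crm_conn, billing_conn):
    """Answer one question by pulling only the needed data from each source.

    Nothing is copied into a shared store; the join happens in memory
    at query time, and each dataset stays in its own repository.
    """
    names = dict(crm_conn.execute("SELECT id, name FROM customers"))
    totals = billing_conn.execute(
        "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"
    )
    return {names[customer_id]: total for customer_id, total in totals}


if __name__ == "__main__":
    print(revenue_by_customer(crm, billing))
```

The design choice worth noticing is that each source does its own aggregation and only the small result crosses the boundary, which is roughly what a "virtual lake" would need to do at a much larger scale.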