by Marc Wilczek

From big data to good data: closing the gap between data governance and business insights

Opinion
Aug 08, 2017
Analytics, CIO, Digital Transformation

Despite increasing spend on big data technology, many organizations still struggle to make sense of the massively growing digital universe.

With the expansion of the digital gold rush, data is moving into the spotlight and becoming a valuable source of information. Estimates are that the digital universe will continue to at least double every two years and reach 44 zettabytes by 2020, a 50-fold increase over 2010. The sheer size of the data lake is staggering, but the million-dollar question remains: how to make sense of the data tsunami and capitalize on it.

Storage costs keep plummeting

The phenomenon referred to as Moore’s law has been observed for decades. With the emergence of new technology (SSDs, software-defined storage, object storage, etc.) and consolidation within the storage industry, prices keep spiraling downward, declining at double-digit rates year over year.

In the digital age, the real cost of data storage is no longer in purchasing hardware but in the effort and knowledge required to diligently manage digital assets. Across many industries and geographies this is made even more challenging due to increasingly restrictive requirements for data life-cycle management, country-specific privacy laws and bolstered compliance regulations for data-retention periods, greater utilization of encryption technology, and so on.

Despite increasing spend on big data technology, companies are still groping in the dark

According to IDC’s research, the market for big data analytics will soar from $130 billion in 2016 to more than $203 billion in 2020, equaling a compound annual growth rate (CAGR) of 11.7 percent.
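The growth rate quoted above follows from the standard compound annual growth rate formula. As a quick sanity check, the sketch below applies that formula to the rounded start and end figures from the paragraph. The function name `cagr` is purely illustrative, and counting the period as four years (2016 to 2020) is an assumption about how the forecast window is defined.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10 percent)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Rounded figures from the text, in billions of US dollars.
# Assumption: the period runs from 2016 to 2020, i.e. four years.
rate = cagr(130.0, 203.0, 2020 - 2016)
print(f"Implied CAGR: {rate:.1%}")
```

With these rounded inputs the result comes out a little under 12 percent, close to the quoted figure. Small differences can arise from rounding or from how the forecast period is counted.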

Although enterprises spend a fortune on collecting, storing and managing data, only a few excel at converting raw data into actionable information. This is particularly true of unstructured content, which still accounts for approximately 80 to 90 percent of all corporate data.

A recent report by Veritas concluded that 52 percent of all data currently stored and processed by enterprises around the globe is “dark,” meaning its value is unknown. Another 33 percent is considered redundant, obsolete or trivial, and is known to be useless. On average, only 15 percent of all stored data is considered business critical. Unless enterprises take corrective action and become more deliberate about what they keep, estimates are that a “data hoarding” culture will cumulatively lead to $3.3 trillion in avoidable costs by 2020, spent on managing a digital cemetery.
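To make these proportions tangible, the following sketch splits a yearly storage bill across the three categories. The shares are the ones quoted above; the 1,000 TB estate and the $500 per terabyte per year are made-up inputs for illustration only, not figures from the report.

```python
# Shares quoted in the text: dark, ROT (redundant, obsolete or trivial), critical.
DATA_SHARES = {"dark": 0.52, "rot": 0.33, "business_critical": 0.15}


def annual_cost_by_category(total_tb: float, cost_per_tb: float) -> dict[str, float]:
    """Split a yearly storage bill across categories in proportion to volume."""
    return {name: total_tb * share * cost_per_tb for name, share in DATA_SHARES.items()}


# Hypothetical inputs: 1,000 TB stored at an assumed $500 per TB per year.
costs = annual_cost_by_category(total_tb=1_000, cost_per_tb=500.0)
for name, cost in costs.items():
    print(f"{name:>17}: ${cost:,.0f}")
print(f"Not known to be critical: ${costs['dark'] + costs['rot']:,.0f}")
```

Under these assumptions, the bulk of the bill goes to data whose value is either unknown or known to be nil. That is the pattern the report warns about.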

Cloudification: friend or foe?

Given the fierce price battles, especially in the public cloud domain, and the ability to store vast amounts of data at a low cost per unit, many enterprises feel strongly tempted to move corporate data swiftly into the cloud. While there are plenty of legitimate reasons and use cases for doing so, the decision to move petabytes of data should be weighed carefully.

First of all, enterprises need to properly understand their data’s composition in terms of content type, age, relevance and so on, and classify it accordingly. For instance, offloading dark data into the cloud is nothing but a waste of time and money. Moreover, data exerts gravity once it has been moved: the more of it accumulates in one place, the harder it is to relocate, so migrating it unexamined merely pushes the problem further away. As the digital universe expands exponentially, moving a growing data estate back will be challenging at best and a real nightmare at worst. Thus, enterprises should carefully assess, visualize and classify their data before embarking on a cloud journey.
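A first pass at such an assessment can be as simple as profiling a file share by content type and age. The sketch below is a minimal illustration, not a substitute for a proper classification tool: it uses the file extension as a stand-in for content type and the last-modified time as a stand-in for age, and the bucket boundaries are arbitrary assumptions. The demo runs on a throwaway directory so that it works anywhere.

```python
import os
import tempfile
import time
from collections import Counter
from pathlib import Path
from typing import Optional

# Hypothetical age buckets in days; adjust to the organization's retention policy.
AGE_BUCKETS = [(365, "under 1 year"), (3 * 365, "1 to 3 years")]
OLDEST_LABEL = "over 3 years"
SECONDS_PER_DAY = 86_400


def age_bucket(age_days: float) -> str:
    """Map a file age in days to a coarse, human-readable bucket."""
    for limit, label in AGE_BUCKETS:
        if age_days < limit:
            return label
    return OLDEST_LABEL


def profile_tree(root: Path, now: Optional[float] = None) -> Counter:
    """Sum file sizes in bytes per (extension, age bucket) under root."""
    now = time.time() if now is None else now
    totals: Counter = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        stat = path.stat()
        age_days = (now - stat.st_mtime) / SECONDS_PER_DAY
        extension = path.suffix.lower() or "(none)"
        totals[(extension, age_bucket(age_days))] += stat.st_size
    return totals


# Small demo on a temporary directory with one fresh and one "old" file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "report.pdf").write_bytes(b"x" * 100)
    (root / "old_log.txt").write_bytes(b"y" * 50)
    four_years_ago = time.time() - 4 * 365 * SECONDS_PER_DAY
    os.utime(root / "old_log.txt", (four_years_ago, four_years_ago))
    profile = profile_tree(root)
    for (extension, bucket), size in sorted(profile.items()):
        print(f"{extension:<8} {bucket:<14} {size:>6} bytes")
```

A summary like this will not say whether a file is relevant, but it does show where the old, untouched bulk sits, which is a reasonable starting point for deciding what deserves to be migrated at all.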

Data governance

Dealing with structured data may not be a big deal, but governing unstructured data is a much greater challenge than it might initially appear. With the lion’s share of all data being unstructured, assessing its value and identifying duplicate, confidential and sensitive information are key steps when implementing data-centric business models.
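Of these tasks, spotting exact duplicates is the easiest to automate. The sketch below groups files by a SHA-256 hash of their contents. It is an illustration under narrow assumptions: it only catches byte-for-byte copies, not near-duplicates or re-saved versions, and it says nothing about whether a file is confidential or sensitive, which requires inspecting the content itself. The function names are illustrative.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [group for group in by_digest.values() if len(group) > 1]


# Small demo on a temporary directory: two identical files and one distinct file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "contract.docx").write_bytes(b"same bytes")
    (root / "contract_copy.docx").write_bytes(b"same bytes")
    (root / "notes.txt").write_bytes(b"different")
    duplicate_names = [[p.name for p in group] for group in find_duplicates(root)]
    print(duplicate_names)
```

On a large estate, hashing every file is expensive. A common refinement is to compare file sizes first and hash only files whose sizes match.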

Whether ownership sits with the Chief Information Officer (CIO) or a dedicated Chief Data Officer (CDO), the data and analytics leader should work closely with their business unit peers and come up with a data governance framework that builds the foundation for all use cases. This typically includes how data is being classified, captured, refined, analyzed, managed, monetized, retained and erased — taking into account compliance and other regulatory requirements that may apply.

Enterprises that have developed proprietary algorithms that enable them to derive business value are well-advised to consider filing a patent to safeguard their intellectual property rights.

Takeaways

Despite increasing spend, a lot of groundwork must still be done. Enterprises should avoid getting trapped in an opportunistic “data hoarding” culture and be aware of the existence of a tipping point, at which creating even greater data silos won’t necessarily lead to bigger returns — especially when keeping in mind how much of the content is “dark” or “ancient.” As a matter of fact, the outcome of any big data analytics project is only as good as the quality of the data being utilized. To a great extent, this has to do with a well-implemented governance model that sets apart “good data” from “big data.”

Cloudification can make great economic sense and enable ample use cases, but it needs solid planning to avoid being led up the garden path.

While it might at first be wrongly perceived as a rather boring housekeeping exercise, putting a solid data governance model in place is closely tied to the success of the data-savvy enterprise. It follows two basic principles, both directly related to the firm’s balance sheet: gaining strategic insights to produce new digital revenue streams, and eliminating unnecessary costs for managing vast amounts of useless data.