by Paul Barth

The rise of the data marketplace

Mar 27, 2017
Analytics, Big Data, Business Intelligence

What Amazon can tell us about the future of enterprise data management.

These are heady times for data. Big data, data lakes, data-as-a-service, data breaches: hardly a week goes by without a headline mentioning data. Certainly, businesses are aggressively investing in harnessing their data as an asset to drive strategic insights, automate complex processes, and personalize customer experiences. Companies are doubling down on data security, and at the same time starting to package insights as information products.

All this energy is turning up the pressure on data management in these organizations, and cracks are rapidly appearing. The status quo of highly engineered data warehouse supply chains surrounded by pockets of specialized analytical sandboxes can’t keep up with business demand for data. Tensions are rising between the groups tasked with locking data down and those trying to set it free.

What is emerging is nothing short of a seismic shift in data management.  The demand for business agility with data is moving the center of gravity from IT producers to business consumers. Like never before, organizations are striving to provide self-service data access for business analysts through a secure, well-managed environment where users can quickly find, understand, and prepare data for their specific needs.

David Wells of Eckerson Group calls this new environment a “data marketplace.” (source) Drawing on analogies to e-commerce, a data marketplace is a place where analysts and other data consumers go to find and provision the data they need. While this sounds simple and intuitive, making it a reality requires a major overhaul of common data management platforms and processes.

Wells cites three ways that data marketplaces differ from data warehouses:

  • Cataloging: Instead of mapping data to a comprehensive enterprise data model (which takes an enormous amount of engineering before users can access it), data sets are cataloged along the path from “raw” data to “ready” data. To help users find and understand the data they need, the catalog describes each data set’s quality, completeness, and business definitions, along with how it has been used.
  • Curating: The marketplace replaces the painstaking effort of creating a “single source of the truth” with an agile, on-demand approach to improving data incrementally. Users who need harmonized, consistent data use tools in the marketplace to rapidly prepare “fit for purpose” data sets.  Over time, commonly used, clean views of data evolve into trusted data shared across the enterprise.
  • Crowdsourcing:  All stakeholders in the marketplace—data producers, stewards, and consumers—actively improve it every time it is used.  Business users note data quality and consistency gaps, stewards establish common definitions and “go to” data sets, and source system experts identify sensitive data that needs protection.  These stakeholders continuously enrich the marketplace catalog, and use it to coordinate with each other as data sources evolve and new business requirements emerge.
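
To make these three ideas a little more concrete, here is a minimal Python sketch of what a marketplace catalog might track. It is only an illustration under stated assumptions: every name in it (CatalogEntry, Catalog, provision, note_quality_gap, and the individual fields) is hypothetical, and none of it comes from Wells's description or from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data set in a hypothetical marketplace catalog."""
    name: str
    stage: str                # where it sits on the path: "raw" or "ready"
    definition: str = ""      # business definition, maintained by a steward
    sensitive: bool = False   # flagged by a source system expert
    quality_notes: list[str] = field(default_factory=list)
    use_count: int = 0        # how many times consumers have provisioned it


class Catalog:
    """Hypothetical in-memory catalog; a real one would be a shared service."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Cataloging: record the data set as it is, raw or ready.
        self._entries[entry.name] = entry

    def provision(self, name: str) -> CatalogEntry:
        # Curating: usage is tracked so commonly used views can surface.
        entry = self._entries[name]
        entry.use_count += 1
        return entry

    def note_quality_gap(self, name: str, note: str) -> None:
        # Crowdsourcing: consumers enrich the catalog as they use the data.
        self._entries[name].quality_notes.append(note)

    def search(self, keyword: str) -> list[CatalogEntry]:
        # Match on name or definition, most-used data sets first.
        kw = keyword.lower()
        hits = [e for e in self._entries.values()
                if kw in e.name.lower() or kw in e.definition.lower()]
        return sorted(hits, key=lambda e: e.use_count, reverse=True)


catalog = Catalog()
catalog.register(CatalogEntry("orders_raw", "raw",
                              "Order events as landed from the source system"))
catalog.register(CatalogEntry("orders_clean", "ready",
                              "Deduplicated orders with standard currency"))
catalog.provision("orders_clean")
catalog.note_quality_gap("orders_raw", "duplicate rows for retried payments")
```

In this sketch, searching for "orders" returns the cleaned data set ahead of the raw one simply because it has been used more, which is one simple way commonly used views could evolve into the trusted “go to” data sets.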

I have had the good fortune of working with early adopters of the marketplace, and the results have astounded me: analysis time reduced from months to days, data preparation costs cut in half, and millions of dollars in hard cost savings through migration and retirement of legacy systems. Just as important, consolidating the “first mile” of the data supply chain has improved data security and governance. These companies use the marketplace to enforce and monitor important data protection and access policies.

These early successes (along with 25 years of scars from traditional approaches) have convinced me that the data marketplace is the platform for the future.  I look forward to sharing what I learn on this journey.