by Mark MacCarthy

Data sharing: a problematic idea in search of a problem to solve

Opinion
Aug 30, 2018
Analytics, Data and Information Security, Regulation

Proposals to require companies to share their data are riddled with flaws and solve no pressing marketplace defects.

The latest data sharing proposal comes from Prof. Mayer-Schönberger, whose previous work on big data and the right to delete information won him a wide following in policy circles. The key idea is that companies above a certain size would be required to disgorge subsets of their data to competitors. Amazon, for example, would provide the world with its sales data so that anyone could create an alternative recommendation engine.

Voluntary data sharing arrangements among competitors have existed for generations. The most prominent example in the U.S. is credit bureaus, where banks and others voluntarily pool information in order to get a more accurate picture of risks for potential lenders, insurers and employers.

But in its proposed universal and mandatory form, data sharing suffers from many flaws. These are not just the detailed implementation difficulties that could be expected to arise with even promising new ideas, but fundamental defects that make it unattractive in principle and unworkable in practice.

Data sharing is a privacy nightmare

If people are willing to share their information with Google or Facebook, it doesn’t follow that they want to share it with all the competitors of these companies. Forced data sharing runs against any notion of effective privacy protection. Companies with attractive and desirable data management practices would be required to pass personal information on to other companies with no established consumer protection processes. 

This could all be fixed if companies were required to deidentify the information before passing it on. But of course, identified information is the point. New social networks don’t want anonymous data; they want the list of Facebook’s users and everything Facebook knows about them. Google’s competitors don’t want random search data; they want individual-level data, identified by IP address, device ID and other identifiers that privacy regulators treat as personal information. Amazon’s competitors don’t want aggregated sales data; they want Amazon’s individual-level profiles to train their recommendation engines.

Data sharing would also create overwhelming disincentives to invest in database construction

The non-rivalrous nature of information often gives rise to the feeling that there is no loss and all gain from data sharing. Let’s all use it together because it cannot be used up!

But free to use does not mean free to produce. Information does not reside in a Platonic heaven. It exists embodied in tangible computer records. The construction and maintenance of accurate, up-to-date, relevant systems of records is an enormously expensive task characterized by steep economies of scale. These databases are often a treasured company asset, with values at transfer in the billions of dollars. It is hard to see why any company would invest in this effort if the fruit of its work would be immediately made available to all competitors at no or minimal charge.

The data sharing idea would override the private contracts and the European Database Directive that give investors incentives to create and maintain valuable databases.

The alleged dangers of “centralization” and “central planning” are illusory

Of course, antitrust law does not demand that companies with large market shares be subject to special requirements, such as IP or data sharing, until other companies are more successful.

Still, data sharing might be a conceivable response if new companies could not gain access to the information they need to compete fully against incumbents. Yet every time regulators have looked at this issue in merger contexts they have determined that there is enough data post-merger to allow full and effective competition from alternative providers.

Mayer-Schönberger thinks data sharing is needed to ward off system failures that could arise from centralization. When one company provides the best recommendation engine that most people want to use, what happens when the service makes a mistake? There’s nowhere to go to get an alternative answer that could correct the mistake. The result could be catastrophically misleading search results, consumer recommendations, and news feeds. When one company controls all the data, what happens if there’s a security breach? It’s a single point of failure that could have catastrophic results for the entire system.

But upon examination these ideas are mostly scary rhetoric. Forced data sharing doesn’t make the data vanish from the original data collector, so whatever security risks were present are still there. And with data sharing, every new entity that receives the original data is a new point of failure.

If a company gets its personalized results wrong, consumers don’t need to go to a competitor to be informed of the mistake. It’s like getting the wrong-sized shoe; you know it doesn’t fit because it hurts. So, what happens with personalization mistakes? You don’t read the suggested article, you don’t buy the recommended product and you don’t click on the proffered search results. And the algorithm learns from that and tries to do better next time.

If it doesn’t, then there are alternatives. Perhaps the biggest blind spot in the centralization argument is the idea that Amazon doesn’t have competitors like Wal-Mart, Facebook doesn’t have competitors like Snapchat, Twitter and LinkedIn, and Google doesn’t have competitors like Bing and DuckDuckGo, not to mention Yelp and Travelocity. Systematic, regular and widespread failure of these services would not be catastrophic except for the companies themselves, which would immediately see their market share eroded as people exit en masse to these alternatives.

Reformers should look elsewhere for practical remedies

There’s a widespread feeling that something is amiss in tech, and many policy analysts are in search of remedies that will improve the status quo. In my view, mandated data sharing is an idea in search of a problem to solve. But even those who think the current tech marketplace needs a good dose of reform would be well advised to look elsewhere for practical, workable alternatives.