The world is awash in data. Every minute of 2017, Americans used over 2.6 million gigabytes of data. The Weather Channel received over 18 million forecast requests. YouTube viewers watched over 4 million videos. Twitter users sent over 446 thousand tweets. Google conducted over 3.6 million searches.
Over 90 percent of the world’s data has been created in the last two years – over 2.5 quintillion bytes of data per day. As the internet of things gathers more and more data from sensing devices in everyday objects, the volume of data will only increase.
All this data powers the digital economy, making every aspect of our lives richer, fuller, more interesting and more productive. We couldn’t really do without it in today’s global information economy.
To emphasize this unique contribution data makes to the digital economy, many commentators have begun to refer to data as the new oil. The Economist, for instance, describes data as “the oil of the digital era.” But as the Centre on Regulation in Europe says, this “often-used analogy between data and oil is misleading.”
What are the differences? Data can be used over and over again. It is not used up by a single use. One company might use the closing price of the Dow Jones average on January 17, 2017 to assess political risk in Egypt; another might use it for a sentiment analysis; and a third might use it as an input to a marketing plan for launching a new product or service.
In contrast, oil is a tangible, material substance in finite supply. Using it once means using it up; it is refined, processed, burned and then it is gone. The same gallon of gasoline that gets people to work cannot be used again to get them back home.
Moreover, oil and information differ in the ease with which they can be fenced off and controlled. The tangible physical nature of oil means it is very easy to establish control over it and exclude other people from using it.
But information is not like that. Many valuable data sets are widely available. For instance, the public can easily access prices charged by online merchants and that information can be used by anyone seeking to devise a pricing algorithm. Fitbit can access users’ physical information, but it cannot prevent them from using an alternative device that also measures their exercise patterns.
Firms can sometimes use technical, legal and contractual barriers to exclude people from access to data that they have collected, but the information itself can often be collected from other sources. Broadband providers and banks will not let others extract the email addresses of their customers from their databases. But users can provide that information to many other companies and institutions.
Users typically engage with multiple providers of the same service. According to Deloitte, 81 percent use more than one online shopping service, 74 percent use multiple online news providers, 72 percent use more than one online travel service, and 70 percent use multiple email, instant messaging or video calling services. These alternative suppliers can access very similar behavior and preference data about their customers.
The Economist is economically sophisticated and recognizes the non-rivalry of data, but it considers this only as a challenge to some traditional versions of privacy protection. The more far-reaching implication is economic: the inexhaustibility and non-excludability of data mean that competitors can gather and use the data they need to compete.
The abundant supply and replicability of data sets also undermine the idea that data is a new infrastructure. The OECD, for instance, thinks that data should be considered as a “shared means to many ends” because a wide range of productive activities “require” data as an input.
But for the same reasons data is not oil, it is not infrastructure. An abundant supply of valuable data is typically available for common use. Moreover, no data set is essential; there are always alternative data sets that can replicate the economic function of proprietary data.
Some physical facilities such as roads, water, and power supplies have to be shared in common because they are essential for just about anything people and organizations would want to do in our society. This is true for the Internet itself and the physical communications channels that enable it. But the data sets generated through the use of the Internet and the applications that ride on top of it are not like that. They are not themselves part of essential infrastructure. Data is the traffic that flows along the communications channel, but it is not a communications channel itself.
Data is everywhere. It is abundant, not scarce. The digital economy cannot live without it. But it is not oil, and it is not infrastructure.