Cloud Computing: How Big is Big Data? IDC's Answer
The future torrent of data poses challenges around search, privacy and compliance, IT staffing, and more. CIO.com's Bernard Golden looks at IDC's 2010 Digital Universe Study and concludes that IDC underestimates how much of that data will live in the cloud.
Fri, May 07, 2010
CIO — I came across a link to a new report from IDC called the "2010 Digital Universe Study". The report echoes what we've been telling our clients for the past year: the projections of the past few years about the growth of data significantly underestimate how much data is going to be created.
Some highlights of the report:
- In 2010, the Digital Universe (a fancy term for all the data created by consumers and businesses on earth, including video, audio, documents, etc.) will grow to 1.2 zettabytes, or 1.2 million petabytes.
- By 2020, the Digital Universe will be 44 times as large as it was in 2009.
- Surprisingly, the number of objects (i.e., files that contain digital data) will increase faster than the total amount of data, due to smaller file sizes — even though lots of large video and audio files are being created, so are massive amounts of small files created by devices, sensors, etc.
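To put those headline numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal (SI) units and a constant year-over-year growth rate between 2009 and 2020. The report does not publish such a smooth curve, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the report's headline figures.
# Assumptions (not from the report): SI decimal units and a constant
# year-over-year growth rate between 2009 and 2020.

PETABYTE = 10**15   # bytes
ZETTABYTE = 10**21  # bytes


def zettabytes_to_petabytes(zb: float) -> float:
    """Convert zettabytes to petabytes using SI prefixes."""
    return zb * ZETTABYTE / PETABYTE


def implied_annual_growth(multiple: float, years: int) -> float:
    """Constant annual growth rate that yields `multiple` after `years`."""
    return multiple ** (1 / years) - 1


# 1.2 zettabytes expressed in petabytes
print(f"1.2 ZB = {zettabytes_to_petabytes(1.2):,.0f} PB")

# Growth rate implied by a 44x increase from 2009 to 2020 (11 years)
rate = implied_annual_growth(44, 2020 - 2009)
print(f"44x over 11 years is about {rate:.0%} growth per year")
```

Under those assumptions, a 44-fold increase works out to roughly 41% compound growth every year for 11 years.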
The report goes on to highlight some of the biggest issues the future torrent of data will pose:
- Searching: How to find a digital needle in a gigantic data haystack? Most of the data will be unstructured, implying new kinds of searching mechanisms are required.
- Data Tiers: If you thought Hierarchical Storage Management was important before, imagine how necessary it will be in the face of zettabytes of data. A strategy to define a layered approach to storage, based on historical use, immediacy of need, and cost of storage will be necessary.
- Privacy and Compliance: How can the increasing requirements of privacy and compliance be controlled with so much data under management?
- Headcount Mismatch: While the amount of data will increase 44 times, and the number of files will increase 67 times, the number of employees will increase by only 1.4 times.
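The headcount mismatch is easier to appreciate as a ratio. The sketch below simply divides the multiples quoted above. It assumes all three multiples cover the same 2009-to-2020 span, and the variable names are hypothetical.

```python
# Ratios implied by the multiples quoted in the report.
# Assumption: all three multiples cover the same 2009-to-2020 span.

DATA_MULTIPLE = 44.0   # growth in total data volume
FILE_MULTIPLE = 67.0   # growth in number of files
STAFF_MULTIPLE = 1.4   # growth in number of employees


def per_employee(multiple: float, staff_multiple: float = STAFF_MULTIPLE) -> float:
    """How much more of something each employee must handle."""
    return multiple / staff_multiple


data_per_employee = per_employee(DATA_MULTIPLE)
files_per_employee = per_employee(FILE_MULTIPLE)

# Average file size relative to 2009: data grows more slowly than file count.
avg_file_size_ratio = DATA_MULTIPLE / FILE_MULTIPLE

print(f"Data per employee:  {data_per_employee:.1f}x")
print(f"Files per employee: {files_per_employee:.1f}x")
print(f"Average file size:  {avg_file_size_ratio:.0%} of its 2009 value")
```

In other words, each employee would be responsible for roughly 31 times as much data and nearly 48 times as many files, while the average file shrinks to about two-thirds of its 2009 size.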
The report notes that by 2020, much of this data will be held in cloud environments or will be "touched by cloud," which means data that transits through a cloud service or is temporarily held in a cloud application. The report estimates that perhaps 15% of all data will be held in the cloud, and that around one-third will live in or pass through the cloud. Frankly, I think that underestimates what's going to be in the cloud, for this reason:
It's clear that the growth of data is accelerating, which is to say that much of it will be created later in the 2010-2020 decade. This means that the average corporation is going to experience an increasing deluge of data. In other words, no matter how much a company has already invested in storage, its need for more will accelerate as the decade goes on. That will require ever-increasing amounts of storage and an ever-increasing capital budget for storage devices, not to mention more headcount. There's a truism in economics that something that can't go on, won't go on. I just don't see most companies funding an ever-increasing number of storage devices and employees to manage them. Most companies can't afford the projected growth of storage, so they won't go down the road of on-site storage. Long before they reach the logical conclusion of how much capital investment and headcount the increased storage would require, they'll turn to specialized providers who have figured out how to manage enormous amounts of storage more cost-effectively.