Can the Government Handle Big Data Analytics?
You name it, the government has a pile of data about it: genomics, energy use, the weather and more. Various open data and big data initiatives in the federal government aim to make this information available to anyone who wants it. Can the inherent complexity of big data analytics and the promise of open government coexist?
Mon, October 22, 2012
CIO — Just how big are big data? Not the big data hype bubble, mind you—we know that's enormous. Rather, how large do data sets have to be before we can consider them big data?
There is no one answer. Big data is a relative term. It refers to data sets, and the corresponding data challenges, so large that traditional data management and analytics approaches aren't up to the task of squeezing all the value we desire from the information we have. As a result, as our tools and techniques improve, the "bigness" threshold for big data will continue to rise.
This threshold also depends upon the context for the data, which generally aligns with the industry responsible for them. Genomics research, weather prediction and other scientific pursuits push the limit of data set size, but any business that collects information about its customers may also have big data challenges.
Keep in mind Parkinson's Law of Data: the amount of data expands to fill the space available for it. As our technology for creating, moving and storing data improves, the big data threshold will continue to rise. If anything, the relentless advance of technology seems to be driving the ever-increasing acquisition of information, and this deluge promises to swamp even the most capable of big data strategies.
The central big data challenge, of course, is how to derive value from such immense data sets, essentially recovering those rare gems in the rough—identifying the important, meaningful and insightful nuggets in the onslaught of noise.
Counterintuitively, the more information we have, the less of it we actually want, since we prize only the results of careful analysis of our big data, not the data themselves. A mountain containing gold is worthless, regardless of the size of the mountain, if the cost of extracting the gold exceeds the gold's value.
U.S. Government Sitting on Big Data Goldmine
Today, the U.S. government faces the mother of all big data mountains. From National Oceanic and Atmospheric Administration (NOAA) weather data to earth science information from the U.S. Geological Survey (USGS) to the genomics data at the National Institutes of Health (NIH), the government (and, therefore, the American people) owns perhaps the largest collection of big data sets on the planet.
This collection is extraordinarily valuable in theory but worthless if we're unable to extract the important nuggets. To mine this gold, the Obama Administration announced its Big Data Research and Development Initiative in March. Five agencies made about $200 million in new commitments toward improving big data tools and techniques: the aforementioned NIH and USGS, plus the National Science Foundation, the Department of Defense (DOD) and the Department of Energy (DOE). The data challenges these agencies and departments face range from better use of the DOE's supercomputers for crunching scientific data to facilitating "rapidly customizable visual reasoning" for diverse DOD missions.