WASHINGTON -- When the Obama administration launched its open data initiative in 2009, federal agencies naturally responded by simply posting various nonsensitive data sets on the government's new online portal, Data.gov.
Now, more than four years later, the government is moving into the next phase of the process and trying to make that data more usable, an effort that includes taking inventory of the departments' and agencies' assets, shifting to machine-readable formats, and, ultimately, layering APIs on top to make the data more accessible to developers and researchers.
"We all talk about big data, but really the issue for delivering on government missions is not just collecting data, but it's really transforming data into information products, developing knowledge out of that data," Simon Szykman, CIO at the Department of Commerce, said here at a government IT conference.
In the more forward-looking corners of the government, officials envision marshaling the government's data assets into a platform that could serve as a seedbed for innovation and development in the private sector and academia, inviting parallels to the government's role in the early days of the Internet and supercomputing.
"Across the federal government we have a very diverse but also coordinated set of activities that focus on the foundations of big data, and the foundations of how we move forward in creating the core technologies that can support all the activities," said Fen Zhao, staff associate at the National Science Foundation's Directorate for Computer and Information Science and Engineering.
A Platform for Innovation
"We want to build a platform on which you can do a lot of these great new innovative things, and build a platform just like the Internet -- we had ARPANET, NSFNET, and built the foundational technologies that will enable the creation of the Internet and what everybody else will do with it," Zhao added.
In addition to her work at the National Science Foundation, Zhao is a fellow in the White House Office of Science and Technology Policy, where she is helping guide the big data research and development initiative the administration launched last March.
As part of that effort, which began with $200 million in funding commitments from six federal departments and agencies, the NSF earlier this year issued a request for information, seeking proposals from businesses, academic institutions, nonprofits and other entities for big data projects that would advance core technologies or serve national priorities like economic growth and education.
The idea of positioning the government's big data sets as a platform for outside innovation reflects the understanding that those assets, with their vast size and scope, hold enormous value that the departments and agencies will never be able to fully realize on their own.
And it's not without precedent. Advocates of opening more data to the public point to the commercial successes that were built off the government's Global Positioning System and the weather data maintained by the National Oceanic and Atmospheric Administration.
As an example, Szykman suggested that if more data from the Commerce and Labor departments were made available in a machine-readable format and with accompanying APIs, a website advertising homes for sale could make its listings searchable by the neighborhood's median income, unemployment rate and other variables, enabling house hunters to seek out the areas where they might enjoy the best career prospects. It's an interesting idea, but certainly not one that the government would produce on its own, particularly at a time when agencies are operating under perpetual budget constraints.
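Once agency data is machine-readable, the mashup Szykman describes reduces to a simple join on a shared key such as a ZIP code. The sketch below illustrates that idea in Python; the listings, the economic indicators, and both function names are invented placeholders standing in for responses from hypothetical Commerce and Labor APIs, not any real government endpoint.

```python
# Hypothetical neighborhood-level economic indicators, keyed by ZIP code,
# shaped like the JSON an agency API might return. All figures are invented.
ECON_STATS = {
    "20001": {"median_income": 82000, "unemployment_rate": 5.1},
    "20002": {"median_income": 61000, "unemployment_rate": 7.4},
}

# Hypothetical real-estate listings from a commercial site.
LISTINGS = [
    {"address": "123 Main St", "zip": "20001", "price": 450000},
    {"address": "456 Oak Ave", "zip": "20002", "price": 310000},
]

def enrich_listings(listings, stats):
    """Attach economic indicators to each listing by joining on ZIP code."""
    enriched = []
    for listing in listings:
        merged = dict(listing)
        merged.update(stats.get(listing["zip"], {}))
        enriched.append(merged)
    return enriched

def filter_by_income(listings, minimum):
    """Keep listings in neighborhoods at or above a median-income floor."""
    return [l for l in listings if l.get("median_income", 0) >= minimum]

if __name__ == "__main__":
    for home in filter_by_income(enrich_listings(LISTINGS, ECON_STATS), 80000):
        print(home["address"], home["median_income"])
```

The point of the sketch is that the hard part is not the join itself but getting agencies to publish the right-hand side of it in a consistent, documented format.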
"The government has a mission. The government has resources, which are limited. And the government has ideas, and sometimes the ideas that the government has in terms of what can be done with this data go beyond what we can do with resources. And sometimes the ideas themselves are limited, in the sense that we don't pretend that we've cornered the market on good ideas. There are people in the private sector -- individual citizens and companies -- that can come up with new ways of using this data that may lead to new innovation, new uses that provide value. The benefit of sharing more of our data is not just the transparency issue. It's really creating new ways in which the data that we produce and disseminate can be used by others in ways that we may not have anticipated or may not be pursuing within our existing resources," Szykman said.
Kenneth Corbin is a Washington, D.C.-based writer who covers government and regulatory issues for CIO.com.