The Democratization of Public Data and Analytics Tools

The combination of open data and new analytics tools creates a path to new insights

In recent years, we’ve seen a quiet but steady movement to open up more datasets for use by researchers, local governments, product development teams, and just about anyone else. Today, this “open data” movement makes an enormous amount of data freely available.

So what exactly is open data? The industry group Open Knowledge International offers this definition in its Open Data Handbook: “Open data is data that can be freely used, re-used and redistributed by anyone—subject only, at most, to the requirement to attribute and share alike.” This matters for the development of new products and services: data sharing drives innovation and collaboration, enabling deeper insights that would not be possible without open datasets.

In practical terms, when people talk about open data, they are usually talking about government data (local, state, and federal). In the private sector, enterprises guard their data carefully, and that isn’t likely to change. But governments collect and generate an enormous amount of data that is available for use by anyone.

The trick is to gain access to that data. The fact that data in the public sector is legally available doesn’t mean it is accessible. And this is where open data initiatives enter the picture. These initiatives, such as the U.S. Government’s Data.gov project, make it easy to access government data. The Data.gov site, for example, currently offers more than 180,000 datasets that users can put to work to explore global climate change, analyze regional agricultural data, understand local crime rates—the potential uses are as diverse as the datasets.

This isn’t just a national government trend. Many state and local governments are active proponents of open data. My Dell colleague Teresa de Onis recently wrote a blog post about Austin’s open data portal, which provides easy online access to many kinds of data about the city, ranging from reports on restaurant health inspections to performance ratings for the police and fire departments. Users can jump right in and search a catalog of datasets, charts, maps, and more.

This easy access to government data is welcome news to anyone who believes in open government. But then there is the backend challenge: putting the data to work. Data alone doesn’t provide insights into anything. To gain insights from big data, you need analytic tools that allow you to query the datasets.

There’s good news on this front, too. Today, the tools for analyzing datasets are far more affordable and far easier to use than the tools of the past. In many cases, it no longer takes a data scientist to extract information from vast amounts of data.
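
To give a rough sense of how little code a simple question can take today, here is a minimal Python sketch that averages restaurant inspection scores by ZIP code. The column names and the sample rows are made up for illustration; a real file downloaded from an open data portal will have its own layout, so treat this as a starting point rather than a recipe.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows standing in for a CSV file downloaded from an
# open data portal. The column names are assumptions, not a real schema.
SAMPLE_CSV = """restaurant,zip_code,score
Cafe A,78701,95
Cafe B,78701,85
Diner C,78702,70
Diner D,78702,90
"""

def average_score_by_zip(csv_text):
    """Return the mean inspection score for each ZIP code."""
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores[row["zip_code"]].append(int(row["score"]))
    return {zip_code: mean(values) for zip_code, values in scores.items()}

averages = average_score_by_zip(SAMPLE_CSV)
for zip_code, avg in sorted(averages.items()):
    print(zip_code, avg)
```

Everything here comes from Python's standard library, which is part of the point: a question like this no longer requires specialized software.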

Basically, the democratization of government data has been accompanied by the democratization of tools for data analytics—and we are all better off for that.

Armando Acosta is the Hadoop planning and product manager at Dell.

