by Sarah Putt

How Stats NZ democratised the use of data

Jul 22, 2020

A customisable ingestion layer makes it easy for agencies to provide data in their native formats, smoothing the path for sharing data.

The data portal created by Stats NZ to provide up-to-date information during the COVID-19 global pandemic shows how a scalable data dashboard can be created from multiple sources in a short time. Initially created to assist policy advisors who were desperate for data during the lockdown, it continues to be useful as the country tackles the economic crisis brought on by the global pandemic.

Stats NZ general manager for economic and environmental insights Richard Evans says the portal went live on 21 April 2020 with 40 indicators and has since grown to about 100. It is intended to be used by all New Zealanders and is compiled with the needs of both the public and private sectors in mind.

“What is most helpful is having this one essential portal to provide to not only our policy clients, but also the man or woman in the street, the academics, the business owners. Everyone has access to the data that is being used by the government decision makers to make decisions around the COVID crisis,” he says.

He says the need for speed to market is forcing them to rethink the idea of ‘data quality’—so that timeliness, frequency and accessibility are considered alongside accuracy and coherence.

“It’s meant forgoing any kind of official vetting of data sources, such as checking into the methods used. My point of view when I was asked was ‘If it’s useful to people and if we don’t think there is anything terribly wrong with it, we should probably consider putting it on.’ The main criterion is ‘Does it look reasonable and will it help people understand the economic impacts of the crisis?’” Evans says.

How Stats NZ created the data portal quickly

Design analyst James McKay says the system has been built to ingest data from external sources as quickly as possible. That data could be daily broadband uptake stats from Chorus, or the New Zealand Activity Index, which draws on multiple indicators to provide a weekly update on underlying economic conditions.

“What’s been built is a system where we can ingest any sort of data file just by writing a small function, so effectively we have created a customisable ingestion layer, like a transformation layer. Basically, anyone can come along and write a small function that parses the file they have been given. That might be 20 minutes of work to write that small function and then from there on it’s in a consistent format,” McKay explains.

“We are not having to go back to respondents and ask them to give it to us in a different format. And it’s not creating too much of a burden because they can just send them in the format they’ve got and then we just do that little bit of work on our end to transform it to our needs,” he says.
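The approach McKay describes, one small parsing function per supplier feeding a single consistent format, can be sketched roughly as follows. This is an illustration only, not Stats NZ's code: the portal itself is written in R, while this sketch uses Python, and every name in it (Observation, register, ingest, the source labels and the file layouts) is a hypothetical stand-in chosen for the example.

```python
import csv
import io
import json
from datetime import date
from typing import Callable, Dict, List, NamedTuple


class Observation(NamedTuple):
    """The one consistent format every source is converted into (hypothetical)."""
    indicator: str
    period: date
    value: float


Parser = Callable[[str], List[Observation]]

# Registry mapping a source label to the small function that parses its files.
PARSERS: Dict[str, Parser] = {}


def register(source: str) -> Callable[[Parser], Parser]:
    """Decorator that adds a parsing function to the registry under `source`."""
    def wrap(fn: Parser) -> Parser:
        PARSERS[source] = fn
        return fn
    return wrap


@register("broadband_csv")
def parse_broadband_csv(text: str) -> List[Observation]:
    # Assumed layout: a CSV with "day" and "connections" columns.
    rows = csv.DictReader(io.StringIO(text))
    return [
        Observation("broadband_uptake", date.fromisoformat(r["day"]), float(r["connections"]))
        for r in rows
    ]


@register("activity_json")
def parse_activity_json(text: str) -> List[Observation]:
    # Assumed layout: {"points": [{"week_ending": "...", "index": ...}, ...]}
    payload = json.loads(text)
    return [
        Observation("activity_index", date.fromisoformat(p["week_ending"]), float(p["index"]))
        for p in payload["points"]
    ]


def ingest(source: str, text: str) -> List[Observation]:
    """Look up the parser registered for `source` and apply it to the raw file text."""
    try:
        parser = PARSERS[source]
    except KeyError:
        raise ValueError(f"no parser registered for source {source!r}") from None
    return parser(text)
```

Under this design, taking on a new supplier means writing one more decorated function; the supplier keeps sending files in its native format, and everything downstream of ingest only ever sees Observation records.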

Automating the process upfront has saved hours of work, and removing manual steps wherever possible reduces the chance of errors.

This ability to scale quickly has become more essential over time, says McKay. “Seems like a small thing when you’ve got ten or 12 data sets but now that we’ve got dozens and dozens of different files, it’s the case where you don’t want to be doing any manual process.”

While there are plenty of ‘point and click’ applications available to quickly create dashboards, McKay wanted to work with something more configurable. He wrote the portal in the statistical language R and used the R Shiny package, which generates the JavaScript and HTML that is rendered in a web browser.

“Having a tool set like R Shiny means we can write the code end-to-end, right from the ingestion phase through to the visualisation, and a single person can do all that,” he says. McKay notes that this is probably the job of a software developer or a data analyst skilled in R.

Stats NZ has published the code on the official Stats NZ GitHub repository so it is now fully open source and available.

Relationships are key to providing the information

In addition to McKay’s technical skill set, the project requires someone with good relationship skills, who can work closely with government departments and private sector organisations to get the information. This job has fallen to insights analyst Bernie Hanratty.

“The uniqueness of this particular piece of work is that we are going to both the public and private sectors. Generally, I do a lot of work across government and in this case the scope was much wider and so the opportunity was to engage and work with the private sector, and get their buy-in and support,” Hanratty says. “A lot of them were offering the information, but they didn’t necessarily have the profile and the visibility of that information, so being able to integrate that alongside our public data sources in one location is hugely beneficial.”

Since launch, the site has recorded 32,000 cumulative unique page views, with an average time on site of 14 minutes.

The next stage is to provide more granular data, and Hanratty is in talks with organisations such as regional councils about what that might look like.

The future of the Stats NZ COVID-19 portal

A forerunner to the COVID-19 portal was set up by Stats NZ following the Canterbury earthquakes, when those involved in the recovery required up-to-date data to assist with decision making. So, is such a data portal something only required during catastrophic events, or does Stats NZ consider it to be an ongoing exercise?

Evans says Stats NZ will keep the portal for as long as its clients require it. For example, it may be that as the economy returns to normal the need for weekly updates on trade movements is less pressing. Whether it continues to exist in its current form, the data portal has highlighted the need for Stats NZ to look to different methods of collecting and visualising information. “We are trying to get away from what has been our default mode over several decades—which is to run a survey on different parts of the population—and move towards data that already exists,” he says.

“More and more human activity leaves a digital trace, and we’re very interested in gathering those digital traces. If they identify individuals then we anonymise them and use them in the statistical process and produce aggregate data products. Our preference is to get free and open data if we can get it and if it’s useful. This whole experience can be seen as a step in that direction,” he says.