by Sarah Putt

Stats NZ’s data transformation: Where it stands after 5 years

Oct 28, 2020
Analytics, Digital Transformation

Efforts to modernise how the New Zealand statistics agency works meant it was ready for the pandemic’s work-from-home requirements and a deluge of new data.

Conventional wisdom is that it takes 10 years to become an overnight success. So, Stats NZ has done well to get its data transformation programme up to speed in half that time, ensuring the department could quickly pivot to providing new data from disparate sources when the pandemic struck.

Stats NZ chief digital officer Chris Buxton says that when the pandemic struck in spring 2020, the department had already enabled 95% of the organisation to work from home following the 2016 Kaikoura earthquake, which damaged Statistics House.

Data gathering during the pandemic

The big issue with the COVID-19 lockdowns was that the field interview workforce couldn’t leave home, and this affected data collection in areas such as monitoring the price of groceries. In addition, the New Zealand government started asking new questions related to the lockdown, such as how the public was responding and how key infrastructure like broadband networks was holding up.

To find those answers at speed, Stats NZ turned to the people holding that data, such as the supermarkets and the telcos. It could handle a large amount of data from a range of businesses because the technology deployed during its five-year digital transformation “had given us the mechanism to decouple data and store it in a secure way,” Buxton says.

Stats NZ’s new Datahub is a centralised data management platform that stores, verifies, and analyses data. Based on the Cloudera technology stack, it feeds data into new and existing statistical and analytic systems such as the R and Python programming languages and tools such as SAS and Microsoft’s Power BI. Errors in data are detected early through automated validation and checking systems that also extract relevant metadata and prepare the data for use.
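To make the idea of early, automated checking concrete, here is a minimal sketch in Python of how an incoming file might be validated while basic metadata is extracted. It is an illustration only, not Stats NZ’s Datahub code: the function name `validate_price_feed`, the `ValidationReport` structure, and the column names `store_id`, `product`, and `price` are all hypothetical, chosen to resemble a supermarket price feed.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    """Outcome of checking one incoming file (hypothetical structure)."""
    metadata: dict
    errors: list = field(default_factory=list)
    clean_rows: list = field(default_factory=list)


def validate_price_feed(raw_csv, required=("store_id", "product", "price")):
    """Check a CSV price feed, record simple metadata, and keep only valid rows.

    The required column names are assumptions made for this example.
    """
    reader = csv.DictReader(io.StringIO(raw_csv))
    columns = list(reader.fieldnames or [])
    report = ValidationReport(metadata={"columns": columns, "row_count": 0})

    # Structural check first: stop early if expected columns are absent.
    missing = [name for name in required if name not in columns]
    if missing:
        report.errors.append(f"missing columns: {missing}")
        return report

    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        report.metadata["row_count"] += 1
        try:
            price = float(row["price"])
        except (TypeError, ValueError):
            report.errors.append(f"line {line_no}: price is not a number")
            continue
        if not price > 0:  # also rejects NaN
            report.errors.append(f"line {line_no}: price must be positive")
            continue
        report.clean_rows.append({**row, "price": price})
    return report


sample = "store_id,product,price\nS1,milk,3.50\nS2,bread,abc\nS3,eggs,-1\n"
report = validate_price_feed(sample)
print(report.metadata)
print(report.errors)
print(report.clean_rows)
```

In this sketch, problems are reported per line rather than halting the whole load, so a supplier’s file with a few bad rows can still contribute its valid rows while the errors are flagged for follow-up.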

With no restrictions on how data is formatted, the Datahub provides increased flexibility around how data is stored and used. “It’s important to note that Datahub isn’t a database or a tool, but a new framework for management and using data,” Buxton said in his 2020 CIO50 New Zealand profile. The public can access the results of this data aggregation at Stats NZ’s COVID-19 data portal.

Integrated Data Infrastructure and the rules of engagement

Aggregating data sets is not a new programme of work for Stats NZ. The department had previously developed the Integrated Data Infrastructure (IDI) data set, which consists of about 60 key government databases across areas such as education, justice, welfare, and treasury. It is used by researchers from government agencies, nongovernment organisations, and universities that are studying issues such as vulnerable children, business productivity, and the impact of health conditions. An example is the research by the New Zealand Treasury to identify childhood risk factors that contribute to adverse outcomes later in life.

Buxton says access to the IDI is strictly controlled. Researchers must submit a proposal, and they must themselves be vetted before they can use the IDI. And even then, they can only use a defined area of the IDI for a limited time period. “We use the ‘five safes’ model: safe people, safe projects, safe settings, safe data, safe output,” Buxton says. “The ‘safes’ make sure that there is a level of protection because it is a big database and there is a lot of data in there. The individual data sets are not necessarily sensitive, but it’s the aggregation of them all that we feel is quite sensitive, so we do wrap quite a lot of protection around it.”

Māori data sovereignty framework in development

Stats NZ is working with iwi leaders on co-designing a Māori data sovereignty framework, which Buxton expects will be delivered by 2021. “The intent there is … to get a better understanding of what Māori data sovereignty is and what the government can do to support Māori in that process,” he says.

More broadly, Buxton says that Stats NZ recognises that all New Zealand data is a taonga (national treasure), so when it does house data in offshore public clouds, “we would hold the sovereign key so they couldn’t decrypt the data at all. We also have additional controls about onshore backup of any data that is held offshore.”

Census still critical to achieving accurate information about NZ

While there has been plenty of work around creating a virtual census using existing data (what Buxton calls the “admin data approach”), he says that nothing beats the actual census for ensuring the entire population is accounted for. “There is a population that doesn’t live natively digital. They are the really hard-to-reach populations that actually the census has to focus on.”

Buxton notes that there were challenges with the last census in 2018, when the department overestimated the rate of digital adoption in New Zealand and missed significant parts of the population. For example, the response rate among Māori was 68% and Pacific people 65%, down from 88.5% and 88.3% in the 2013 census. So, for the expected 2023 census, “we have started preparations … to maximise the response and return from those parts of the country where we didn’t get a good response and we didn’t fully engage properly last time,” Buxton says.

Algorithm charter and the new privacy laws

Transparency about the data it collects and how it is used is at the forefront of Stats NZ’s approach, with the department releasing the Algorithm Charter that has been adopted by more than 20 government departments, including most recently the New Zealand police.

“Algorithms can be used to make all sorts of decisions around the lives of the citizen. The big danger there is that it becomes nontransparent, and you lose that personal connection between citizen and government. That’s not something that government wants to see happen. The algorithm charter was actually defined to increase transparency around the use of algorithms where government is using them to make decisions around citizens,” Buxton says.

Although implementing the charter is left to each individual agency, Buxton says several governance groups oversee how data is used “very frequently”.

As for the new Privacy Act that comes into effect on December 1, 2020, Buxton says there is nothing that Stats NZ needs to “fundamentally change” to ensure it is compliant. But it will review its internal policies to ensure their language and terminology match those used in the new legislation.