by Mick Smith

404: Technology leadership not found in Aussie government

Mar 26, 2020

The myGov failure this week isn’t the first instance of incompetent IT practice in the government, but it should be the last one allowed.

I’ve been in the technology sector for more than 20 years and I’ve spent a bit of time working with government. I’ve seen and heard it all.

In my experience, the major access failures at myGov this week can be attributed to several key factors.

For a web app such as the myGov website, you need a pretty robust infrastructure utilising technology that can grow and scale on demand. 

In 2019, there were more than 15 million myGov accounts, with more and more Aussies doing their transactions online rather than visiting a government office like Centrelink.

More than half a million people are known to log on to the website every day. That’s during a typical day. Not during a day where millions of people found themselves unemployed.

The government’s excuse that it didn’t “realise the sheer scale” of how the website would be used would be more credible if this hadn’t happened so many times before.

So what can the government learn from its most recent failure, and what technologies should be at the top of its list to research and deploy?

Autoscaling (or elastic scaling) is a cloud computing method in which the resources allocated to your web application or cloud-based infrastructure (additional servers or more computing power) scale up or down depending on the number of users actively using it.

Autoscaling should be part of the myGov service

The government is pretty hush-hush about the technology it uses, but you can safely assume that it doesn’t have this type of technology in place. All the major cloud providers offer some form of autoscaling as part of their service.

Options include AWS Auto Scaling, autoscale with Azure Monitor and Virtual Machine Scale Sets, autonomous autoscaling in Oracle, Google’s autoscaling through Compute Engine, and the SAP Application Server Autoscaler. Even the lesser-used Alibaba Cloud has built-in autoscaling functionality.

Set up correctly, autoscaling lets you grow with demand, either by adding more servers or instances to a cluster (horizontal scaling) or by adding more resources such as RAM or CPU to an existing virtual machine or instance (vertical scaling).
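To make the horizontal case concrete, here is a minimal Python sketch of a target-tracking rule, assuming average CPU utilisation is the metric being tracked. The function and parameter names are hypothetical; the proportional formula has the same shape as the one Kubernetes documents for its Horizontal Pod Autoscaler, and real cloud autoscalers add cooldown periods and smoothing on top of it.

```python
import math

def desired_instances(current: int, current_cpu: float, target_cpu: float,
                      min_instances: int = 2, max_instances: int = 100) -> int:
    """Target-tracking rule for horizontal scaling (hypothetical helper).

    Scales the instance count in proportion to how far average CPU
    utilisation (per cent) is from the target, then clamps the result
    to the [min_instances, max_instances] range.
    """
    if current < 1 or target_cpu <= 0:
        raise ValueError("need at least one instance and a positive target")
    proposed = math.ceil(current * current_cpu / target_cpu)
    return max(min_instances, min(max_instances, proposed))

# Surge: 10 instances at 90% CPU against a 60% target -> scale out to 15
print(desired_instances(10, 90.0, 60.0))
# Quiet period: 10 instances at 30% CPU -> scale in to 5
print(desired_instances(10, 30.0, 60.0))
```

Vertical scaling is not shown here; it would resize an existing instance rather than change how many are running.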

So the question is, does the government have this in place? If the answer is yes, then it’s simple. Either it isn’t configured correctly, or the underlying application needs to be optimised for the number of users it is expected to serve.

Once you have a stable, robust and auto-scalable architecture, it is simply a matter of optimising code and adjusting network/routing configurations to ensure that the web application is ready to handle the load.

Other companies with far more users than the myGov site have been through their own teething issues, adjusted their tech stacks, optimised their applications, and are now ready for increases in demand, both planned and unplanned.

One example is Tabcorp, which on Melbourne Cup day can process around 3000 transactions per second with only minor downtime, presumably because its applications can scale.

We’re talking thousands of people every second doing financial and betting transactions, not people logging on to a government web app to apply for assistance.

While autoscaling is the answer, you can also pre-empt an increase in demand if you know your user base. One would think that the prime minister would apprise the government services minister, Stuart Robert, of which industries were going to close and when.
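Knowing your user base turns this into a sizing exercise. The sketch below is one simple way to do it, assuming you have a forecast of peak concurrent users (for example, derived from the number of workers in industries about to close) and a measured per-instance capacity; every name and number here is hypothetical. The result would be set as the autoscaler’s minimum ahead of time (scheduled scaling) rather than waiting for live metrics to react.

```python
def instances_to_preprovision(expected_peak_users: int, users_per_instance: int,
                              headroom_pct: int = 30) -> int:
    """Instances to have running before a forecast surge arrives (hypothetical).

    Sizes for the expected peak plus a safety margin (headroom_pct), using
    integer ceiling division so capacity is never rounded down.
    """
    if users_per_instance <= 0 or headroom_pct < 0:
        raise ValueError("users_per_instance must be positive, headroom non-negative")
    needed = expected_peak_users * (100 + headroom_pct)
    per_instance = users_per_instance * 100
    return -(-needed // per_instance)  # ceiling of needed / per_instance

# Hypothetical forecast: 100,000 concurrent users, 2,000 users per instance,
# 30% headroom -> 130,000 / 2,000 = 65 instances
print(instances_to_preprovision(100_000, 2_000))
```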

The minister could then take that information and ensure that his department, alongside his colleagues at the Australian Taxation Office and the Digital Transformation Agency, threw extra resources at the application in preparation for what was to come.

They could have added resources, simulated the expected traffic loads to test and optimise performance, and ironed out any kinks in the system well before people were greeted with a generic 404 or 500 error message.
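Real load testing fires requests at a staging copy of the system, but the underlying idea can be shown with a toy capacity model. The Python sketch below is an illustration only, assuming a system that serves a fixed number of requests per second and can queue a limited number more; anything beyond that queue is rejected, which is roughly when users start seeing error pages. All names and figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LoadResult:
    served: int
    dropped: int
    max_backlog: int

def simulate_load(arrivals_per_second: list[int], capacity_per_second: int,
                  queue_limit: int) -> LoadResult:
    """Toy discrete-time model: each second, new requests join a queue, the
    system serves up to its capacity, and anything beyond the queue limit is
    rejected (the user sees an error)."""
    backlog = served = dropped = max_backlog = 0
    for arrivals in arrivals_per_second:
        backlog += arrivals
        done = min(backlog, capacity_per_second)
        served += done
        backlog -= done
        if backlog > queue_limit:
            dropped += backlog - queue_limit
            backlog = queue_limit
        max_backlog = max(max_backlog, backlog)
    return LoadResult(served, dropped, max_backlog)

# Hypothetical system: serves 100 requests/s, queues at most 500.
steady = simulate_load([50] * 60, capacity_per_second=100, queue_limit=500)
surge = simulate_load([50] * 10 + [1000] * 10 + [50] * 10,
                      capacity_per_second=100, queue_limit=500)
print(steady)  # LoadResult(served=3000, dropped=0, max_backlog=0)
print(surge)   # LoadResult(served=2500, dropped=8500, max_backlog=500)
```

Running a model like this (or a proper load test) before the announcement shows the gap between expected demand and capacity while there is still time to close it.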

Early testing would isolate any load-balancing or coding issues and would identify any capacity constraints along the path the traffic was routed. For example, if you are trying to push 100Gbps of network traffic down a shared 10Gbps dark fibre link between data centres and your business, then you are going to experience significant packet loss, and your web app is going to crash.
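The arithmetic behind that example is worth spelling out. Treating both figures as link rates in gigabits per second, the hypothetical helper below (assuming steady, sustained traffic and ignoring protocol overhead) computes the share of traffic that simply cannot fit down the pipe.

```python
def unserved_fraction(offered_gbps: float, link_gbps: float) -> float:
    """Share of offered traffic a link cannot carry under sustained load.

    Buffers absorb short bursts, but when demand stays above capacity the
    excess eventually has to be dropped (or the senders throttled).
    """
    if offered_gbps < 0 or link_gbps <= 0:
        raise ValueError("offered traffic must be non-negative, capacity positive")
    if offered_gbps <= link_gbps:
        return 0.0
    return 1 - link_gbps / offered_gbps

print(unserved_fraction(100, 10))  # 0.9: nine-tenths of the traffic can't fit
print(unserved_fraction(8, 10))    # 0.0: within capacity
```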

The government needs to take IT seriously in practice

In my opinion, nobody in the government has given IT the priority it deserves, and I’m not just talking about myGov; I mean across all areas of the government network. A lack of foresight and poor execution, all mixed with panic and uncertainty: the perfect recipe for an IT disaster.

Sometimes you have the wrong people in senior jobs who don’t know what they’re doing, or who are out of touch with the latest developments in the industry. IT is one of the fastest-moving industries in the world; people who are actively on the ground find it hard to keep up, let alone politicians who have numerous responsibilities to manage.

Combine this with government red tape, an outdated understanding of industry best practices and a lack of resources to deliver, and you end up with what we have seen in the past week.

We are not seeing the failings of a single person. The bottom line is that there is clearly a major flaw in the government’s IT strategy, a lack of staff with the skill sets needed to drive change, and (like a lot of businesses, not just government) a lot of technical debt.

That debt is the result of multiple changes in leadership and multiple contractors and developers all building on top of an already flawed system, with nobody knowing how it all fits together.

It’s been a tough week for Stuart Robert. On Monday, he was dealing with the fallout from media reports that the myGov website had crashed under the pressure of Australians jumping online to apply for welfare payments as part of the government’s response to Covid-19.

Robert initially blamed a DDoS attack, which he said occurred at the same time as more than 55,000 citizens were trying to access the site to register for payments. But a few hours later, it was revealed that a cyber attack was not to blame; the government had simply failed to anticipate the rise in traffic.

One thing is clear. Once the dust has settled, it’s time for a change in how we approach technology in the government, and it’s time we get the right people in charge of it.

For now, we’ll have to keep hitting refresh. You may not be able to count on the government, but you can always rely on the trusty ol’ F5 function key.

Mick Smith is a Sydney-based senior technology executive.