by Stephen Parry

Firefighting: How do we know when IT and cloud organisations have gone over the tipping point?

Mar 16, 2018

Planned utilisation of resources is the best way to keep firefighting and performance in balance, especially when trends point to increasing complexity and volume.

The rate of technology change and voracious competition for market share are producing cloud organisations that are seriously overburdened, and it’s going to get worse if Kevin Casey’s “5 cloud computing trends for 2018” materialise.

Trends such as multiple hybrid cloud models require multivendor management, in addition to the integration of the technologies. This all comes at a time when the industry is still driving cloud adoption to gain scale. The drive for growth often occurs at the expense of cloud optimisation, which cannot be postponed. 

Given all these challenges, it is all too common for technology organisations to be managed predominantly in a ‘firefighting’ mode, continuously allocating scarce resources to unanticipated, unplanned and urgent work.

While it is inevitable that all organisations will devote a proportion of working time to identifying and solving unforeseen problems, the sheer level of urgency, complexity and pressure leads even the best businesses to put off important but less interesting work for another day.

The assumption is ‘we can catch up later’ – only ‘later’ the pressure has increased and resulted in sacrificing even more housekeeping and maintenance work. And so the self-reinforcing loop continues. Like the image of the snake eating its tail, things go around in circles and continue in a downward spiral towards the tipping point.

So, how do organisations become hooked into this scenario? And how can they prevent themselves from reaching the dangerous tipping point after which firefighting becomes widespread? And is it actually possible to recover?

Determining the tipping point

The tipping point can be defined as the balance between workload and resources, beyond which the organisation cannot accomplish the tasks necessary to execute current projects.

Potential signs that you have reached your tipping point are:

  • increasing error rates
  • increasing work backlogs
  • chronic delivery delays
  • increasing recourse to task forces to recover situations.

For addressing and solving unplanned issues that inevitably arise in the course of work processes, firefighting is a legitimate tactic. However, when unplanned events end up exceeding or consuming all of the available planning time, firefighting becomes more than just a tactic. Before long, it becomes the standard method of getting things done.

A study by Nelson P. Repenning at MIT, ‘Past the Tipping Point’, suggests that a temporary increase in workload can start a vicious cycle of firefighting and result in a permanent decline in performance.

In such a scenario, management ends up actively encouraging firefighting, because the focus is entirely on short-term performance, on what is happening today. Supervisors bypass established procedures in order to ‘get it done’, and little thought is given to the work needed today that would prevent problems in the medium and long term. We lose sight of what will happen tomorrow, next week, next month, next year.

There is a real human cost to firefighting. People who spend most of their 9 to 5 working day putting out as many fires as they can soon develop a sense of futility. Morale, commitment and absenteeism all start going in the wrong direction. When you see these signs, you can be sure you have passed the tipping point.

Pulling back from the tipping point

When the tipping point has been reached, the first rule is this: Don’t do anything to make things worse. Bizarrely, the solutions that are often proffered with good intentions actually do make matters worse.

Consider the case of the business which experiences an unfortunate series of major outages, only to discover that the majority of these have occurred because of mishandled routine changes to the infrastructure. In these situations, it’s not uncommon to put a ‘freeze’ on any further changes. Control over the change process is centralised with a lot more administration to further clog up operations.

Administration now consumes more of the resources when there were so few to begin with. This exacerbates the problem. Even the seemingly obvious call for automation, while logical in the short term, again diverts limited resources. This cycle keeps repeating itself and can appear in many forms in any phase of the IT lifecycle, from new product development through to run.

Repenning’s paper explains many more unproductive side effects of this behaviour. The key message is that it is easy to slip into firefighting mode, and if the conditions are not changed and remain unfavourable, you will inevitably become trapped there. Then, very quickly the behaviour can become institutionalised, and teams find it difficult to return to their earlier working practices even when the overburden or crisis has passed.

Avoiding the tipping point

How well organisations avoid the tipping point depends on how well they manage the balance between planning and firefighting. It is a difficult art – essentially a trade-off between steady-state performance and the ability of the organisational system to handle unplanned changes in resource requirements, while still avoiding full firefighting mode.

Most organisations operate at full staff utilisation, leaving little spare capacity for dealing with unexpected demand and events. In such circumstances, it becomes natural to go into firefighting mode. It is possible to work just short of the tipping point for some time, knowing that once the situation is recovered things can return to normal. The risk of reaching the tipping point, however, remains dangerously high.

So, what is the solution? The only truly plausible course of action is to reduce the utilisation levels of engineers and technicians from around 100% to approximately 70%. While, for most organisations, this is completely within their gift, it forces executives to take a good hard look at whether the work they schedule is within the organisation’s capability to deliver.
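
To make the arithmetic concrete, here is a minimal Python sketch of what planning to roughly 70% utilisation means in hours. The function name, the percentage parameter and the team size in the usage line are illustrative assumptions, not figures from any particular organisation.

```python
from typing import Tuple


def planned_capacity(capacity_hours: float,
                     target_utilisation_pct: int = 70) -> Tuple[float, float]:
    """Split available hours into scheduled work and a reserve for the unexpected.

    target_utilisation_pct defaults to the article's approximate 70% figure;
    the function itself is a hypothetical helper for illustration only.
    """
    if capacity_hours < 0:
        raise ValueError("capacity_hours must not be negative")
    if not 0 <= target_utilisation_pct <= 100:
        raise ValueError("target_utilisation_pct must be between 0 and 100")
    scheduled = capacity_hours * target_utilisation_pct / 100
    reserve = capacity_hours - scheduled
    return scheduled, reserve


# Assumed example: 10 engineers x 40 hours = 400 hours of weekly capacity.
scheduled, reserve = planned_capacity(400)
print(f"Schedule {scheduled} hours, hold {reserve} hours in reserve")
```

At 100% planned utilisation the reserve is zero, so any unplanned event has to displace scheduled work.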

Anything else places the business closer to the tipping point. Individual departments must assess what percentage of their total workload is unplanned emergency work. If the firefighting ratio is higher than the planned ratio, and utilisation levels are high, it is only a matter of time before the tipping point is reached.
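
A department could run this assessment with something as simple as the Python sketch below. The class, field names and the 90% threshold for ‘high’ utilisation are assumptions made for illustration; the article does not put a number on what counts as high.

```python
from dataclasses import dataclass


@dataclass
class DepartmentWorkload:
    """Hours logged in one period (all field names are hypothetical)."""
    planned_hours: float
    unplanned_hours: float   # emergency, firefighting work
    capacity_hours: float    # total hours the department has available

    @property
    def total_hours(self) -> float:
        return self.planned_hours + self.unplanned_hours

    def firefighting_ratio(self) -> float:
        """Share of all work done that was unplanned."""
        return self.unplanned_hours / self.total_hours if self.total_hours else 0.0

    def planned_ratio(self) -> float:
        """Share of all work done that was planned."""
        return self.planned_hours / self.total_hours if self.total_hours else 0.0

    def utilisation(self) -> float:
        """Share of available capacity actually consumed."""
        if self.capacity_hours <= 0:
            raise ValueError("capacity_hours must be positive")
        return self.total_hours / self.capacity_hours


def nearing_tipping_point(w: DepartmentWorkload,
                          high_utilisation: float = 0.9) -> bool:
    """True when firefighting outweighs planned work AND utilisation is high.

    The 0.9 default is an assumed threshold, not a figure from the article.
    """
    return (w.firefighting_ratio() > w.planned_ratio()
            and w.utilisation() >= high_utilisation)


overloaded = DepartmentWorkload(planned_hours=150, unplanned_hours=230, capacity_hours=400)
healthy = DepartmentWorkload(planned_hours=250, unplanned_hours=30, capacity_hours=400)
print(nearing_tipping_point(overloaded))  # True
print(nearing_tipping_point(healthy))     # False
```

Both conditions are checked together because, as the paragraph above argues, a high firefighting ratio is most dangerous when there is no spare capacity left to absorb it.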

Planned utilisation of resources is the best way to keep firefighting and performance in balance, especially when trends point to increasing complexity and volume.