Between 60 and 80 percent of IT incidents and service disruptions are caused by change. This makes it paramount for chief information officers and other senior technology executives across private sector and government organisations to mitigate the high risk of change issues, especially as they deliver more digital services to their customer bases.
Indeed, many enterprises now realise that having a ‘central source of truth’ can help them manage issues around change by making it clear what impact changes will have on the on-premises or cloud infrastructure that powers digital services.
Senior technology executives gathered in Perth and Sydney in Australia and Auckland, New Zealand recently to discuss the key challenges they are facing when it comes to understanding their technology assets and the impact that disruptions have on the delivery of digital services. They discussed the roles that artificial intelligence (AI) and machine learning (ML) technology play in augmenting the performance of IT operations and services teams.
The events were hosted by CIO Australia and CIO New Zealand and supported by ServiceNow.
Craig Horton, head of information technology at The Royal Australian & New Zealand College of Radiologists (RANZCR), says the organisation has internal and managed service providers, which means compliance across all assets is challenging.
“The challenge we face is moving to a proactive method of managing these assets. This will help us provide valuable insights into areas that need attention, which will decrease the risk of disruption to key assets, but we still have some way to go before achieving this,” he says.
Horton also points out that data collected across the organisation’s technology stack is extensive.
“Having skilled engineers scanning datasets is lost productivity. Augmented data and proactive reporting allow engineers to be elevated in their roles and focus on areas of importance,” he says.
In organisations today, the high volume of user requests and machine-generated incidents poses a problem as most teams struggle with the scale to solve them all, says Michael Porfirio, senior director, IT transformation solution consulting, Asia-Pacific and Japan at ServiceNow.
Porfirio says issues arise because many organisations lack visibility into how their infrastructure maps to their digital business services.
“If you then also add the complexity and scale of current digital services that can be spread across public and private cloud and SaaS infrastructure, for instance, it makes it difficult for organisations to understand the context of an incident and its impact,” Porfirio says.
“Many organisations are still using either legacy systems or point solutions to solve the problem. By doing so, they can never achieve the service awareness needed to enrich events, to provide context so that organisations have a deeper insight into the impact of these events and requests on their businesses.”
Frederick Lusk, chief information officer at Justice Health & Forensic Mental Health Network, admits that ICT services at his organisation have grown organically without much planning and even less documentation. Understanding the impact of changes, even in basic terms, is challenging because the organisation relies heavily on information that staff members hold in their heads, he says.
“At best, we have an idea of the impact, but the devil is always in the detail. At this point we haven’t adopted artificial intelligence or machine learning capabilities; we’re too busy putting out fires to step back and do things properly. It’s a bit of a Catch-22 situation,” he says.
Jimmy Lee, chief information officer at Cedar Woods Properties, says as a large cloud user, the organisation does experience unplanned outages. But unplanned outages that affect network assets, such as routers, switches and the wide area network, tend not to impact customer service.
“This is one of the great advantages of using enterprise-grade cloud,” he says.
Cedar Woods has adopted a multi-source vendor strategy so most of its infrastructure is provided using managed services with some of those vendors using AI and ML to improve service performance.
“The large players like Microsoft make this transparent for us, which is a good thing in my books,” he says.
The considerable advancement in predictive analytics and ML has seen audit, tax and consulting firm, RSM Australia, elevate its efforts in examining the use of AI to ensure its IT can be more proactive, says chief information officer, Paul Joseph.
“Our client service delivery relies on clients and staff accessing required information and systems. At first, we started collecting and analysing system logs across our IT endpoints to identify when an employee experienced an issue,” Joseph says.
“This changed the perception of our staff to realise the value the operational teams brought the business. Our team will be building further on this with the use of AI, especially for predictive analytics so that, based on patterns, we can be alerted to potential issues before they have occurred.
“We have embedded extensive automation into our IT service desk, and we see the use of AI as a natural extension to ensure our team is being utilised efficiently.”
Meanwhile, Simon Casey, chief information officer at real estate company Barfoot & Thompson, says organisations with sprawling IT environments should consider using AI service operations tools.
“They can self-learn the environment and assist in rapid recovery from complex system failure. The best approach would be to consider existing world-class service operations as they have AI capability built in,” he says.
Deciding when to automate
Roundtable attendees were asked at the events how they decide when it’s safe to introduce automation across their customer service teams. Importantly, how do they strike a balance between automation and human interaction?
Automation across key services, such as restarting services when alerts are triggered, is not new to IT departments. Human interaction, often by skilled engineers, becomes important to determine if there is a larger issue at hand, says RANZCR’s Horton.
Those same engineers must also analyse the patterns in how automated services are performing, he adds.
“Again, this emphasises the importance of engineers to focus on areas of need or risk,” says Horton.
Client service and relationships will always remain at the forefront when deciding what to automate, says RSM’s Joseph.
Automation is only introduced after it has been thoroughly tested and piloted with stakeholder groups. When determining which automation tasks to implement, the team asks if the automation task is designed to make work easier for IT staff or to improve the customer relationship, he says.
“We place a heavier focus on the latter as this will always ensure the success of automation and striking the balance with human interaction,” he says.
Justice Health & Forensic Mental Health Network’s Lusk says that using AI will give the organisation an indication of where people need to look for automation opportunities in the first instance.
“As we evolve our capabilities and understandings, ML will enable us to apply proactive actions to maintain optimal service delivery,” he says.
Determining success or failure
When measuring the success or failure of automation to reduce the risk that change issues will impact operations and the delivery of digital services, Justice Health & Forensic Mental Health Network looks for a reduction in the number of occurrences of previously known issues. If the same event recurs multiple times whenever a change is introduced, this is a clear indication of a problem, says Lusk.
“My staff are very good at recovering services when known issues occur. We can become better at preventing those issues from occurring in the first place,” he says.
RSM’s Joseph says his organisation uses the usual metrics of ‘time saved and efficiencies introduced’ when determining if automation has worked or not.
“But it’s ultimately the client experience that determines the success or failure of our digital initiatives. Thankfully, we have seen the successful interaction and adoption of our automated technologies over the last few years and particularly the digital transformation program, which was accelerated as a result of the impact of COVID.”
Key steps to take now
Most IT operations teams have made significant investments in supporting the digital initiatives of their organisations. They are ensuring that their businesses can spin up new services, and they have the applications to support new processes, as well as analytics capabilities to make better decisions, says Tim Sheedy, principal analyst at Ecosystm.
But he believes that there are still significant gaps between the provision of applications and infrastructure and understanding the customer and business processes they support.
“Knowing there is an outage or an incident that is impacting performance of a system is one thing; understanding the impact of that incident on customer and employee processes is another. This knowledge allows the business and IT operations team to prioritise the fix,” he says.
Sheedy says there is now a real opportunity for organisations to point machine learning at their technology investments to create a better understanding of “how asset A impacts process Z.”
“For example, if a firewall that is protecting a customer payment platform fails, or sees a degradation in its service, this will have a significant business impact as payments might be slow or fail,” Sheedy says.
“But if another firewall that protects the quarter-end reporting system fails mid-quarter, then this is less of an issue for the business at that time. Today, most IT service and ops teams just see two firewalls and treat them with the same level of service. AI can be used to automate the link between technology assets and business impact.”
“AI and ML technologies can also be used to automate fixes and minimise outages, even predicting them before they happen. But without understanding the business impact of an outage, it’s hard to know where to focus your AI Ops lens. This is why AI Ops should be considered hand-in-hand with service-aware IT operations,” he says.
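Sheedy’s two-firewall example can be made concrete with a small sketch. The Python below is illustrative only and is not drawn from any roundtable participant’s tooling: the asset names, the impact scores and the `ASSET_TO_SERVICE` mapping are all hypothetical, and the link between asset and service is written by hand here, whereas the article describes using AI to learn that link automatically.

```python
from dataclasses import dataclass
from datetime import date

# Everything below is a hypothetical illustration: asset IDs, service names
# and impact scores are assumptions, not taken from any real environment.


@dataclass(frozen=True)
class Service:
    name: str
    base_impact: int                        # 1 (low) to 5 (critical)
    peak_months: frozenset = frozenset()    # months when the service is critical


# Hand-written "asset A impacts process Z" mapping (assumed for this sketch).
ASSET_TO_SERVICE = {
    "fw-payments": Service("customer payments", base_impact=5),
    "fw-reporting": Service(
        "quarter-end reporting",
        base_impact=1,
        peak_months=frozenset({3, 6, 9, 12}),
    ),
}


def business_impact(asset_id: str, on: date) -> int:
    """Score an asset failure by the service it supports and the date."""
    service = ASSET_TO_SERVICE[asset_id]
    if on.month in service.peak_months:
        return 5
    return service.base_impact


def prioritise(failed_assets: list, on: date) -> list:
    """Order failed assets so the highest business impact is fixed first."""
    return sorted(
        failed_assets,
        key=lambda asset_id: business_impact(asset_id, on),
        reverse=True,
    )


if __name__ == "__main__":
    mid_quarter = date(2024, 2, 15)
    print(prioritise(["fw-reporting", "fw-payments"], mid_quarter))
```

Run mid-quarter, the payments firewall is ranked ahead of the reporting firewall, even though both are simply firewalls from an infrastructure point of view. In the final month of a quarter, the reporting firewall scores just as highly.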