by Isaac Sacolick

3 ways devops mitigates the operational IT skill gaps

Jun 24, 2019

Advancements in devops can help close the skill gap in IT. The big question for CIOs is whether or not they're ready to learn from past mistakes.

Talk to CIOs and IT leaders about their strategic and innovation programs, and they are sure to mention skill gaps as a primary hurdle. One study projects a 900,000-person skill gap in artificial intelligence, IoT, and blockchain by 2022. Those are emerging technologies where cutting-edge businesses compete for top talent, but other studies suggest that by 2020, 75% of organizations will experience visible business disruptions due to infrastructure and operations skills gaps.

It’s the latter statistic that should frighten most CIOs, especially those investing in cloud-native applications as part of their digital transformation programs.

If maintaining cloud-native applications, microservice architectures, real-time data integrations, and large-scale analytical databases proves as expensive and technically complex as maintaining legacy systems, CIOs will have a very difficult time delivering high performance and meeting service levels.

There is a way out, but it requires CIOs to automate more while applications are being developed or when they are migrated to cloud environments. Here are some options.

1. Invest in a centralized monitoring platform

Historically, IT departments have been reasonably good at monitoring compute, storage, and network infrastructure but have struggled to adequately monitor databases, services, and applications. IT operations teams are also hard-pressed to retain subject matter experts with enough application-level knowledge to recover from application issues and perform root cause analysis.

Today, that challenge is even greater, as APIs and microservices add more endpoints to monitor, and transactions can span multiple services, databases, cloud zones, and data centers. It’s unrealistic to assume that IT can hire and retain all the skills needed to manage these applications the same way legacy applications were managed.

This is where platforms like BigPanda, which provide centralized monitoring, autonomous operations, and embedded artificial intelligence capabilities, can yield significant short- and long-term benefits. Centralized monitoring aggregates log data and events from multiple systems into a central data store, and machine learning capabilities correlate related alerts from those systems into a single, manageable incident.
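To make the correlation idea concrete, here is a minimal Python sketch built on one simple assumption: alerts for the same service that arrive within a fixed time window (a hypothetical five-minute `window_seconds`) belong to the same incident. It is not how BigPanda or any specific AIOps product correlates alerts; commercial platforms weigh many more signals and apply machine learning. The `Alert` and `Incident` types and the sample alerts are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    source: str       # monitoring tool that raised the alert
    service: str      # affected service or component
    timestamp: float  # seconds since an arbitrary reference point
    message: str


@dataclass
class Incident:
    service: str
    alerts: list = field(default_factory=list)

    @property
    def last_seen(self) -> float:
        # Alerts are appended in time order, so the last one is the newest.
        return self.alerts[-1].timestamp


def correlate(alerts, window_seconds=300.0):
    """Group alerts for the same service into one incident when each alert
    arrives within window_seconds of the previous alert for that service."""
    incidents = []
    open_by_service = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_by_service.get(alert.service)
        if current is not None and alert.timestamp - current.last_seen <= window_seconds:
            current.alerts.append(alert)
        else:
            current = Incident(service=alert.service, alerts=[alert])
            incidents.append(current)
            open_by_service[alert.service] = current
    return incidents


# Hypothetical alerts from three different monitoring tools.
alerts = [
    Alert("apm", "checkout-api", 0, "p99 latency above threshold"),
    Alert("logs", "checkout-api", 60, "HTTP 500 rate rising"),
    Alert("infra", "checkout-api", 90, "container restarted"),
    Alert("infra", "search", 100, "disk 90% full"),
]

for incident in correlate(alerts):
    print(incident.service, len(incident.alerts), "alerts")
```

Here four raw alerts collapse into two incidents, one per affected service. A real platform would also link incidents across services that share dependencies.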

Autonomous operations (or AIOps) can then route incidents into multiple ticketing systems based on the type of issue. For example, an issue with a microservice can be routed to the appropriate devops team in Jira and then to the required business teams through Slack.
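Routing can be as simple as a lookup from issue type to destination. The sketch below assumes a hypothetical routing table: the queue names, the `service-desk` system, and the chat channels are placeholders, and the code only decides where an incident should go rather than calling the Jira or Slack APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    ticket_system: str   # where the incident ticket is created
    queue: str           # team project or queue in that system
    notify_channel: str  # chat channel for business stakeholders


# Hypothetical routing table: issue type -> destination.
ROUTES = {
    "microservice": Route("jira", "DEVOPS-CHECKOUT", "#checkout-business"),
    "database": Route("service-desk", "DBA", "#data-platform"),
    "network": Route("service-desk", "NETOPS", "#it-ops"),
}
# Anything unrecognized goes to a general triage queue.
DEFAULT_ROUTE = Route("service-desk", "IT-OPS-TRIAGE", "#it-ops")


def route_incident(issue_type: str) -> Route:
    """Return the destination for an incident of the given type."""
    return ROUTES.get(issue_type, DEFAULT_ROUTE)


def dispatch(incident_id: str, issue_type: str) -> list:
    """Describe the actions to take; a real integration would call the
    ticketing and chat tools' APIs here instead of returning strings."""
    route = route_incident(issue_type)
    return [
        f"create ticket {incident_id} in {route.ticket_system}/{route.queue}",
        f"notify {route.notify_channel} about {incident_id}",
    ]


for action in dispatch("INC-1042", "microservice"):
    print(action)
```

Keeping the rules in a table rather than in people’s heads means the routing logic can be reviewed and changed without a subject matter expert.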

The result is that issues are resolved faster, with fewer highly skilled IT staff needed for recovery and root cause analysis.

CIOs can baseline the magnitude of their IT operational issues by looking at several metrics and cost factors. Start with the mean time to recovery (MTTR) for incidents, then count the number of alerts tied to those incidents. MTTR is a key performance indicator (KPI) that CIOs should look to improve, especially as applications become more strategic and operationally significant.

Also consider that the number of alerts generated per incident is a complexity factor that IT operations staff must contend with when diagnosing issues. The more alerts that different monitoring tools generate, the more noise staff must sift through to find the root cause.

Finally, consider the number of people involved in war rooms to diagnose complex issues and the overall number of people called in to resolve general incidents. Also review the headcount, total cost, and overall tenure of the IT operations staff supporting applications.
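These baseline metrics are straightforward to compute once incident records are exported from the monitoring or ticketing tools. The sketch assumes a hypothetical `IncidentRecord` shape with open and resolve times in hours; the three sample incidents are invented for illustration, not benchmark data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IncidentRecord:
    opened_at: float   # hours, relative to any fixed reference point
    resolved_at: float
    alert_count: int   # alerts correlated to this incident
    responders: int    # people pulled in to resolve it
    war_room: bool     # whether a war room was convened


def baseline(records):
    """Summarize operational load from a list of resolved incidents."""
    return {
        "incidents": len(records),
        "mttr_hours": mean(r.resolved_at - r.opened_at for r in records),
        "alerts_per_incident": mean(r.alert_count for r in records),
        "responders_per_incident": mean(r.responders for r in records),
        "war_rooms": sum(1 for r in records if r.war_room),
    }


# Invented sample data for illustration only.
history = [
    IncidentRecord(0.0, 2.0, alert_count=40, responders=6, war_room=True),
    IncidentRecord(10.0, 11.0, alert_count=12, responders=2, war_room=False),
    IncidentRecord(20.0, 23.0, alert_count=26, responders=4, war_room=False),
]

print(baseline(history))
```

For these invented records, MTTR works out to 2 hours, with an average of 26 alerts and 4 responders per incident.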

A large number of applications to support, high MTTRs, large numbers of incidents, high ratios of alerts per incident, frequent war rooms to recover from complex incidents, and high people costs are all indicators that centralized monitoring should be a devops priority.

2. Automate testing, integration, and deployment

Managing applications in production addresses only part of the cost, complexity, and skills required to maintain modernized applications. More organizations are investing in continuous integration and continuous deployment (CI/CD) platforms and continuous testing capabilities to enable more frequent, less error-prone release cycles.

When testing, integration, and deployment are automated with CI/CD platforms like Jenkins and automated testing tools like Selenium, much of the knowledge and subject matter expertise wrapped up in release management practices becomes codified. The IT organization still needs developers to make the necessary code changes, but regression tests can flag unexpected coding issues, and CI/CD ensures that integration and deployment happen without manual steps and procedures.
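The core idea of a pipeline, ordered stages where a failure stops everything downstream, can be sketched in a few lines of Python. This is a tool-agnostic illustration, not a Jenkins configuration (Jenkins pipelines are typically defined in a `Jenkinsfile`); the stage names and placeholder commands are hypothetical stand-ins for a real build, a regression suite such as Selenium tests, and a deploy script. The `description` field hints at how a pipeline definition can double as documentation.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stage:
    name: str
    command: List[str]  # argv passed to subprocess.run
    description: str    # what the stage does, for people new to the pipeline


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order and stop at the first failure, so a build that
    fails its regression tests is never deployed.

    Returns the names of completed stages and the name of the failed
    stage (None if every stage succeeded)."""
    completed = []
    for stage in stages:
        print(f"==> {stage.name}: {stage.description}", flush=True)
        result = subprocess.run(stage.command)
        if result.returncode != 0:
            return completed, stage.name
        completed.append(stage.name)
    return completed, None


# Placeholder commands that just print; a real pipeline would invoke the
# build tool, the regression test suite, and a deployment script.
PIPELINE = [
    Stage("build", [sys.executable, "-c", "print('packaging app')"],
          "Compile and package the application"),
    Stage("regression-tests", [sys.executable, "-c", "print('tests passed')"],
          "Run automated regression tests"),
    Stage("deploy", [sys.executable, "-c", "print('deploying to staging')"],
          "Deploy the tested build to staging"),
]

if __name__ == "__main__":
    done, failed = run_pipeline(PIPELINE)
    print("completed:", done, "failed:", failed)
```

Because deployment only runs after every earlier stage succeeds, no one has to remember to check test results by hand before releasing.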

What’s more, the platforms and automations provide a level of documentation that helps IT support and extend the automation. New IT professionals don’t need to learn application support functions on their own or from subject matter experts; they can review the scripts and devops tool configurations.

3. Attract diverse IT talent by leading a devops culture change

Centralized monitoring, automated testing, and continuous integration and deployment are three devops best practices. Procuring the tools and implementing the automation are the operational steps for addressing the skills, cost, and complexity of maintaining applications. The other, more important consideration is how adopting these practices can shift IT toward a devops culture.

Culture change in IT has many facets, but one relevant to maintaining production applications is moving beyond the IT heroics otherwise required to keep an application running. Every organization has the one person with enough knowledge to recover, research, and remediate issues with a critical application. That person either hoards knowledge or doesn’t know how to bring more junior staff up to speed on the manual steps needed to recover from issues or deploy complex application changes.

CIOs have a responsibility to the organization to ensure that applications and cloud infrastructure developed today are easier for a diverse team of people to manage, with skills that are easier to find and processes that are less complex to support.

That has implications for how IT investments are operationalized today. Beyond the technology investments in centralized monitoring, automated testing, and CI/CD, CIOs should lead their teams toward simplified architectures, operational standards, and more automated operational procedures. In addition, driving a data-driven culture by leveraging embedded analytics and machine learning in operational platforms ensures that teams with less knowledge of how applications were built can respond to incidents and help with application changes.
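Automating operational procedures often starts with turning one expert’s manual recovery steps into a “runbook as code.” The sketch below assumes each step can be expressed as a health check plus a remediation; the simulated `state` dictionary and the step names are hypothetical, standing in for real health checks and corrective actions. Steps that can’t be fixed automatically are flagged for escalation rather than silently skipped.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunbookStep:
    description: str
    check: Callable[[], bool]      # returns True when this aspect is healthy
    remediate: Callable[[], None]  # corrective action to try if check fails


def run_runbook(steps: List[RunbookStep]) -> List[str]:
    """Check each step, remediate if needed, then re-check.

    Returns a log that on-call staff can read or attach to the ticket."""
    log = []
    for step in steps:
        if step.check():
            log.append(f"OK: {step.description}")
            continue
        step.remediate()
        status = "FIXED" if step.check() else "ESCALATE"
        log.append(f"{status}: {step.description}")
    return log


# Simulated system state standing in for real health checks (hypothetical).
state = {"queue_consumers": 0, "cache_warm": True}


def consumers_running() -> bool:
    return state["queue_consumers"] > 0


def restart_consumers() -> None:
    state["queue_consumers"] = 3


steps = [
    RunbookStep("Message queue consumers are running", consumers_running, restart_consumers),
    RunbookStep("Cache is warm", lambda: state["cache_warm"], lambda: None),
]

for line in run_runbook(steps):
    print(line)
```

The procedure now lives in reviewable code instead of one person’s memory, so more junior staff can run it and read the log.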

This can be a big shift for IT, especially for enterprises supporting a large number of applications. Beyond closing the skill gap, instituting automation, architecture standards, and centralized monitoring reduces IT costs and improves system reliability.

The main question for CIOs is whether they are ready to learn from past mistakes.