Top reasons for network downtime

Network outages linked to human error, incompatible changes, greater complexity


New research paints a somewhat bleak picture of network performance. Outages are frequent. Hours typically pass before an issue is reported and resolved. Protective measures are manual and error-prone.

The source of the data is a survey of 315 network pros at midsize and large enterprises. The survey was sponsored by Veriflow, a San Jose, Calif.-based startup that aims to minimize the risk of network vulnerabilities and outages. Veriflow’s software is designed to catch network problems before they happen by predicting possible network-wide behavior and continually verifying that a network model adheres to an enterprise’s security and resilience policies.

The survey results are interesting, with the caveat that the survey's sponsor sells software to combat network outages. Here are some of the key findings.

The human element
Nearly all respondents (97%) agree that human error is a cause of network outages. How much of a factor it is, however, varies. Roughly half (52%) said human error leads to few network outages. Other respondents find it's a contributor to frequent network outages (25%), most network outages (18%), and even all network outages (2%). Just 3% said they catch and correct all mistakes before they cause an outage.

Incompatible changes
Network changes that are not properly evaluated are another common cause of incidents. The impact on the business varies. At the high-impact end, 5% of respondents said that network changes lead to a network outage or performance issue on a daily basis, and 7% said it happens several times a week. At the low-impact end, 2% said it never happens and 7% said it's a "once every couple years" event. The most common answer, cited by 44% of respondents, is that network changes lead to outages or performance issues "several times a year."

Chart: How often network changes lead to outages or performance issues (source: Veriflow)

Manual dependence
How do IT teams verify that the network is functioning properly after making a network change? The approach is often manual, Veriflow finds. Among respondents, 69% said they rely on manual processes, such as inspecting devices via the command-line interface (CLI), inspecting configurations, and performing manual traceroutes or pings (see chart below for more details).

Chart: How IT teams verify the network after a change (source: Veriflow)

Predictive monitoring: room for improvement
There’s a lot of room for improvement when it comes to network monitoring tools’ predictive capabilities. Just 6% of respondents said that between 90% and 100% of their network performance issues and outages are predicted by their network monitoring tools. Another 15% said their tools predict 70% to 90% of network performance issues and outages, and 13% said tools predict 50% to 75% of those issues. The rest of the respondents said that their monitoring tools predict less than half of all network issues: 21% of respondents said 25% to 50% of issues get predicted; 25% of respondents said 1% to 24% of issues get predicted; 15% of respondents said their tools don’t predict any issues; and 5% of respondents don’t have a network monitoring solution.

Resolution time
When asked how long it takes to find and resolve a network issue after it’s reported, some IT pros reported speedy results: 21% of respondents said it takes, on average, less than an hour to resolve networking issues. Everyone else said it takes longer (see chart below for more details).

Chart: Time to find and resolve a network issue after it is reported (source: Veriflow)

Compliance conundrum
Roughly 76% of survey respondents said their organization has network compliance requirements in place to ensure privacy and security of data and systems. But many respondents are doubtful that their network is always compliant: 56% called themselves moderately confident; 19% said only slightly confident; and 6% said not confident at all. Just 20% said they’re highly confident that their network is always compliant.

Network segmentation
Another topic that divided respondents is network segmentation. When respondents were asked if they believe that network security and segmentation are properly implemented throughout their company’s network, 59% said yes and 41% said no.

The full survey report is available from Veriflow.

“Our goal with this survey was to capture how network professionals balanced increased network complexity and required changes with network uptime, availability, security and compliance requirements,” said James Brear, president and CEO of Veriflow, in a statement. “It’s clear that many organizations settle for suboptimal network management solutions, thus costing them hours to report and resolve network issues. This problem highlights the importance of having a solution that predicts the impacts of network changes, identifies outages and vulnerabilities, and accelerates resolution time of network issues.”

This story, "Top reasons for network downtime," was originally published by Network World.
