On July 8, 2022, a botched maintenance update on the Rogers ISP network in Canada crashed internet access across the country for at least 12 hours, with some customers experiencing problems for days afterward.

The impact was profound. The nationwide outage affected phone and internet service for about 12.2 million customers (about 25% of Canada’s internet capacity), halting point-of-sale debit payments on the Interac network, preventing Rogers mobile phone users from accessing 9-1-1 services, disrupting transit services dependent on online payment, and even wreaking havoc on traffic signals in Toronto that depend on cellular GSM for timing changes.

Adding insult to injury, the outage even forced Canadian musician The Weeknd to postpone the first stop on his world tour at Toronto’s Rogers Centre.

The cause? As was subsequently revealed in Rogers’ submission to the regulator, the Canadian Radio-television and Telecommunications Commission, the update “deleted a routing filter and allowed for all possible routes to the Internet to pass through the routers. ... Certain network routing equipment became flooded, exceeded their capacity levels, and was then unable to route traffic, causing the common core network to stop processing traffic.”

Although Rogers, one of Canada’s major internet, broadcasting, and mobile wireless companies, restored service to most customers within a day, the catastrophic loss of service startled Canadian businesses.
Some, like the approximately 100 outlets operated by farm and agriculture supply retailer Peavey Mart, had redundant access to other internet providers already in place.

As a result, “only two stores were directly impacted where they had no internet connectivity,” says Shaun Guthrie, the company’s Senior VP of Information Technology, who is also VP of the CIO Association of Canada.

“However, we rely on Interac services for our customers to transact, which relies solely on Rogers, so we lost the ability to do debit card payments.”

Others were not so fortunate. “Some of the non-profits that I serve lost the ability to record meeting the needs of vulnerable people for a day or two,” says Helen Knight, Virtual CIO and Strategic Technology Consultant for Canadian non-profits. “Personally, my children and I had no way to communicate. My 13-year-old daughter was out until 10 p.m. and I was worried she had no way to get home.”

Not just a domestic issue

“As a global company producing waterslides and water park attractions, the Rogers network outage did affect us more than we originally thought,” says Chris Palsenbarg, Manager of IT Operations and Help Desk Support with WhiteWater West Industries. “Staff travelling overseas couldn’t even use their phones.”

Sapper Labs Group is a Canadian cybersecurity/cyberintelligence firm. “Although our company was not affected by the Rogers outage, many of our partners, clients, and competitors were,” says Dave McMahon, Sapper Labs’ Chief Intelligence Officer. “Some organizations have yet to fully recover. This has had a ripple effect through the market.”

In the wake of the Rogers outage, Canadian CIOs and IT executives and experts are reviewing their readiness to cope with such failures in the future.
Their conclusions are worth noting by CIOs everywhere, all of whom are at risk of encountering similar service outages in their own countries, whether from system issues, intrusions, or power failure due to environmental or other causes.

Build redundancy

The Rogers outage underlined the value of having redundant ISP access, even though doing so costs more than relying on just one provider. Although some corporations balk at this extra expense, Peavey Mart accepts the value of paying for redundant internet access wherever possible. The company was rewarded for its farsightedness on July 8, 2022.

The failure of the Rogers ISP network didn’t blindside the company either, because “we proactively monitor the state of our data communications,” Guthrie says. “As a result, once the stores were impacted by the outage, they automatically failed over to their secondary ISPs through our SD-WAN enabled infrastructure.”

Non-profit organizations such as Canada’s Salvation Army can’t afford the kind of infrastructure used by Peavey Mart. But their CIOs are determined experts accustomed to “accomplishing amazing feats using free software and donated hardware,” says Knight. “They are accustomed to their aged IT infrastructure failing, so they usually have a manual process to fall back on,” she says.

As a result, Canadian non-profit CIOs can cope with ISP failures, at least at the time they actually occur. “The lost data from the outage will impact them later, when they don’t have correct records showing how many people they served to show their donors, potentially impacting future grants,” Knight says.

This being the case, Knight believes the Rogers outage could change non-profit attitudes to redundant ISP access for the better.
“After all, it has been common practice for years to have a redundant connection for all critical business components, so the silver lining is that now non-profits understand a new risk area they may not have considered,” she says.

“So if this is the incident that allows non-profits to recognize the need to have a senior technology leader at the decision-making table, aligning their strategic plans to their technical roadmap, then this might well be the cheapest and easiest way to learn that lesson. It is much better than facing a cyber breach!”

Check your suppliers’ backup plans

For Sapper Labs, “the Rogers outage reinforced our confidence in our own architecture and mode of operation,” McMahon says. But this sense of confidence reinforced the point that a company’s IT infrastructure doesn’t exist in isolation. Instead, it is one link in a chain of ISPs, cloud platforms, and others who connect to the enterprise via the internet.

Thus, “the takeaway from the Rogers outage is to ensure that one’s supply chain, partners and clients are equally prepared and that there are contingencies in place to assist them in maintaining business operations,” he says. “What was enlightening was that the outage immediately revealed who was a Rogers customer, whether they have alternate means of communications, their level of cybersecurity maturity, and critical interdependencies across the ecosystem.”

Peavey Mart is equally diligent about checking for vulnerabilities in its data supply chain. “We ask all our cloud providers: do they have redundancy?” says Guthrie. “Do their systems have failovers to backup systems built in, and do they have things like business continuity plans in place so that when a failure occurs, their people know what to do?
And we ask those questions up front.”

Unfortunately, retailers like Peavey Mart don’t have the clout to demand such answers from Canadian interbank megacorps like Interac. “As a result, we have no choice but to assume that Interac has such backup measures in place, which they clearly did not,” he says.

Expect more ISP failures

The resolution of the Rogers outage in Canada was followed by government investigations, negative media reports, and plenty of predictable public outrage. But none of these reactions will change a very simple fact: ISP networks are vast, complex systems made of many parts whose response to maintenance upgrades cannot be completely modeled in simulations.

As a result, even after all the improvements Rogers has promised to make and that other Canadian ISPs might copy out of a sense of prudence, “I have no doubt that we’ll probably see additional failures,” says Guthrie. “I don’t know who it will be, but I think we will likely see an additional failure within a year.”

This being the case, CIOs whose companies rely on ISP access need to take steps now to protect their enterprises against such outages. According to Dave McMahon, the path forward is clear: “Dual providers and redundant independent systems are best practices in industry,” he says.

“It is the very definition of a high-availability system. This is why all Sapper Labs employees already have multiple means of secure communications and abilities to collaborate online.
We are currently assessing how best we can extend similar secure high-assurance solutions to our clients and partners.”

At the same time, CIOs need to remain humble and not overestimate their ability to plan for such events beforehand.

“Technology is so ubiquitous and so complex, with every person and every organization experiencing new and complex technical challenges over the last couple years, that although it is possible to protect companies against Rogers-style outages, it isn’t possible or cost-effective to protect against all risk,” says Knight. “Instead, it is a matter of quantifying the impact and urgency of each risk and prioritizing organizational continuity plans for the most critical operational areas.”

The bottom line: A Rogers-style ISP outage is a crisis that can and likely will confront CIOs in companies around the world in the years to come. This is why boosting redundant systems and preparing contingency plans now is a must, to minimize and mitigate the inevitable impact of these communication failures on the enterprise.