For most of its existence, IT resilience has focused on uptime, making sure systems don’t go down, and if they do, bringing them back online as quickly as possible.

But that is only part of the equation in this modern digital era. Today IT resilience means much more.

Consider, for example, Brad Stone’s take on it. As CIO for Booz Allen Hamilton, Stone says he thinks of resilience in two dimensions: One is about enabling the business without interruption; the second is about having the ability to adjust, deal with change, and handle the unexpected.

Moreover, Stone says, resilience now means doing all that while continually delivering the experience users expect.

“Ten years ago, if there was an outage, they’d get past it. But users and business leaders today expect technology to always work and to be an amazing experience; the expectations are so much higher now because IT is such an enabler, it has taken on more importance,” he says. “Users might not demand perfection but their standards are really, really high.”

That in turn has prompted a more expansive approach to ensuring IT resilience today. Here experts and IT leaders offer seven best practices CIOs should take on to ensure they meet current expectations for resiliency.

1. Align to business needs

Ron Brown, director of business resilience for GuidePoint Security, an advisory and services firm, defines IT resilience as making sure technology is always available, even as he acknowledges that such perfection isn’t likely.

“You do have to plan for the fact that things will go out at some point,” he says.

CIOs can best prepare for that inevitability by being clear on what systems matter most to the business; that clarity lets IT know what to focus on first during any sort of outage, he says.

“The first thing you have to do without a doubt is be in alignment with the business, what they need and what they are willing to pay for [to get] what they expect,” Brown says, noting that a business impact analysis can help IT and business get this alignment. “And once you have that understanding of what the requirements are for the business, then it’s about how do you map out the services and capabilities you have and which apps are used by which groups so if something goes wrong you know where to put your priorities to get them back up.”

2. Break down siloes

Richard Caralli, a former CISO now working as a senior advisor for Axio Global, a cyberrisk management company, says he sees resilience as “an emergent property that extends from managing operational risk.”

To do that well, IT operations and cybersecurity should be working with leaders overseeing business continuity/disaster recovery planning.
That, however, doesn’t always happen, Caralli says.

“These activities tend to be siloed such that each discipline operates on different risk assumptions and scenarios, when in fact they must converge and work collaboratively,” he says.

For example, Caralli says an organization’s cybersecurity team may be focusing on creating a stellar defense-in-depth strategy to best ensure it can prevent intrusions, detect them if they happen, and respond when they do. But the team may not be as strong in planning for getting “back to normal operating conditions as quickly as possible with the least amount of consequences” if cybersecurity isn’t working closely with risk and IT, Caralli says.

“If they’re not all talking together, they might be planning or quantifying for different risks,” he adds. “They have to plan and run scenarios together. If you look at risk from an impact side and can envision what kind of consequences might occur, you can start to quantify the risk and you can then know where to spend the next dollar, whether to put it on the prevention side or to spend on practices that will reduce the impact.”

3. Mature your metrics

As IT resilience has evolved, Jorge Machado, a partner at management consulting firm McKinsey & Co., says CIOs should adjust the metrics they use to measure and manage operations to ensure they’re meeting the right objectives.

“Traditionally if we go back a decade it would be about uptime, availability of applications, and mean time to restore,” Machado says.
“But nowadays, as apps become more microservices-oriented and we move away from monolith systems, we need to measure in a more nuanced way.”

He and a colleague, McKinsey associate partner Arun Gundurao, suggest measurements focused on the ability to perform critical transactions, such as those measuring failures in customer interactions, application experience from the user perspective, or service level objectives.

“It’s what does the business care about around this application or this customer journey,” Gundurao says. “You want to measure what the business wants to measure.”

4. Practice

In Stone’s opinion, resilience means successfully handling unexpected circumstances. To do that, Stone makes sure his IT department is prepared. That means training, testing, and practicing with table-top exercises and simulations.

“It’s running exercises, taking down a cluster and not telling [everyone] and seeing how people respond. It’s almost like a live-fire simulation. You have to do that carefully, at the right time, but it has to be part of your cadence,” he says. “You have to have those standard operating procedures, go through them and refine those. You have to be willing to make your staff uncomfortable, challenge them. It gives them some camaraderie because they know they can get through things.”

Stone says such exercises give CIOs and their managers an opportunity to build confidence in processes that work well and build muscle memory, as well as identify weaknesses, such as a lack of redundancy in workers trained in key technologies or a lack of backup procedures should a particular application fail.

5. Architect resiliency

IT advisors stress that it’s important to build resiliency into the architecture itself by, for example, distributing instances and payloads across geographical locations.

One way to ensure resilient systems is to “simplify what you do so you can do it really well to meet expectations,” Stone says, noting that such an approach also helps keep teams from getting overextended.

Mixing in automation for incident, problem, and change management also helps build resiliency, he adds.

Gundurao recommends adopting site reliability engineering (SRE), a set of principles and practices for infrastructure and operations aimed at creating scalable, reliable systems. SRE, along with those trained in its principles, focuses on building IT not just to work well in blue skies but to work through stormy skies, Machado adds.

Andrew Long, global enterprise architecture lead at Accenture, sees large traditional organizations increasingly adopting the principles, technologies, and methods used by digital-native organizations to architect more resilient IT systems.
“This has enabled the business to improve its resilience to disruptive business events, and therefore become more competitive,” he says.

To do so, IT leaders are emphasizing speed and agility, data centricity, and decentralization, as well as continuous integration and delivery, SRE, and microservices to deliver “the business capabilities the future organization requires … in a more modular and composable way,” Long says.

They are also shifting from traditional waterfall-based IT project delivery to “more product-centric IT delivery and operations, which tends to consider broader more strategic requirements that support IT resilience,” he adds.

“Almost all organizations have some part of the IT estate in the cloud,” Long says, but the key is “to consider what unique cloud capabilities can be leveraged to increase the organization’s ability to become more agile and resilient.”

6. Stay vigilant

Organizational risks, business needs, and technology will all continue to evolve, and so should practices around IT resiliency, experts say.

“Engage with the business to understand where they see the risks of business disruption, the scale of the risk, and crucially, how they quantify this risk and therefore the potential value,” Long says. By having a clear understanding of the current state of your technology landscape, you can better understand how your organization can respond to this disruption, and where the critical risk areas reside.

“Confirm the specific interventions that need to be made to minimize the risk, and develop a roadmap to deliver change,” Long says, adding that the execution of this roadmap is possible only “if everyone is aligned on the business risk.”

7. Let business share in the accountability

The business side also has a role to play in IT resiliency, says Machado, so business unit leaders should have some accountability for it as well.

“I do think you have to have an accountability model, and we do think it should be shared with the business,” he explains, “so whoever builds the app should share responsibility for it. It should not just be the role of the CIO.”

Machado is not advocating for business units to take over IT operations and day-to-day management of apps and systems; rather, he says they should understand that their requirements and priorities can impact resiliency.

For example, if business unit leaders constantly prioritize time to market and speed to value creation, then they need to share accountability for whether and by how much that could affect resilience.