by Stephanie Balaouras

Four Best Practices For IT Availability And Service Continuity Management

Dec 16, 2009

You need to have a framework in place to manage disaster recovery preparedness as a continuous process, not a one-time event, says Forrester's Stephanie Balaouras. Here are four practical pieces of advice.

Forrester often gets inquiries such as, “What requirements should we keep in mind while developing our disaster recovery plans and documents?” and, “Which strategies work best for managing our disaster recovery program once it’s in place?”


Technology supports disaster recovery preparedness, but it doesn’t constitute a strategy or plan. You need to have a framework in place to manage disaster recovery preparedness as a continuous process, not a one-time event. Processes have to be in place to ensure that disaster recovery plans are continuously updated as a part of change and configuration management and are regularly tested. In addition, it’s important to periodically update the business impact analysis (BIA) and risk assessments (RAs) that provide the key inputs into the development of your disaster recovery strategy and specific plans. By taking a proactive approach to disaster recovery, rather than being unprepared when a disaster occurs, you will save your company substantial money in the long run. Organizations that take this more proactive, more holistic approach often use the term “IT service continuity” rather than “disaster recovery.”

However, as companies become increasingly dependent on IT for day-to-day business operations, business owners demand greater levels of IT availability, sometimes at 99.95% or better. This has forced IT operations teams to revisit their strategies for both local high availability and IT service continuity. So, technology decisions play a vital role in supporting your overall strategy.
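To make a target such as 99.95% concrete, it can help to translate it into a downtime budget. The following is a minimal sketch in Python, not part of the Forrester research; the function name and the 365-day year are assumptions chosen for illustration. Under those assumptions, 99.95% availability allows about 263 minutes, or roughly 4.4 hours, of downtime per year.

```python
# Illustrative only: the names and the 365-day year are assumptions.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime budget implied by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.95, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.1f} minutes per year")
```

Whether that budget covers planned maintenance as well as unplanned outages is a decision to agree on with the business owner, since it changes what the same percentage means in practice.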

Forrester sees Infrastructure & Operations (I&O) professionals evaluating technologies and services such as:

1. Local and long-distance clustering for zero downtime.

2. Server virtualization high availability and fault-tolerant technology for near-zero downtime at the primary site as well as rapid restart of virtual machines at the recovery site.

3. Local snapshots and remote replication technology for near-zero data loss.

The “how to” of IT availability and service continuity is not the only challenge. If money were no object, I&O professionals could implement solutions that would enable zero downtime and zero data loss for all their IT systems.

But the pressure to maintain or reduce IT costs means that they must justify the investment in availability technologies by categorizing IT systems in terms of their criticality and implement the most cost-effective solutions to achieve agreed-upon recovery objectives or service-level agreements (SLAs). Determining the criticality of IT systems and writing meaningful, achievable objectives or SLAs with business owners are often far more challenging than the implementation of the technology itself.

In recent research for its Infrastructure & Operations Council, Forrester uncovered four best practices:

1) Classify systems for criticality. Whether you are developing a strategy for operational high availability or IT service continuity, determining criticality requires that you perform a BIA. For each business process, you must map dependent IT systems, calculate the cost of downtime, and determine availability rates and recovery objectives. You must also determine the probability of certain types of risks, from IT failures to human error.

A business requirement such as “The business demands that we provide less than 4-hour recovery of our customer care system with less than a minute of lost transactional data” is much more compelling to an executive than “We need $3.2 million for hardware and $300,000 per year in telecommunications expenses for a data replication solution.” This is why conducting the BIA is so important and why IT can’t just start with technology.
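The arithmetic behind the cost-of-downtime and risk-probability steps can be sketched as an expected annual loss. This is a simplified illustration rather than Forrester's method: the `Risk` class, the risk names, the yearly rates, the outage durations, and the hourly cost are all hypothetical figures, and a real BIA would gather them from business owners for each process.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    occurrences_per_year: float  # hypothetical expected frequency
    outage_hours: float          # hypothetical outage length per occurrence


def downtime_cost(outage_hours: float, cost_per_hour: float) -> float:
    """Cost of a single outage of the given length."""
    return outage_hours * cost_per_hour


def expected_annual_loss(risks, cost_per_hour: float) -> float:
    """Sum of frequency x cost per outage across all listed risks."""
    return sum(r.occurrences_per_year * downtime_cost(r.outage_hours, cost_per_hour)
               for r in risks)


# Hypothetical inputs for one business process (not from the article).
RISKS = [
    Risk("storage array failure", 0.10, 8),
    Risk("operator error", 0.25, 2),
    Risk("site-wide outage", 0.02, 48),
]
COST_PER_HOUR = 50_000  # hypothetical hourly cost of downtime, in dollars

if __name__ == "__main__":
    loss = expected_annual_loss(RISKS, COST_PER_HOUR)
    print(f"Expected annual loss: ${loss:,.0f}")
```

A figure of this kind gives the executive conversation a quantitative anchor: a proposed investment can be compared with the loss it is expected to avoid.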

2) Develop tiers of service for both availability and IT service continuity. To reach the next level of maturity, IT professionals must shift their thinking from disaster recovery to IT service continuity. IT service continuity is less a reactive response to catastrophic events and more a focus on the nearly continuous availability of IT services. Once your range of recovery objectives is determined, it often helps to develop an IT availability and service continuity catalog. The catalog is a range of service tiers. Each service tier has an associated availability rate, recovery objectives, technology prerequisites, and cost to deliver the service. This catalog helps you simplify your strategy, quickly assign new IT systems to a service tier, and communicate with the business.
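One way to picture such a catalog is as a small table of tiers plus a rule for placing a new system in the cheapest tier that meets its requirements. The sketch below is an assumption-laden illustration: the tier names, availability rates, recovery time and recovery point values, costs, and the `assign_tier` helper are all hypothetical and would come from your own BIA and budgeting in practice.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ServiceTier:
    name: str
    availability_pct: float  # committed availability rate
    rto_hours: float         # recovery time objective
    rpo_minutes: float       # recovery point objective (tolerated data loss)
    annual_cost: float       # cost to deliver the tier, per system


# Hypothetical catalog; every value here is invented for illustration.
CATALOG = [
    ServiceTier("Tier 1", 99.95, 4, 1, 500_000),
    ServiceTier("Tier 2", 99.9, 24, 60, 150_000),
    ServiceTier("Tier 3", 99.0, 72, 1440, 40_000),
]


def assign_tier(required_availability_pct: float,
                max_rto_hours: float,
                max_rpo_minutes: float,
                catalog: Sequence[ServiceTier] = CATALOG) -> Optional[ServiceTier]:
    """Return the cheapest tier meeting all requirements, or None."""
    candidates = [
        t for t in catalog
        if t.availability_pct >= required_availability_pct
        and t.rto_hours <= max_rto_hours
        and t.rpo_minutes <= max_rpo_minutes
    ]
    return min(candidates, key=lambda t: t.annual_cost, default=None)


if __name__ == "__main__":
    tier = assign_tier(required_availability_pct=99.9,
                       max_rto_hours=4, max_rpo_minutes=1)
    print(tier.name if tier else "No tier meets these requirements")
```

Returning `None` when nothing fits is deliberate in this sketch: a requirement that no tier can satisfy is a prompt to renegotiate the objective or extend the catalog, not something to resolve silently.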

3) Measure availability from the end-user perspective. Well-written objectives must measure unplanned and planned downtime. They must take into account the timing of the downtime (e.g., end of month, quarterly close, and peak sales periods), and they must measure downtime from the perspective of the user. This means that you must measure the availability of the end-to-end IT service, not just the individual infrastructure components such as clients, servers, storage, and networks.
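A short calculation shows why component-level figures can overstate what users experience. The sketch below assumes, for simplicity, that every component is required for the service to work (a serial chain) and that failures are independent; the component names and the 99.9% figures are hypothetical. Under those assumptions, four components at 99.9% each yield an end-to-end availability of only about 99.6%.

```python
from math import prod
from typing import Iterable


def end_to_end_availability(component_availabilities: Iterable[float]) -> float:
    """Availability of a service whose components are all required.

    Assumes independent failures and a purely serial dependency chain,
    which is a simplification of most real environments.
    """
    values = list(component_availabilities)
    if not values:
        raise ValueError("at least one component is required")
    if any(not 0 <= a <= 1 for a in values):
        raise ValueError("availabilities must be fractions between 0 and 1")
    return prod(values)


# Hypothetical component availabilities, expressed as fractions.
COMPONENTS = {
    "client": 0.999,
    "server": 0.999,
    "storage": 0.999,
    "network": 0.999,
}

if __name__ == "__main__":
    service = end_to_end_availability(COMPONENTS.values())
    print(f"End-to-end availability: {service:.4%}")
```

Redundant components and correlated failures would change the arithmetic, which is one more reason to measure the service as the user sees it and not to infer it from component statistics alone.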

4) Include availability and continuity considerations in application development and testing. Too often, availability and continuity are considered after an application has already been deployed. At this point, the choice of server, storage, and network infrastructure and the application processing and logic will limit certain availability and continuity options. Resiliency has to be a part of application development, infrastructure selection, and acceptance testing.

The cardinal mistake when developing IT service continuity strategies and justifying investments is to lead with technology. It might seem burdensome and complicated to conduct a business impact analysis and risk assessment with a cross-functional team of business owners, risk management professionals, facilities, and IT, but it’s critical; with the results you can identify business requirements, risks, and impacts to create quantitative justifications for investment and get the entire business on board.

Stephanie Balaouras is a Principal Analyst at Forrester Research, where she works closely with its Infrastructure & Operations Council, which is part of the Forrester Leadership Boards. For more information and to download related research, please visit recovery.