By Thad Hunter

ITIL is a Jump Start, Not a Solution

How-To
Feb 16, 2007 | 9 mins
Business IT Alignment, IT Leadership, ITIL

Change management reminds me of the arcade game Whac-A-Mole, where you bat down an ever-increasing supply of creatures that pop up. A large organization may install 300 changes a week and consume vast resources to do so, yet rarely turns its attention to knocking down those rodents more efficiently. We just swing harder and yell louder. But many improvements are possible. Change management can and must be executed better before your organization can progress to the next level.

New methods and enabling technology have matured over the last year and deserve your consideration.

Change management is how you balance technical and business risk, not merely how you pump change requests through an administrative process. We can thank ITIL (IT Infrastructure Library) for renewing attention on our important, albeit unglamorous, operational processes. ITIL offers a structured set of best practices, but it is not a solution. It is a jump-start and a useful lever to generate awareness.

So what justifies re-engineering your current change process? Errors made while inserting changes into production amplify throughout the enterprise, with costly, if not embarrassing, consequences. Fill in your favorite episode here…we all have them. Changes and maintenance activities can account for 40 percent of IT personnel costs. Unfortunately, most organizations fail to track these costs, which is a missed opportunity to justify budgets and the need for technology upgrades. Change events afford excellent learning opportunities for improvement projects. Visibility, everyone working off the same page and reliable configuration data all contribute to less rework and better business relationships.

The value proposition is real but redesigning change management is not trivial. Modifying a key operational process requires designing new workflows, defining new roles and responsibilities, collecting different data, managing service levels, acting on feedback from dashboards and integrating disparate databases. Afford your operational processes the same attention and formality that you afford a business application but also expect your IT staff to struggle when it comes to designing their own internal processes.

Before moving on to specific ideas, take note of the following prerequisite technologies, some of which are new and fairly complex:

  • CMDB (configuration management database)—a database application that organizes configuration items (CIs), their attributes and interrelationships to answer the fundamental question “what’s there now?”
  • CMDB end-user functionality that supports impact calculation, event tracking, dependency correlation among CIs, snapshots, versioning and ad hoc view creation.
  • Configuration management process supported by a formal role.
  • Discovery technology that’s extensible to continuously discover your particular environment, from wires through high-level business applications.
  • Process automation tools including workflow, forms (here the RFC, or request for change), roles and dashboard capabilities.
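To make the first prerequisite concrete, here is a minimal sketch of what a CMDB holds: CIs, their attributes and their interrelationships. It is an in-memory toy, not a product design. The class names, field names and sample CIs (`db01`, `billing`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurationItem:
    """One CI with free-form attributes (all field names are illustrative)."""
    ci_id: str
    ci_type: str
    attributes: dict = field(default_factory=dict)


class CMDB:
    """In-memory stand-in for a CMDB: CIs plus 'depends on' relationships."""

    def __init__(self):
        self.items = {}
        self.depends_on = {}  # ci_id -> set of ci_ids it depends on

    def add(self, ci):
        self.items[ci.ci_id] = ci
        self.depends_on.setdefault(ci.ci_id, set())

    def relate(self, dependent_id, target_id):
        """Record that dependent_id depends on target_id."""
        if dependent_id not in self.items or target_id not in self.items:
            raise KeyError("both CIs must be registered first")
        self.depends_on[dependent_id].add(target_id)

    def snapshot(self):
        """Answer 'what's there now?' as a plain, sorted listing."""
        return sorted((ci.ci_id, ci.ci_type) for ci in self.items.values())


cmdb = CMDB()
cmdb.add(ConfigurationItem("db01", "database", {"version": "10.2"}))
cmdb.add(ConfigurationItem("billing", "application"))
cmdb.relate("billing", "db01")
print(cmdb.snapshot())
```

A real CMDB adds versioning, discovery feeds and views on top, but the core question it answers is the same one `snapshot()` answers here.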

It’s All About Risk

Change management is risk management and proper categorization determines success. Categorizing risk calls on observable factors across multiple dimensions, not generic classifications like high, medium or low.

Examples of risk dimensions are the type and number of dependent CIs, prior change experience, business impact, technical complexity and resource effort.

Examples of factors for a technical complexity dimension could be: major infrastructure redesign, critical shared software service, can’t predict impact, not feasible to test, outage expected, unknown compatibilities, capacity or performance degradation, new products/major updates, pre-approved change.

Incidentally, the ITIL practice is to derive priority from urgency (amount of tolerable delay) and impact (mainly in a business context). These factors are necessary, but not sufficient to properly evaluate an RFC. A detailed and methodical technical risk assessment is also required.
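One way to picture categorization by observable factors is a profile with one score per dimension instead of a single high/medium/low label. The sketch below is only an illustration. The technical complexity factors come from the list above; the business impact factors, the `risk_profile` name and the "share of factors observed" scoring rule are assumptions.

```python
# Factor catalog per risk dimension. The technical-complexity factors are the
# article's; the business-impact factors and the scoring rule are assumptions.
CATALOG = {
    "technical complexity": {
        "major infrastructure redesign",
        "critical shared software service",
        "can't predict impact",
        "not feasible to test",
        "outage expected",
        "unknown compatibilities",
    },
    "business impact": {
        "revenue system affected",
        "SLA at risk",
    },
}


def risk_profile(observed, catalog=CATALOG):
    """Score each dimension as the share of its catalog factors observed."""
    profile = {}
    for dimension, factors in catalog.items():
        hits = set(observed.get(dimension, ())) & factors
        profile[dimension] = len(hits) / len(factors)
    return profile


rfc_observed = {
    "technical complexity": [
        "outage expected",
        "not feasible to test",
        "unknown compatibilities",
    ],
    "business impact": ["SLA at risk"],
}
profile = risk_profile(rfc_observed)
print(profile)
```

Keeping the dimensions separate lets a reviewer see that an RFC is, say, technically risky but low in business impact, which a single blended label would hide.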

Exception Management

The corollary to assessing risk is allocating time and attention according to risk. It is better to spend the available time reviewing one truly complex change than to hold a perfunctory review of 20 changes. Different approaches can be used. Define pre-approved changes such as database scripts, DNS changes, certain Web content or rescheduling batch jobs. Pre-approved changes still follow a process but acknowledge that the Change Advisory Board (CAB) will add no value in these cases. Establish a continuous, virtual CAB, allowing members to vote on changes remotely and meet only when required. A variation on the virtual theme is to delegate changes to domain-specific mini-CABs, elevating only the highest-risk changes to a management CAB.
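The routing idea can be sketched in a few lines. This is a hypothetical rule, not a prescribed one: the function name, the 0-to-1 risk score and the 0.7 threshold are assumptions. The pre-approved types are the examples named above.

```python
# Pre-approved change types taken from the examples in the text.
PREAPPROVED_TYPES = {
    "database script",
    "DNS change",
    "web content",
    "batch job reschedule",
}


def route_change(change_type, risk_score, cab_threshold=0.7):
    """Pick a review path; risk_score is assumed to lie between 0 and 1."""
    if change_type in PREAPPROVED_TYPES:
        return "pre-approved"       # still logged, but skips CAB review
    if risk_score >= cab_threshold:
        return "management CAB"     # only the highest-risk changes escalate
    return "domain mini-CAB"


print(route_change("DNS change", 0.1))
print(route_change("storage migration", 0.9))
print(route_change("application patch", 0.3))
```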

Business Integration

Many words have exhorted us to focus on our customers. However, when it comes to formally integrating business stakeholders into the process, many IT organizations hesitate and keep them at arm’s length. This is a missed opportunity to build credibility and to strengthen change testing. Business stakeholders should participate throughout the workflow, i.e., submitting the RFC, voting on the CAB, verifying the change during promotion and grading the overall implementation. Business integration also occurs at the data level. The CMDB should associate configuration items with users, organizations, etc., in order to accurately predict the impact on the user base and service level agreements (SLAs).

Impact Analysis

Impact analysis assesses the type of dependency and how critical the correlation is between a target CI and its dependent or related CIs. Correlation rates the technical influence of a dependency along a spectrum of criticality. For example, will a dependency prevent another CI from loading or, worse, cause performance degradation or, worse yet, induce unstable performance, corrupt data or overwrite key configuration files? One pass is not enough. Impact analysis should be performed three times for a target CI: upon submission, again during the readiness review to account for intervening changes, and after promotion to verify the change insertion.
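As a rough sketch of the mechanics, impact analysis can be treated as a walk over the dependency graph that records, for every affected CI, the worst criticality among the dependencies that reach it. The ordering of the criticality scale follows the example above. The numeric values, the function name and the sample CIs are assumptions.

```python
from collections import deque

# Ordering follows the article's example; the numeric values are assumptions.
CRITICALITY = {
    "fails to load": 1,
    "performance degradation": 2,
    "instability or data corruption": 3,
}


def impact_analysis(target, dependents):
    """Return {ci: worst criticality} for every CI that depends on target.

    dependents maps a CI to a list of (dependent_ci, criticality_label).
    """
    # Pass 1: breadth-first walk to find every directly or
    # transitively dependent CI.
    reached = set()
    queue = deque([target])
    while queue:
        ci = queue.popleft()
        for dep, _label in dependents.get(ci, []):
            if dep != target and dep not in reached:
                reached.add(dep)
                queue.append(dep)

    # Pass 2: for each reached CI, keep the worst incoming criticality.
    worst = {}
    for source in reached | {target}:
        for dep, label in dependents.get(source, []):
            if dep in reached:
                worst[dep] = max(worst.get(dep, 0), CRITICALITY[label])
    return worst


dependents = {
    "db01": [
        ("billing", "instability or data corruption"),
        ("reports", "performance degradation"),
    ],
    "billing": [("portal", "fails to load")],
    "reports": [("portal", "performance degradation")],
}
impact = impact_analysis("db01", dependents)
print(impact)
```

Running the same function at submission, at readiness review and after promotion, and comparing the three results, is one simple way to spot intervening changes in the dependency data.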

Collision Analysis

Collision analysis is similar to impact analysis but is performed to resolve contention among a group of changes that are planned for a particular timeframe. It uses dependency analysis to evaluate how all planned changes affect each other or additional codependent CIs. Collision analysis should also uncover overloaded resources and time constraints such as freeze periods and SLAs. After collisions are resolved, change or release packages can then be scheduled.
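A simplified sketch of collision detection: two planned changes collide when their windows overlap and they touch at least one common CI, and a change is blocked when its window falls in a freeze period. The record layout, the RFC identifiers and the dates are hypothetical. Each change's `cis` set is assumed to already include dependent CIs from impact analysis.

```python
from datetime import datetime
from itertools import combinations


def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open time windows intersect."""
    return a_start < b_end and b_start < a_end


def find_collisions(changes):
    """Return (id_a, id_b, shared_cis) for changes that contend."""
    collisions = []
    for a, b in combinations(changes, 2):
        shared = a["cis"] & b["cis"]
        if shared and overlaps(a["start"], a["end"], b["start"], b["end"]):
            collisions.append((a["id"], b["id"], sorted(shared)))
    return collisions


def in_freeze(change, freeze_periods):
    """True when the change window touches any freeze period."""
    return any(
        overlaps(change["start"], change["end"], start, end)
        for start, end in freeze_periods
    )


changes = [
    {"id": "RFC-1", "start": datetime(2007, 2, 17, 22), "end": datetime(2007, 2, 18, 2),
     "cis": {"db01", "billing"}},
    {"id": "RFC-2", "start": datetime(2007, 2, 18, 1), "end": datetime(2007, 2, 18, 3),
     "cis": {"db01"}},
    {"id": "RFC-3", "start": datetime(2007, 2, 19, 1), "end": datetime(2007, 2, 19, 2),
     "cis": {"db01"}},
]
freeze_periods = [(datetime(2007, 2, 19, 0), datetime(2007, 2, 20, 0))]

collisions = find_collisions(changes)
frozen = [c["id"] for c in changes if in_freeze(c, freeze_periods)]
print(collisions)
print(frozen)
```

Overloaded staff could be checked the same way by replacing the `cis` set with the set of people assigned to each change.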

Windows into Learning

The best ITIL contribution to this topic is the post-implementation review (PIR). The change process is perpetually ripe for harvesting lessons learned. Every day, small problems are solved and breakthroughs occur while testing and maintaining systems, but these are difficult to capture. Taking time to honestly evaluate outcomes can yield a plethora of improvement projects, uncover unresolved root causes that need to be routed to problem management, or highlight needed stability and performance monitoring. Experience should be incorporated so the CMDB also serves as a lessons-learned repository. In terms of people management, this step also offers concrete performance measures and rewards preventative thinking instead of reinforcing reactive behaviors.

Process Feedback Loops

Modern quality concepts emphasize that a good process incorporates measurable feedback and is self-correcting. Management reporting and configuration events are two types of feedback that should be designed into a change process. Reporting includes tracking PIR actions and measuring process service level agreements. Configuration events include detecting unauthorized changes, comparing server configurations to a gold standard to reveal deltas that have crept in, developing “be careful when you touch this again” checklists, and filtering security and monitoring logs for indications of rogue changes.
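Comparing a server to a gold standard is easy to sketch as a dictionary diff. The setting names and values below are invented for illustration, and a real tool would pull both sides from discovery data.

```python
MISSING = "<missing>"  # marker for a setting present on only one side


def config_deltas(gold, actual):
    """Return {setting: (expected, found)} for every setting that differs."""
    deltas = {}
    for key in gold.keys() | actual.keys():
        expected = gold.get(key, MISSING)
        found = actual.get(key, MISSING)
        if expected != found:
            deltas[key] = (expected, found)
    return deltas


gold = {"jvm": "1.5.0_10", "max_connections": 200, "ntp": "on"}
actual = {"jvm": "1.5.0_11", "max_connections": 200, "debug": "on"}

deltas = config_deltas(gold, actual)
for setting in sorted(deltas):
    print(setting, deltas[setting])
```

Any delta without a matching approved RFC is a candidate unauthorized change.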

Resource Management

Changes are a substantial part of overall demand. Anywhere from five to 20 individuals might participate during one change cycle. While not an ITIL concept, a good change process should include assigning and tracking resources and collecting actual time incurred. You will be surprised at the total cost and at how useful it becomes in justifying future budgets. Further, resource management is akin to project management. Since changes should be treated as mini-projects, they also create opportunities to teach project management skills to staff when they serve as change leads.
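Collecting actual time makes the cost arithmetic trivial. The roles, hours and hourly rates below are made-up numbers, shown only to illustrate the roll-up.

```python
def change_cost(time_entries, hourly_rates):
    """Sum hours * rate over (role, hours) entries for one change."""
    return sum(hours * hourly_rates[role] for role, hours in time_entries)


# Hypothetical rates and actuals for a single change.
hourly_rates = {"dba": 90, "sysadmin": 80, "change lead": 100}
time_entries = [("dba", 3), ("sysadmin", 5), ("change lead", 2)]

total = change_cost(time_entries, hourly_rates)
print(total)  # 3*90 + 5*80 + 2*100 = 870
```

Multiplied across hundreds of changes a week, even small per-change figures like this add up to a number worth showing at budget time.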

Continuous Change

Changes are often bundled into huge weekend exercises. Without question, some changes must occur outside of normal business hours, but this may also be hedging due to weak analysis skills and tools. Chronic abuse of the emergency process may come from overly strict change windows as much as from poor planning. Pushing all your changes to the weekend can be inefficient and may actually introduce more dependencies, causing a minor problem to cascade and waste an entire shift of work. A more efficient process would allow daily insertion of certain changes. The result can be increased end-user satisfaction with rapid closures, preservation of nightly batch maintenance windows, resource leveling, fewer 2:00 a.m. pages and more reliable monitoring and detection of promotion anomalies due to fewer moving parts.

What Are We Doing Again?

Finally, most organizations have never precisely specified which CIs are under formal change control, and fewer still ever revisit what should be controlled. A typical policy goes something like, “Anything in production that’s not Web content goes through change control.” Yet every day vendor patches, JVM updates, system monitors/alerts, blades, specs, OS fixes and database tables are changed outside of the process. If you accepted the earlier notion that change is about risk, then CIs should be risk rated, which allows some to be excluded from change control (but never configuration control). Reassessing CMDB content is also about continually incorporating new CI types (for example, business rules in an integration engine, batch programs and SANs) and more granular CIs to respond to changing risk and lessons learned.

Hopefully these examples illustrate that your change process is worth revisiting. Handling change is a core competency that must be mastered as an organization. The reward is new efficiencies and additional capacity, so you can spend more time solving business problems. And just to complete the Whac-A-Mole analogy: 1) the moles always win since you can’t deal with them all, and 2) the best score goes to the player who adapts when the tempo changes and hits with precision, not harder.

Thad Hunter has run IT operations and helped clients improve them as a CIO, management consultant and software product implementer for 12 years. He can be reached at thad.hunter@evergreensys.com.