by John Schlosser - Executive Consultant with IBM's STG Lab Services & Training

Six Easy Pieces: How to Do Cost of Ownership Analysis Better

Jan 26, 2010
Budgeting | IT Leadership

The proper approach to cost of ownership analysis can significantly help IT organizations understand their needs, leverage their strengths, and position for future growth. These six techniques should be added to any CIO's portfolio of methods to serve the needs of the business.

If cost of ownership analysis is a painful exercise for IT organizations, why has almost every company done it (and continued to do it) multiple times? Simply because management requires an accurate understanding of current IT costs and strengths so they can better assess new ideas and technologies. In this article, we will identify six key elements of effective cost of ownership analysis, which you can use to improve accuracy and eliminate the frustration associated with this necessary step in your IT evolution.

1) Analyze Platforms, Not Servers

First, evaluate the current “platforms” within your environment, including all servers of all types, in order to simplify the process. One of the most difficult things to “get right” in an analysis of this type is an exact match between a given technology and the associated costs. The easiest way to do this is not to limit the technology scope to a few machines or a single new application but to expand it to match all the technology in the IT budget. Limiting the scope makes cost of acquisition simple to determine, but it makes every other cost almost impossible to quantify without controversy.

A platform approach will result in development of a new “view” of the IT budget that is platform based. The advantage to this approach is that the total in this view should match the total in the budget. This gives the study team tremendous leverage if discussions should wander to “I think this amount is too high for platform A.” So if the amount is reduced for platform A, it must be raised for platform B. What does B think of that? This places the entire cost discussion on solid footing — the IT budget — and allows the process to be managed dispassionately, a key to later acceptance of the results.
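The reconciliation discipline described above can be sketched in a few lines. This is only an illustration; the platform names and budget figures below are invented, not taken from any real study.

```python
# Hypothetical platform-based "view" of an IT budget (all figures invented).
# The key discipline: the platform totals must reconcile to the real budget.
it_budget_total = 10_000_000  # assumed annual IT budget, for illustration

platform_view = {
    "mainframe":   3_500_000,
    "unix":        2_500_000,
    "x86":         3_000_000,
    "storage/net": 1_000_000,
}

view_total = sum(platform_view.values())
assert view_total == it_budget_total, "view must reconcile to the budget"

# Shifting cost off one platform forces it onto another; the total is fixed,
# which is what keeps the "this amount is too high" debate dispassionate.
platform_view["x86"]       -= 250_000
platform_view["mainframe"] += 250_000
assert sum(platform_view.values()) == it_budget_total
```

Because the view must always sum to the published budget, any proposal to lower one platform's number immediately raises the question of which other platform absorbs it.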

2) Focus on a Representative Application and Include All the Pieces

Next, let’s consider a new business-critical application or workload that requires platform selection. Once again, the key to success is not to limit the view to a subset of components. By definition, a critical application will require careful design, sizing, maintenance, operation, support, and disaster recoverability. It may also require a new or dedicated infrastructure, but at a minimum, it will tax existing infrastructure. Each of these components, and their associated costs, should be included in any cost of ownership comparison. The “view” developed in the previous step should facilitate this type of analysis.

Over the past ten years, our group within IBM Lab Services has been doing IT Systems and Storage Optimization (“Scorpion”) Studies that focus on this type of view- and component-based analysis. Our findings show that a typical ratio of production Web, application, and database servers to “everything else” is about one to one. This means that any analysis that omits those other components for support, maintenance, disaster recovery, and so on may miss half of the real costs. This discrepancy grows for very large critical applications and is largely why our industry hasn’t done so well sizing many new enterprise application suites. We’ve all heard the stories.
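The arithmetic behind that one-to-one finding is worth making explicit. A minimal sketch, with an invented production cost figure:

```python
# With a ~1:1 ratio of production servers to "everything else,"
# an analysis scoped to production alone sees only half the real cost.
production_cost = 500_000  # illustrative annual cost of production servers
support_ratio = 1.0        # "everything else" : production, per the 1:1 finding

true_cost = production_cost * (1 + support_ratio)
missed_fraction = 1 - production_cost / true_cost
print(missed_fraction)  # 0.5 -- half the real cost is invisible to a narrow scope
```

For very large critical applications the support ratio climbs above 1.0, so the missed fraction exceeds one half.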

3) Consider Practical Capacity, Not Vendor Ratings

System capacity and performance can quickly become a very tedious and esoteric discussion, and in many cost of ownership efforts, it does. Vendors feed this controversy to gain competitive advantage. This can be avoided. Our experience is that the most important aspect of performance analysis within cost of ownership is not which vendor claim or benchmark is used as a base, but rather (a) what system utilizations are “normal” in your current environment, and (b) what is a reasonable expectation for the future?

Often, distributed server utilizations are very low, and there is a good reason for it: an underutilized server requires no capacity planning. Most cost analyses are considered part of a technology acquisition process, so higher future-state utilizations are assumed. If average server utilizations in your environment are low, model a future state of 2 or 3 times the current for each component in the possible solution, and no higher. Use any reasonable performance metric; expected utilizations are far more important.
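One way to apply the “2 or 3 times current, no higher” guideline when sizing a future state is sketched below. The function names and all workload numbers are assumptions for illustration, not part of any published method.

```python
import math

def future_utilization(current_util, multiplier=3.0):
    """Model a future-state utilization as a multiple of today's,
    per the guideline: 2-3x current, and no higher."""
    if not 2.0 <= multiplier <= 3.0:
        raise ValueError("guideline allows multipliers of 2-3x only")
    return current_util * multiplier

def servers_required(total_demand, per_server_capacity, target_util):
    """Practical capacity = rated capacity x achievable utilization;
    size against that, not against the vendor's rated number."""
    practical = per_server_capacity * target_util
    return math.ceil(total_demand / practical)

# Example: servers average 10% busy today; model a future state at 3x.
util = future_utilization(0.10, multiplier=3.0)   # ~0.30
n = servers_required(total_demand=120, per_server_capacity=100,
                     target_util=util)
print(n)  # 4 -- each server contributes ~30 units of practical capacity
```

The exact performance metric used for `per_server_capacity` matters far less than the utilization assumption; doubling the assumed utilization halves the server count.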

This is particularly true with the rise of virtualization, which is almost always assumed in cost of ownership comparisons. Transitioning from a non-virtualized to a virtualized server environment has some significant advantages, including higher potential utilization. It has a cost, however: capacity must be managed. Don’t assume world-class utilization numbers unless you know what kind of effort it will take to attain them.

IBM’s System z mainframe environments typically demonstrate this fact quite well. Mainframes usually run at very high utilizations around the clock. They can do this because the level of internal standardization and automation is much higher than other platforms. Other platforms will eventually attain these levels, but that is still years of vendor development away.

4) Don’t Ignore Labor Costs to Protect the Innocent

The most difficult topic within cost of ownership is undoubtedly the cost of labor. In a down economic cycle, most staff see nothing positive in quantifying the cost of labor for “their” platform. High Full Time Equivalent (FTE) ratios have been an industry target for years and most IT professionals can quote the current best practice and describe how they are exceeding it. Therein lies a problem. IT infrastructure support organizations have been managing to these ratios for years using two basic strategies: (a) improve efficiency; or (b) push work onto other parts of the IT organization. The extent to which strategy (b) is used differs by platform for a variety of reasons, but the result is the same. Any cost of ownership analysis that limits labor calculations to IT infrastructure support headcount will likely miss major portions of the real support costs and skew the results.

A good solution to this problem is an approach similar to item one. Consider the entire IT organization and apportion every group that is not truly platform neutral (and even individuals within an otherwise neutral group, like network support) to the appropriate platform labor category. The same kind of “view” is now developed for assignment of labor with the underlying organization chart as the foundation.
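A sketch of that apportionment follows. The org chart, group names, and percentage splits are entirely invented; in practice the splits come from interviews against the real organization chart.

```python
from collections import defaultdict

# Invented org chart: group -> (FTEs, {platform: share}); shares sum to 1.
# Even a nominally "neutral" group like network support is apportioned
# by its actual work mix rather than left out.
org_chart = {
    "mainframe ops":        (10, {"mainframe": 1.00}),
    "wintel support":       (20, {"x86": 1.00}),
    "dba team":             (12, {"mainframe": 0.25, "x86": 0.75}),
    "network support":      ( 8, {"mainframe": 0.30, "x86": 0.70}),
    "app dev (infra work)": (15, {"mainframe": 0.20, "x86": 0.80}),
}

platform_ftes = defaultdict(float)
for ftes, shares in org_chart.values():
    assert abs(sum(shares.values()) - 1.0) < 1e-9
    for platform, share in shares.items():
        platform_ftes[platform] += ftes * share

print(dict(platform_ftes))
# The x86 total far exceeds the dedicated "wintel support" headcount alone,
# which is why published FTE ratios understate true support labor.
```

Note that the totals still reconcile to the organization's full headcount, giving the same budget-anchored footing as the platform view in item one.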

The results will look quite different from industry-published norms. They will be higher — up to 2 times or more on x86 platforms — and will provide true insight into cost. Because the resulting labor cost numbers cross organizational lines, no one group will feel responsible for them, or for lowering them. Resistance to the process will be lessened and, again, buy-in should be improved.

A side benefit to the process stems from the two strategies often used to manage FTE ratios — productivity improvement and narrowing of responsibility. If the FTE ratios are changed significantly by the new “view” of the organization, the need for productivity tools will be evident. In the years that we have been working with customers doing these studies, we’ve seen an alarming trend toward high complexity within distributed systems — old hardware, old software, multiple releases of everything to be maintained — and a lack of investment in systems management software. This is in stark contrast to the mainframe where software costs tend to be high while systems are maintained at strict currency levels, with the result that staffing has been flat or dropping for years with steadily improving Quality of Service (QoS).

5) Quantify QoS in a Way that Makes Sense

QoS is an elusive topic since it has so many aspects that differ in importance between companies, but some general trends can give guidance. In this age of real-time systems, disaster recovery has become a universal need. Two key metrics in disaster recovery are Recovery Time Objective (RTO – the time to bring alternative systems online for use) and Recovery Point Objective (RPO – the age of the data on those recovered systems). If we consider the dominant RTO and RPO for a given platform, we gain insight into both cost and QoS. Though any system can be made disaster recoverable, there is a huge cost differential between making a single mainframe recoverable and making 1,000 distributed systems recoverable. The majority of customers we’ve worked with have done the former and not the latter because of the cost. This can be quantified very easily with a call to a recovery services provider and should be included in the platform cost comparison.
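The cost differential can be roughed out from per-system recovery contract pricing. All prices below are invented placeholders; the article's point is that a single call to a recovery services provider yields the real figures.

```python
# Illustrative annual recovery-contract pricing (invented figures).
mainframe_systems, mainframe_price_each   = 1,     300_000
distributed_systems, distributed_price_each = 1_000, 5_000

mainframe_dr   = mainframe_systems * mainframe_price_each
distributed_dr = distributed_systems * distributed_price_each

print(mainframe_dr, distributed_dr)  # 300000 5000000
# Even at a far lower unit price, making 1,000 distributed systems
# recoverable dwarfs the single-mainframe contract. This number belongs
# in the platform cost comparison alongside the RTO/RPO each delivers.
```

Comparing these totals against each platform's dominant RTO and RPO turns an elusive QoS discussion into a concrete line item.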

Cloud computing and other metered services will certainly offer a recoverable option, and here lies an opportunity. As IT will have to compete with public clouds, the above cost analysis can be used to set internal cost guidelines for a corporate cloud infrastructure very early in the cloud development process. Or it can be used to steer workload onto the platform that is already recoverable, thus eliminating some of the need to develop a recovery capability where it does not currently exist.

6) Look at Costs Incrementally — Plot Your Own Course

The last topic to consider is primarily financial. There is a “sunk cost” and an “incremental cost” associated with IT infrastructure that must be considered. Just as the first chip to roll off a fabrication line costs billions and the second costs pennies, the first workload for a platform is far more expensive to provision than subsequent ones. This is especially true for the mainframe, since the technology may be physically refreshed but financially the transaction is handled as an upgrade. This is unlike the distributed world, where technology and book value are tied together.

IBM has taken this concept a step further with the arrival of mainframe “specialty” engines that have much lower price points and drastically reduced impact on software costs. However, they cannot run alone; they must be added to an existing system. It is not unusual for mainframe systems in production to cost $4,000/MIP while specialty engine upgrades may run only $200/MIP. The incremental costs on the mainframe in this case are 1/20th of the current cost. These kinds of dramatic differences must be considered in cost of ownership and are often large enough to justify a change in course for IT. Exploiting these areas of low incremental cost to support growth can significantly improve the overall cost of IT. Virtualization is expected to have a significant effect on other platforms, so the need is universal and growing.
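The blended effect of low-cost incremental capacity is easy to work through using the article's $4,000/MIP and $200/MIP figures. The installed capacity and growth amounts below are assumptions chosen for illustration.

```python
# Per-MIP figures from the article: production mainframe capacity at
# $4,000/MIP, specialty-engine upgrades at $200/MIP (1/20th the cost).
base_mips, base_cost_per_mip   = 1_000, 4_000  # existing footprint (assumed size)
added_mips, added_cost_per_mip =   500,   200  # growth on specialty engines (assumed)

total_cost = (base_mips * base_cost_per_mip
              + added_mips * added_cost_per_mip)
blended_cost_per_mip = total_cost / (base_mips + added_mips)
print(round(blended_cost_per_mip, 2))
# Growing capacity by 50% on specialty engines pulls the blended $/MIP
# well below the $4,000 production figure -- the incremental view, not
# the sunk-cost view, is what should steer new workload placement.
```

The same incremental-versus-sunk framing applies wherever virtualization creates cheap headroom on an already-provisioned platform.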

With 33 years at IBM, John Schlosser is currently a Senior Managing Consultant for the Scorpion practice within IBM Systems and Technology Group — Lab Services & Training. He is a founding member of the group which was started in 1999. He has developed and modified many of the methodologies the team uses for IT infrastructure cost analysis.