If cost of ownership analysis is a painful exercise for IT organizations, why has almost every company done it (and continued to do it) multiple times?
Simply because management requires an accurate understanding of current IT costs and strengths so they can better assess new ideas and technologies.
In this article, we will identify six key elements of effective cost of ownership analysis, which you can use to improve the accuracy and eliminate the
frustration associated with this necessary step in your IT evolution.
1) Analyze Platforms, Not Servers
First, evaluate the current “platforms” within your environment, including all servers of all types, in order to simplify the process. One of the most
difficult things to “get right” in an analysis of this type is an exact match between a given technology and the associated costs. The easiest way to do this is
not to limit the technology scope to a few machines or a single new application but to expand it to match all the technology in the IT budget. Limiting the
scope makes cost of acquisition simple to determine but it makes every other cost almost impossible to quantify without controversy.
A platform approach will result in development of a new “view” of the IT budget that is platform based. The advantage to this approach is that the
total in this view should match the total in the budget. This gives the study team tremendous leverage if discussions should wander to “I think this amount
is too high for platform A.” So if the amount is reduced for platform A, it must be raised for platform B. What does B think of that? This places the
entire cost discussion on solid footing — the IT budget — and allows the process to be managed dispassionately, a key to later acceptance
of the results.
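The reconciliation argument above can be sketched in a few lines of code. This is a minimal illustration with hypothetical budget lines, platforms, and dollar figures, not a real chart of accounts: every budget line is apportioned to a platform, so the platform totals must sum back to the full IT budget, and lowering one platform's number necessarily raises another's.

```python
# Platform-based "view" of an IT budget (all figures hypothetical).
budget_lines = [
    # (line item, platform, annual cost in $K)
    ("x86 hardware maintenance", "x86", 1200),
    ("UNIX hardware maintenance", "UNIX", 800),
    ("Mainframe software", "mainframe", 2500),
    ("x86 OS licenses", "x86", 600),
    ("Shared storage", "mainframe", 400),
]

platform_view = {}
for item, platform, cost in budget_lines:
    platform_view[platform] = platform_view.get(platform, 0) + cost

total_budget = sum(cost for _, _, cost in budget_lines)

# The view must reconcile with the budget: this is what anchors the
# later "I think platform A's number is too high" discussions.
assert sum(platform_view.values()) == total_budget

print(platform_view)
print(total_budget)
```

Because the totals reconcile, any proposed reduction for one platform is immediately visible as an increase somewhere else, which keeps the discussion on the solid footing the article describes.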
2) Focus on a Representative Application and Include All the Pieces
Next, let’s consider a new business critical application or workload that requires platform selection. Once again, the key to success is to not limit the
view to a subset of components. By definition, a critical application will require careful design, careful sizing, careful maintenance, operation, support, and
disaster recoverability. It may also require a new or dedicated infrastructure, but at a minimum, it will tax existing infrastructure. Each of these components,
and its associated costs, should be included in any cost of ownership comparison. The “view” developed in the previous step should facilitate this type of analysis.
Over the past ten years, our group within IBM Lab Services has been doing IT Systems and Storage Optimization (“Scorpion”) Studies that focus on
this type of view and component based analysis. Our findings show that a typical ratio of production Web, application, and database servers to
“everything else” is about one to one. This means that any analysis that omits those other components (support, maintenance, disaster recovery,
etc.) may miss half of the real costs. This discrepancy grows for very large critical applications and is largely why our industry hasn’t done so well sizing
many new enterprise application suites. We’ve all heard the stories.
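The arithmetic behind the one-to-one finding is simple but worth making explicit. The server counts and per-server cost below are hypothetical; only the 1:1 ratio comes from the study finding above.

```python
# For every production Web/app/DB server, roughly one more server exists
# for support, test, disaster recovery, etc. (the ~1:1 Scorpion finding).
production_servers = 120
everything_else = 120          # assumed 1:1 ratio
cost_per_server = 8_000        # hypothetical annual cost per server, $

naive_cost = production_servers * cost_per_server
full_cost = (production_servers + everything_else) * cost_per_server

missed_fraction = 1 - naive_cost / full_cost
print(f"Omitting the non-production tier misses {missed_fraction:.0%} of cost")
# With a 1:1 ratio, exactly half of the real cost is invisible.
```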
3) Consider Practical Capacity, Not Vendor Ratings
System capacity and performance can quickly become a very tedious and esoteric discussion, and in many cost of ownership efforts, it does.
Vendors feed this controversy to gain competitive advantage. This can be avoided. Our experience is that the most important aspect of performance
analysis within cost of ownership is not which vendor claim or benchmark is used as a base, but rather (a) what system utilizations are “normal” in your
current environment?; and (b) what is a reasonable expectation for the future?
Often, distributed server utilizations are very low and there is a good reason for it. An underutilized server requires no capacity planning. Most cost
analyses are considered part of a technology acquisition process so higher future state utilizations are assumed. If average server utilizations in your
environment are low, model a future state of 2 or 3 times the current for each component in the possible solution. No higher. Use any reasonable
performance metric – expected utilizations are far more important.
This is particularly true with the rise of virtualization, which is almost always
assumed in cost of ownership comparisons. Transitioning from a non-virtualized to virtualized server environment has some significant advantages including
higher potential utilization. It has a cost, however: capacity must be managed. Don’t assume world-class utilization numbers unless you know what kind of
effort it will take to attain them.
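The sizing rule above ("2 or 3 times the current, no higher") can be expressed as a small helper. This is a sketch of the reasoning, not a sizing tool; the capacity units, the 2x default, and the 90% planning ceiling are illustrative assumptions.

```python
def required_capacity(current_capacity, current_util, growth_factor=2.0, cap=3.0):
    """Capacity needed if utilization is driven up by growth_factor.

    Per the guidance above, the growth factor is capped at 2-3x the
    measured average; we also never plan for more than 90% busy.
    """
    factor = min(growth_factor, cap)
    target_util = min(current_util * factor, 0.9)
    used = current_capacity * current_util   # work actually being done today
    return used / target_util

# 100 "units" of capacity running at 15% average utilization:
# doubling the target to 30% busy means 50 units would suffice.
print(required_capacity(100, 0.15))
```

The point is that the measured utilization, not the vendor rating, drives the answer: the same formula works with any reasonable performance metric.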
IBM’s System z mainframe environments typically demonstrate this fact quite well. Mainframes usually run at very high utilizations around the clock.
They can do this because the level of internal standardization and automation is much higher than other platforms. Other platforms will eventually attain
these levels, but that is still years of vendor development away.
4) Don’t Ignore Labor Costs to Protect the Innocent
The most difficult topic within cost of ownership is undoubtedly the cost of labor. In a down economic cycle, most staff see nothing positive in
quantifying the cost of labor for “their” platform. High Full Time Equivalent (FTE) ratios have been an industry target for years and most IT professionals
can quote the current best practice and describe how they are exceeding it. Therein lies a problem. IT infrastructure support organizations have been
managing to these ratios for years using two basic strategies: (a) improve efficiency; or (b) push work onto other parts of the IT organization. The extent
to which strategy (b) is used differs by platform for a variety of reasons, but the result is the same. Any cost of ownership analysis that limits labor
calculations to IT infrastructure support headcount will likely miss major portions of the real support costs and skew the results.
A good solution to this problem is an approach similar to item one. Consider the entire IT organization and apportion every group that is not truly
platform neutral (and even individuals within an otherwise neutral group, like network support) to the appropriate platform labor category. The same kind
of “view” is now developed for assignment of labor with the underlying organization chart as the foundation.
The results will look quite different from industry published norms. They will be higher — up to 2 times or more on x86 platforms — and
will reflect true insight into cost. Because the resulting labor cost numbers cross organizational lines, no one group will feel responsible for them, or for
lowering them. Resistance to the process will be lessened and again, buy-in should be improved.
A side benefit to the process stems from the two strategies often used to manage FTE ratios — productivity improvement and narrowing of
responsibility. If the FTE ratios are changed significantly by the new “view” of the organization, the need for productivity tools will be evident. In the
years that we have been working with customers doing these studies, we’ve seen an alarming trend toward high complexity within distributed systems
— old hardware, old software, multiple releases of everything to be maintained — and a lack of investment in systems management
software. This is in stark contrast to the mainframe where software costs tend to be high while systems are maintained at strict currency levels, with the
result that staffing has been flat or dropping for years with steadily improving Quality of Service (QoS).
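The labor apportionment described in this step can be sketched as follows. The groups, FTE counts, and split fractions are entirely hypothetical; the structure (org chart as the foundation, fractional assignment of non-neutral groups) is what the step prescribes.

```python
# Apportion labor across platforms using the org chart as the foundation.
# Groups that are not truly platform neutral are assigned fractionally.
org_chart = [
    # (group, FTEs, {platform: fraction of the group's time})
    ("x86 server support",   12, {"x86": 1.0}),
    ("Mainframe operations",  6, {"mainframe": 1.0}),
    ("Application support",  30, {"x86": 0.7, "mainframe": 0.3}),
    ("Network support",      10, {"x86": 0.4, "mainframe": 0.1, "neutral": 0.5}),
]

labor_view = {}
for group, ftes, split in org_chart:
    for platform, fraction in split.items():
        labor_view[platform] = labor_view.get(platform, 0.0) + ftes * fraction

print(labor_view)
# x86 labor here (12 + 21 + 4 = 37 FTEs) is roughly 3x the dedicated
# support headcount (12), which is why the resulting ratios look worse
# than published norms: the published numbers count only strategy (a).
```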
5) Quantify QoS in a Way that Makes Sense
QoS is an elusive topic since it has so many aspects that differ in importance between companies, but some general trends can give guidance. In this
age of real-time systems, disaster recovery has become a universal need. Two key metrics in disaster recovery are Recovery Time Objective (RTO – the
time to bring alternative systems online for use) and Recovery Point Objective (RPO – the age of the data on those recovered systems). If we consider
the dominant RTO and RPO for a given platform, we gain insight into both cost and QoS. Though any system can be made disaster recoverable, there is
a huge cost differential between making a single mainframe recoverable and making 1,000 distributed systems recoverable. The majority of customers
we’ve worked with have done the former and not the latter because of the cost. This can be quantified very easily with a call to a recovery services
provider and should be included in the platform cost comparison.
Cloud computing and other metered services will certainly offer a recoverable
option, and here lies an opportunity. Because IT will have to compete with public clouds, the above cost analysis can be used to set internal cost guidelines for a
corporate cloud infrastructure very early in the cloud development process. Or it can be used to steer workload onto the platform that is already
recoverable, thus eliminating some of the need to develop a recovery capability where it currently does not exist.
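The cost differential behind that choice is easy to fold into the platform comparison once quotes are in hand. The per-system quotes and system counts below are hypothetical placeholders for what a recovery services provider would actually quote.

```python
# Fold disaster recovery cost into the platform comparison.
# Quotes are per recovered system (hypothetical figures, $/year).
dr_quote_per_system = {"mainframe": 250_000, "distributed": 2_000}
systems = {"mainframe": 1, "distributed": 1000}

dr_cost = {p: dr_quote_per_system[p] * systems[p] for p in systems}
print(dr_cost)
# Even with a far cheaper per-system quote, recovering 1,000 distributed
# systems costs several times more than recovering the single mainframe.
```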
6) Look at Costs Incrementally — Plot Your Own Course
The last topic to consider is primarily financial. There is a “sunk cost” and an “incremental cost” associated with IT infrastructure that must be
considered. Just as the first chip to roll off a fabrication line is worth billions and the second worth pennies, the first workload for a platform is far more
expensive to provision than subsequent ones. This is especially true for the mainframe since the technology may be physically refreshed, but financially the
transaction is handled as an upgrade. This is unlike the distributed world where technology and book value are tied together.
IBM has taken this concept a step further with the arrival of mainframe “specialty” engines that have much lower price points and drastically reduced
impact on software costs. However, they cannot run alone; they must be added to an existing system. It is not unusual for mainframe systems in
production to cost $4,000/MIPS while specialty engine upgrades may run only $200/MIPS. The incremental costs on the mainframe in this case are 1/20th
of current cost. These kinds of dramatic differences must be considered in cost of ownership and are often large enough to justify a change in course for
IT. Exploiting these areas of low incremental cost to support growth can significantly improve the overall cost of IT. Virtualization is expected to have a
significant effect on other platforms, so the need is universal and growing.
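The incremental-cost argument can be made concrete with the article's own price points. The installed capacity and growth figures below are hypothetical; only the $4,000 and $200 per-MIPS prices come from the example above.

```python
# Growing on low-cost specialty engines drags the blended platform
# cost down (capacity figures hypothetical, prices from the article).
base_mips, base_cost_per_mips = 10_000, 4_000
added_mips, incremental_cost_per_mips = 5_000, 200

total_cost = base_mips * base_cost_per_mips + added_mips * incremental_cost_per_mips
blended = total_cost / (base_mips + added_mips)

print(incremental_cost_per_mips / base_cost_per_mips)  # 0.05, i.e. 1/20th
print(blended)  # blended $/MIPS after growth, well below the $4,000 base
```

Routing growth onto the low-incremental-cost capacity is exactly the "change in course" the section describes: the average cost of the platform falls with every increment added.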
With 33 years at IBM, John Schlosser is currently a Senior Managing Consultant for the Scorpion practice within IBM Systems and
Technology Group — Lab Services & Training. He is a founding member of the group which was started in 1999. He has developed and
modified many of the methodologies the team uses for IT infrastructure cost analysis.