SLA Definitions and Solutions

SLAs are a critical component of any vendor contract. Beyond listing expectations of service type and quality, an SLA provides remedies when requirements aren't met.

Editor's Note: This article was updated June 18, 2009.


What is an SLA?

A service-level agreement (SLA) is simply a document describing the level of service expected by a customer from a supplier, laying out the metrics by which that service is measured, and the remedies or penalties, if any, should the agreed-upon levels not be achieved. Usually, SLAs are between companies and external suppliers, but they may also be between two departments within a company.

A telecom company's SLA, for example, may promise network availability of 99.999 percent (for the mathematically disinclined, that works out to about five and a quarter minutes of downtime per year, which, believe it or not, can still be too long for some businesses), and allow the customer to reduce their payment by a given percentage if that is not achieved, usually on a sliding scale based on the magnitude of the breach.
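The arithmetic behind that figure, and the idea of a sliding-scale remedy, can be sketched in a few lines of Python. This is only an illustration: the credit tiers, the 50 percent cap and the function names are hypothetical and are not taken from any real provider's SLA.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, leap years included

def allowed_downtime_minutes(availability_pct, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    return period_minutes * (1 - availability_pct / 100)

# Hypothetical sliding scale: (minimum measured availability %, credit as % of fee).
# Tiers are checked from best to worst, so the first match wins.
CREDIT_TIERS = [(99.999, 0), (99.99, 5), (99.9, 10), (99.0, 25)]
MAX_CREDIT_PCT = 50  # hypothetical credit when availability falls below every tier

def service_credit_pct(measured_availability_pct):
    """Return the payment reduction owed for a measured availability."""
    for floor_pct, credit_pct in CREDIT_TIERS:
        if measured_availability_pct >= floor_pct:
            return credit_pct
    return MAX_CREDIT_PCT

print(round(allowed_downtime_minutes(99.999), 2))  # about 5.26 minutes per year
print(service_credit_pct(99.95))                   # falls in the 10 percent tier
```

A real agreement would spell out the tiers, the measurement period and any cap on credits in the contract language itself.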


Why Do I Need SLAs?

An SLA pulls together information on all of the contracted services and their agreed-upon expected reliability into a single document. It clearly states metrics, responsibilities and expectations so that, if issues arise with the service, neither party can plead ignorance. It ensures both sides have the same understanding of requirements.

Any significant contract without an associated SLA (reviewed by legal counsel) is open to deliberate or inadvertent misinterpretation. The SLA protects both parties in the agreement.


Who Provides the SLA?

Most service providers have standard SLAs — sometimes several, reflecting various levels of service at different prices — that can be a good starting point for negotiation. These should be reviewed and modified by the customer's legal counsel, since they are usually slanted in favor of the supplier.

When sending out an RFP, the customer should include expected service levels as part of the request; this will affect supplier offerings and pricing and may even influence the supplier's decision to respond. For example, if you demand 99.999 percent availability for a system, and the supplier is unable to accommodate this requirement with your specified design, it may propose a different, more robust solution.


What's in an SLA?

The SLA should not only include a description of the services to be provided and their expected service levels, but also metrics by which the services are measured, the duties and responsibilities of each party, and the remedies and/or penalties for breach.

Metrics should be designed so bad behavior by either party is not rewarded. For example, if a service level is breached because the client did not provide information in a timely manner, the supplier should not be penalized.


What Are Key Components of an SLA?

The SLA should include components in two areas: services and management.

Service elements include specifics of services provided (and what's excluded, if there's room for doubt), conditions of service availability, standards such as time window for each level of service (prime time and non-prime time may have different service levels, for example), responsibilities of each party, escalation procedures, and cost/service tradeoffs.

Management elements should include definitions of measurement standards and methods, reporting process, contents and frequency, a dispute resolution process, an indemnification clause protecting the customer from third-party litigation resulting from service level breaches (this should already be covered in the contract, however), and a mechanism for updating the agreement as required.

This last item is critical; service requirements and vendor capabilities change, so there must be a way to make sure the SLA is kept up to date.


What About Indemnification?

The SLA should include a provision in which the service provider agrees to indemnify the customer company for any breaches of its warranties. Indemnification means that the provider will have to pay the customer for any third-party litigation costs resulting from its breach of the warranties. If you use a standard SLA provided by the service provider, it is likely this provision will be absent; ask your in-house counsel to draft a simple provision to include it, although the service provider may want further negotiation of this point.


Is an SLA Transferable?

Should the service provider be acquired by or merge with another company, the customer may expect that its SLA will continue to be in force, but this may not be the case. The agreement may have to be renegotiated. Make no assumptions; however, bear in mind that the new owner will not want to alienate existing customers and so may decide to honor existing SLAs.


How Can I Verify Service Levels?

Most service providers make statistics available, often on a Web portal. There, customers can check whether SLAs are being met, and whether they're entitled to service credits or other remedies as laid out in the SLA.

However, for mission-critical services where the business itself is at risk if service levels are not met, it may be worth considering using a third-party monitoring organization or an SLA management tool to supplement the vendor's data. The extra expense of these additional methods can be worthwhile for critical services.


What Kind of Metrics Should Be Monitored?

Many items can be monitored as part of an SLA, but the scheme should be kept as simple as possible to avoid confusion and excessive cost on either side. In choosing metrics, examine your operation and decide what is most important. The more complex the monitoring (and associated remedy) scheme, the less likely it is to be effective, since no one will have time to properly analyze the data. When in doubt, opt for ease of collection of metric data; automated systems are best, since it is unlikely that costly manual collection of metrics will be reliable.

Depending on the service, the types of metric to monitor may include:

Service availability: The amount of time the service is available for use. This may be measured by time slot, with, for example, 99.5 percent availability required between the hours of 8 a.m. and 6 p.m., and more or less availability specified during other times. E-commerce operations typically have extremely aggressive SLAs at all times; 99.999 percent uptime is a not uncommon requirement for a site that generates millions of dollars an hour.

Defect rates: Counts or percentages of errors in major deliverables. Production failures such as incomplete backups and restores, coding errors/rework, and missed deadlines may be included in this category.

Technical quality: In outsourced application development, technical quality is measured with commercial analysis tools that examine factors such as program size and coding defects.

Security: In these hyper-regulated times, application and network security breaches can be costly. Measuring controllable security measures such as anti-virus updates and patching is key in proving all reasonable preventive measures were taken, in the event of an incident.
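As a rough illustration of measuring availability by time slot, the sketch below computes the availability achieved inside a single service window from a list of outage records. The function names and the sample outages are hypothetical, and it assumes the outages do not overlap one another.

```python
from datetime import datetime

def overlap_minutes(start, end, window_start, window_end):
    """Minutes of the interval [start, end] that fall inside the window."""
    latest_start = max(start, window_start)
    earliest_end = min(end, window_end)
    return max((earliest_end - latest_start).total_seconds() / 60, 0)

def window_availability_pct(outages, window_start, window_end):
    """Availability percentage within one service window.

    `outages` is a list of (start, end) datetime pairs, assumed non-overlapping.
    """
    window_minutes = (window_end - window_start).total_seconds() / 60
    down = sum(overlap_minutes(s, e, window_start, window_end) for s, e in outages)
    return 100 * (1 - down / window_minutes)

# Hypothetical day: prime time runs from 8 a.m. to 6 p.m. (600 minutes).
prime_start = datetime(2009, 6, 18, 8, 0)
prime_end = datetime(2009, 6, 18, 18, 0)
outages = [
    (datetime(2009, 6, 18, 7, 50), datetime(2009, 6, 18, 8, 2)),  # 2 minutes inside the window
    (datetime(2009, 6, 18, 19, 0), datetime(2009, 6, 18, 20, 0)), # entirely outside the window
]

achieved = window_availability_pct(outages, prime_start, prime_end)
print(f"Prime-time availability: {achieved:.2f}%")
print("Meets 99.5% target:", achieved >= 99.5)
```

Only the downtime that falls inside the prime-time window counts against the prime-time target; the evening outage would be assessed against whatever level the SLA sets for other hours.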


What Should I Consider When Selecting Metrics for My SLA?

Choose measurements that motivate the right behavior. The first goal of any metric is to motivate the appropriate behavior on behalf of the client and the service provider. Each side of the relationship will attempt to optimize its actions to meet the performance objectives defined by the metrics. First, focus on the behavior that you want to motivate. Then, test your metrics by putting yourself in the place of the other side. How would you optimize your performance? Does that optimization support the originally desired results?

Ensure that metrics reflect factors within the service provider's control. To motivate the right behavior, SLA metrics have to reflect factors within the service provider's control. A typical mistake is to penalize the service provider for delays caused by the client's lack of performance. For example, if the client provides change specifications for application code several weeks late, it is unfair and demotivating to hold the service provider to a prespecified delivery date. Making the SLA two-sided by measuring the client's performance on mutually dependent actions is a good way to focus on the intended results.

Choose measurements that are easily collected. Balance the power of a desired metric against its ease of collection. Ideally, the SLA metrics will be captured automatically, in the background, with minimal overhead, but this objective may not be possible for all desired metrics. When in doubt, compromise in favor of easy collection; no one is going to invest the effort to collect metrics manually.

Less is more. Despite the temptation to control as many factors as possible, avoid choosing an excessive number of metrics or metrics that produce a voluminous amount of data that no one will have time to analyze.

Set a proper baseline. Defining the right metrics is only half of the battle. To be useful, the metrics must be set to reasonable, attainable performance levels. Unless strong historical measurement data is available, be prepared to revisit and readjust the settings at a future date through a predefined process specified in the SLA.
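One way to picture a two-sided delivery metric is to extend the provider's deadline by however late the client's input arrived. The sketch below assumes a simple day-for-day extension; that rule, the function names and the sample dates are all hypothetical, and an actual SLA may define the adjustment differently.

```python
from datetime import date, timedelta

def adjusted_due_date(agreed_due, client_input_due, client_input_received):
    """Push the delivery date back by the number of days the client was late."""
    client_delay = max(client_input_received - client_input_due, timedelta(0))
    return agreed_due + client_delay

def provider_missed_deadline(delivered, agreed_due, client_input_due, client_input_received):
    """True only if the provider was late after allowing for client delay."""
    return delivered > adjusted_due_date(agreed_due, client_input_due, client_input_received)

# Hypothetical case: the client's change specifications arrive three weeks late.
agreed_due = date(2009, 6, 1)
specs_due = date(2009, 4, 1)
specs_received = date(2009, 4, 22)
delivered = date(2009, 6, 15)

print(adjusted_due_date(agreed_due, specs_due, specs_received))
print(provider_missed_deadline(delivered, agreed_due, specs_due, specs_received))
```

Here the delivery lands two weeks after the original date but within the adjusted one, so the provider is not counted as having breached the service level.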


What Uptime Provisions are Typical for Network Service Providers?

Hosted network services offer various levels of uptime guarantees, at escalating prices. The customer should expect to pay less for 99 percent availability (which allows for over 7 hours of unplanned downtime per month) than for 99.9 percent (43.8 minutes per month) or 99.99 percent (4.4 minutes per month). For mission-critical applications, providers will offer near 100 percent availability, but it will be more expensive.

The operative word here is unplanned; service providers will have predetermined windows for network maintenance, although network redundancy should prevent customer outages.
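The monthly downtime figures quoted above can be reproduced with a short calculation. This sketch assumes an average month of 730 hours (365 days times 24 hours, divided by 12); the function name is illustrative only.

```python
HOURS_PER_MONTH = 365 * 24 / 12  # 730 hours in an average month

def monthly_downtime_minutes(availability_pct):
    """Unplanned downtime per month permitted by an availability guarantee."""
    return HOURS_PER_MONTH * 60 * (1 - availability_pct / 100)

for pct in (99, 99.9, 99.99):
    minutes = monthly_downtime_minutes(pct)
    print(f"{pct}% -> {minutes:.1f} minutes ({minutes / 60:.2f} hours) per month")
```

Each additional "nine" cuts the permitted downtime by a factor of ten, which is one reason the price climbs so steeply between tiers.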


When Should We Review our SLAs?

As businesses change, so do their service requirements. An SLA should not be viewed as a static document; it should be reviewed periodically, particularly if:

• The client's business needs have changed (for example, establishing an e-commerce site increases availability requirements).

• The technical environment has changed (for example, more reliable equipment makes a higher availability guarantee possible).

• Workloads have changed.

• Metrics, measurement tools and processes have improved.

The SLA is a critical part of any supplier agreement, and it will pay off in the long term if the SLA is properly thought out and codified at the beginning of a relationship. It protects both parties and, should a dispute arise, will specify remedies and avoid misunderstandings. That can save considerable time and money for both customer and supplier.

