Understanding Service Level Agreements for Database Development
This excerpt from the upcoming book, SQL Server 2008 Administration in Action, introduces the role of service-level agreements (SLAs) as part of a DBA's overall strategy for maintaining reliable Microsoft SQL Server networks.
Wed, November 05, 2008
CIO — As a DBA, your job is to provide database systems that are reliable, secure and available when required. In order to achieve these goals, you must understand your customer's service requirements, usually conveyed through Service Level Agreements.
Far too many DBAs have no idea what their Service Level Agreements are, or whether it's possible to meet them. Good DBAs know what their agreements are, obtain budget to design and implement systems to meet them, and ensure they're met through Disaster Recovery planning and testing.
Let's take a look at what's typically included in Service Level Agreements, understand the ramifications of an availability target, and see how we can ensure the SLAs are achievable.
Typical Service Level Agreements
Whilst every organization will have different Service Level Agreements, most fall into similar groups or categories. Common ones for SQL Server systems are as follows:
System Availability. e.g. The database servers must be available 7 days a week, from 6am to midnight.
Acceptable Data Loss. e.g. No more than 15 minutes of data entry can be lost.
Recovery Time. e.g. In the event of a disaster, the systems should be back up and running within one hour.
Performance. e.g. Transaction response time should not exceed 2 seconds.
Specific statements such as those presented above are required in order to design systems and processes to meet expectations. What's important at this point is understanding the need to make design and infrastructure decisions in the context of meeting Service Level Agreements.
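To make the idea of specific, testable statements concrete, here is a minimal sketch that records the example targets listed above and checks a set of measurements against them. The class name, field names, and the `violations` helper are all assumptions made for illustration, and the measured values in the usage lines are hypothetical, not results from a real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevelAgreement:
    """Illustrative container for the example SLA targets in this article."""
    availability_days_per_week: int    # e.g. 7 days a week
    availability_hours_per_day: int    # e.g. 6am to midnight = 18 hours
    max_data_loss_minutes: float       # acceptable data loss
    max_recovery_minutes: float        # recovery time after a disaster
    max_response_seconds: float        # transaction response time


def violations(sla, data_loss_minutes, recovery_minutes, response_seconds):
    """Return the names of the SLA items that the measured values miss."""
    missed = []
    if data_loss_minutes > sla.max_data_loss_minutes:
        missed.append("Acceptable Data Loss")
    if recovery_minutes > sla.max_recovery_minutes:
        missed.append("Recovery Time")
    if response_seconds > sla.max_response_seconds:
        missed.append("Performance")
    return missed


# Targets taken from the example list in this article.
example_sla = ServiceLevelAgreement(
    availability_days_per_week=7,
    availability_hours_per_day=18,
    max_data_loss_minutes=15,
    max_recovery_minutes=60,
    max_response_seconds=2,
)

# Hypothetical measurements from a recovery drill (not real data).
print(violations(example_sla, data_loss_minutes=10,
                 recovery_minutes=75, response_seconds=1.5))
```

Writing the targets down as numbers, rather than as general intentions, is what makes it possible to say whether a given design or drill result meets them.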
How Many Nines?
System Availability, a common SLA item, is often measured by the number of nines in the availability percentage target, with "Five Nines" representing 99.999% system availability. This translates to a maintenance window, or allowed downtime, of about 5 minutes per year, or roughly 0.8 seconds per day! In contrast, a 99% target allows about 1.7 hours of downtime per week. Removal of each "nine" from the uptime target significantly reduces the cost of building an environment that meets the target, as Table 1 helps demonstrate.
Table 1: How Many 9s Do You Need?
This table shows the sharp decrease in allowed downtime for each additional 9 in the availability target. 99% availability allows for about 3.65 days of downtime per year. 99.999% allows about 5 minutes!
|Availability Target||Downtime Per Year (Approx.)|
|90 percent||36.5 days|
|99 percent||3.65 days|
|99.9 percent||8.8 hours|
|99.99 percent||53 minutes|
|99.999 percent||5 minutes|
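The downtime figures for each availability target follow from simple arithmetic, which the short sketch below reproduces. It assumes a 365-day year and a system that is expected to be up around the clock; the function name is an illustrative choice, and the printed values are unrounded, so they may differ slightly from approximate figures quoted in a table.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Allowed downtime per year, in hours, for an availability target."""
    return (100 - availability_percent) / 100 * HOURS_PER_YEAR


for target in (90, 99, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours / 24:.2f} days, "
          f"{hours:.2f} hours, {hours * 60:.1f} minutes per year")
```

If the SLA only covers part of the day, such as 6am to midnight, the same calculation applies with the covered hours substituted for `HOURS_PER_YEAR`.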
Perhaps the real question for a business is not "How much downtime is acceptable?" or "How fast should the system be?" but "How much are you prepared to spend?" In posing this question, you should prepare an options paper that lists various configurations with their corresponding costs and benefits as they relate to the Service Level Agreements.
A common entry in options papers relates to backup and recovery technology, an example of which is highlighted in Table 2. In this example, two backup options are presented along with their corresponding costs. The first option is a sophisticated Storage Area Network snapshot backup technology that allows near instant backup and recovery. The next option is the standard SQL Server backup and recovery. In both cases, the advantages and costs are presented.
Table 2: What Will It Cost?
Options papers that highlight the cost of various options, such as this example for backup and recovery, help in achieving realistic expectations when developing service level agreements.
|Item/Cost||SAN Snapshot Backup||Native SQL Backup|
|Benefits||Near-instant recovery||Reduced complexity|
|Recovery Time||about 5 minutes||about 45 minutes|
Options papers containing examples such as this present clear choices and sharpen the debate on what's really important. After all, what's the point in spending lots of money building and maintaining a "Five Nines" system when an hour of downtime per week is perfectly acceptable?
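As a rough illustration of that trade-off, the sketch below converts an acceptable amount of weekly downtime into the availability percentage it implies, and then checks which of the two backup options from Table 2 would meet a given recovery time target. It assumes a 168-hour week of expected uptime, and the function and variable names are invented for this example; only the recovery times come from Table 2.

```python
HOURS_PER_WEEK = 7 * 24  # assumes the system is expected up around the clock


def availability_percent(downtime_hours_per_week):
    """Availability target implied by an allowed weekly downtime."""
    return (1 - downtime_hours_per_week / HOURS_PER_WEEK) * 100


# Approximate recovery times from Table 2, in minutes.
recovery_minutes = {
    "SAN Snapshot Backup": 5,
    "Native SQL Backup": 45,
}


def options_meeting(target_minutes, options):
    """Names of the options whose recovery time is within the target."""
    return [name for name, minutes in options.items()
            if minutes <= target_minutes]


# An hour of downtime per week is far short of "Five Nines".
print(f"{availability_percent(1):.2f}% availability")

# With a one-hour recovery target, both options qualify.
print(options_meeting(60, recovery_minutes))
```

When more than one option meets the target, the decision comes down to cost and complexity, which is exactly the discussion an options paper is meant to prompt.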
Fortunately, SQL Server 2008 introduces a number of new and enhanced features that assist you in meeting your Service Level Agreements.
Ensuring Service Level Agreements Can Be Met
It's often the case that despite your best efforts, the Service Level Agreements cannot be met, and you won't discover this until disaster strikes. In order to feel comfortable with the agreements in place, it's crucial that you anticipate and plan for disaster.
DBAs often make the mistake of defining disaster too narrowly. Small events can have just as big an impact as the larger, less likely ones. An appropriate Disaster Recovery plan is one that anticipates a variety of disasters and uses simulation to test that the recovery plans are valid. The next section, Planning for Disaster, addresses these topics.