Microsoft Sidekick Debacle and the Cloud: Lessons Learned

When Microsoft's storage service for Sidekick users broke down, cloud computing questions sprang up -- both fair and unfair. CIO.com's Bernard Golden discusses what can be learned from this outage and looks at the likely outcomes for Microsoft, users and the cloud ecosystem.

By Bernard Golden
Thu, October 15, 2009

CIO — This week's cloud tempest is the very visible breakdown of Microsoft's Danger storage service for the T-Mobile Sidekick phone. An apologetic email (as reported by TechCrunch) first went out from Microsoft to users noting that all data had been lost with no way to recover it. It now seems that some or most of the data will be recovered, which is, of course, good news. As far as I know, Microsoft has not provided any formal explanation of what went wrong, but most of the speculation I've seen identifies a failed SAN upgrade, performed with no data backup available, as the cause of the data loss.

People on all sides of the cloud debate have been arguing over this incident and treating it as a proxy for the entire concept of cloud computing.

While one shouldn't conflate this situation with the totality of cloud computing, it highlights some very important issues that are worth exploring and understanding.

Lessons to be Drawn

It's a cloud: Some of the writing I've seen on this incident downplays it because, in the view of the authors, this service isn't really a cloud offering. They say it's a limited application, or an adjunct service to a hardware device, or really a consumer service and therefore not a "real" cloud application because those are aimed at business users. That's baloney.

First of all, it is a cloud application. It certainly fits the common SaaS definitions. The "it's really a consumer service" rationale won't wash, either. With the blurring of consumer and commercial use, what's personal to one person might be mission-critical to another. And trying to deflect concern about this incident by defining it away misses the point. Cloud computing is a big tent (if I may mix a metaphor), and one of its strengths is that many different approaches can be considered cloud computing. In any case, clever dissembling is beside the point. If it walks like a duck and quacks like a duck, trying to convince someone that it's not a duck because it's actually a similar-looking, slightly different species is unlikely to succeed.

This attention bespeaks intense interest in the cloud: Let's face it, all the hullabaloo about this incident is good news, because it means people recognize cloud computing is an important development. You don't spend a lot of time worrying about something you don't care about. It's obvious that the concept of cloud computing has garnered attention, which I attribute to the fact that everyone recognizes that the old methods of running IT infrastructure are expensive and don't scale.

This incident represents a breach of best practices: Losing data is the greatest shortcoming an operations group can suffer. A service outage is bad, but losing data is inexcusable. In fact, calling this a breach of best practices is overstating it. The term "best practice" describes a set of processes performed by the leaders in a field, not the mainstream. Backing up data is data management 101; really, it's 01. If this incident is truly a result of failing to do a backup, it contravenes the basic, simplest practice of managing data. No matter what the cause, losing data is inexcusable.

It calls into question one of the tenets of cloud computing: The expertise of cloud providers. My company does not run its own email service; we use Google to manage our mail system. Is this because we don't know how to run a mail server? Of course not. We do it for a very simple reason: using Google allows us to focus on our core mission, serving our clients.

We are very aware of what would happen if we ran our own mail server. Every time there was a problem, we'd treat it like an inconvenient interruption, and do just the minimum to patch the problem and get back to our real work. We would never devote the full amount of time running a mail server deserves. Therefore, our mail service would always be fragile, subject to interruption, and (most likely) vulnerable to security penetration. So we turn to a company that can devote real resources to running our mail server, one that follows best practices, and one that can take the necessary time to do it right.

An article on CRN blamed the outage on the fact that Microsoft is working on another project and pulled engineers from Danger onto the other project. Frankly, this is, or should be, irrelevant from a user perspective. A cloud provider is running a service and has to be committed to operational excellence, despite any other distractions or competing priorities. Otherwise, it forces the customer to examine the internals of the cloud service. This, from the perspective of the customer, is impractical, since everyone has limited time to devote to these things—a problem which will only get worse, given the fact that we are moving to a world in which use of cloud services is rapidly multiplying.

Moreover, most cloud providers don't want a horde of customers insisting on auditing the service, because the support required for customer audits does not scale. Finally, a customer shouldn't have to examine the inner workings of the cloud service. One doesn't question how the local electric utility schedules its generator maintenance; why should it be necessary for a cloud service? Customers should not have to do detailed evaluations of a cloud service: it's the job of the service provider to ensure appropriate operational processes are in place.

Whatever the reason for the data loss, it calls into question the tenet that cloud computing enables a better level of discipline and expertise to be devoted to a service offering. If a customer can't depend on a cloud provider to perform at a higher level than the customer could do on its own, why should it turn to the cloud?

Likely Outcomes of this Incident

Microsoft evaluates its practices throughout its cloud offerings: I guarantee that one outcome of this incident is that an edict came down from on high: "Make sure no other system is vulnerable to this problem!" There are undoubtedly a bunch of operations groups at Microsoft digging through backup practices to ensure redundant data is stored and that reliable backups are being performed. Also undoubted is the response of these groups: "how come we're being stuck with a ton of extra work because they screwed up?" Fellas, that's just the way organizations work.

Other cloud providers use this as a "teaching moment": While these cloud companies are wiping their hands across their foreheads in relief, thinking "there but for God's grace go I," senior management is regarding this incident as an inexpensive way to learn an important lesson, and is taking it as an opportunity to do a low-risk drill. Of course, if other Microsoft operations groups resent having to do work because of this incident, imagine how ops groups in other companies feel!

Microsoft's credibility suffers a short-term hit: Some people will generalize this situation to all of Microsoft's offerings, and be more cautious about using them. Let me be clear: I don't believe this situation represents Microsoft's typical operations practices. Hotmail is a far larger service, and I don't recall hearing anything like this happening with it. Nevertheless, Microsoft's overall cloud reputation will be tarnished for a while.

The best thing for Microsoft would be to treat this as a crisis management event and follow the established playbook: early apologies, full transparency, frequent updates. That still won't prevent people from re-evaluating their opinions, at least in the short term, but it will help those initial re-evaluations return to their long-term assessments more quickly.

Cloud computing in general suffers a short-term hit: Any time one market participant suffers a significant blow, the concern spreads to others. All cloud providers are going to be questioned about their competence regarding storage practices. It's inevitable and unavoidable. Rather than resisting it, they should take it as an opportunity to proclaim how seriously they take this topic and describe at length the extensive, redundant, and highly structured processes they have in place to avoid issues like this one. This information won't stop people from querying the provider, but it shows responsiveness and provides the opportunity to pick up share.

Long-term, this is a minor bump in the road: Of course this is a significant incident, and of course a very difficult situation for those affected by it, but in the long run, it will be looked back on as a minor one. Cloud computing is gaining momentum, driven by an appreciation of its strengths and cost efficiencies, and a problem, even one as serious as this, will not hinder its progress for long.

Bernard Golden is CEO of consulting firm HyperStratus, which specializes in virtualization, cloud computing and related issues. He is also the author of "Virtualization for Dummies," the best-selling book on virtualization to date.

Follow Bernard Golden on Twitter @bernardgolden. Follow everything from CIO.com on Twitter @CIOonline
