Who Gets Blame for Amazon Outage?
Amazon promises to fully explain recent cloud outage; IT managers will likely be asked to do the same by execs at companies using the services.
Tue, April 26, 2011
Despite the redundancies and backups built into the Amazon cloud, "you hit a combination of events for which the backups don't work," he said.
Users see the promise of cloud technology as a way to reduce costs and be greener, but "that [also] means concentrating processing in fewer, bigger places," said Brill. Thus, when something goes wrong, "it has a bigger impact."
Meanwhile, the promise of reliable cloud uptime is putting protection advocates -- the IT people who champion more internal reliability and safeguards -- at a disadvantage, he added. "There will always be an advocate for how it can be done cheaper," but "if you haven't had a failure for five years -- who is the advocate for reliability?"
"My prediction is that in the years ahead we will see more failures than we have been seeing because people have forgotten what we had to do to get to where we are," Brill added.
AppNeta runs its company on Amazon's cloud technology and was thus affected by the outage. However, its problems were short-lived because its service is architected to respond to a data center failure in Amazon's cloud.
Matt Stevens, the chief technology officer of AppNeta, said its system was able to fall back to an alternative availability zone in another data center in Amazon's cloud.
"You still need to plan for worst-case scenarios," said Stevens, who said Amazon advises its customers to plan for a potential data center interruption. "It was actually their guidance that helped us avoid this from being more painful."
Amazon has built the system with multiple levels of disaster recovery, including a design for high availability across virtual infrastructure within a zone, such as the ability to fail over between servers, as well as planning to fail over to another data center, as AppNeta did.
AppNeta has redundant mirroring of its data in Amazon's S3 storage service, which allowed it to pull that data into a second data center. Its problem was limited to a couple of hours Thursday morning, said Stevens.
Stevens believes that the Amazon outage will cause people to step back and ask some questions about their internal architecture, as well as ask whether to adopt a multi-cloud strategy to spread the risk. "That's certainly got to be top of mind for a lot of CIOs today," he said.
Patrick Thibodeau covers SaaS and enterprise applications, outsourcing, government IT policies, data centers and IT workforce issues for Computerworld. Follow Patrick on Twitter at @DCgov, or subscribe to Patrick's RSS feed. His e-mail address is pthibodeau@computerworld.com.


