Lessons Learned From a Recent Amazon Outage

Another Amazon cloud-services outage occurred on Sunday, August 7th in a Dublin, Ireland data center. This one occurred due to a lightning strike that hit a transformer near the Dublin data center. It led to an explosion and fire that knocked out all utility services thereby leading to a total data center outage. Amazon had its only European data center located there.

By Gregory Machler
Mon, August 15, 2011

CSO — Another Amazon cloud-services outage occurred on Sunday, August 7th in a Dublin, Ireland data center. This one occurred due to a lightning strike that hit a transformer near the Dublin data center. It led to an explosion and fire that knocked out all utility services thereby leading to a total data center outage. Amazon had its only European data center located there.

Mitigating the Risk of Cloud Services Failure: How to Avoid Getting Amazon-ed

My initial thoughts are related to disaster recovery and Amazon services. In their last significant outage in April, they had a network configuration change that led to an outage of services in the eastern United States. This outage begs other questions. Why isn't Amazon deploying a redundant power source, like a diesel powered backup? Maybe they did, but the fire blew out a portion of that utility service. So a more serious disaster emerged from an initial transformer explosion.

[Related: Creating a cloud SLA from diagnostic data]

How could this be addressed? How about fail-over to services in another geographic location in Europe. This didn't happen. I can only guess that building out another data center is cost prohibitive at this time and that is why Amazon doesn't have another European data center. The rest of the article mentions that it will take Amazon up to two more days to bring up the remaining servers.

It mentions that a significant period of time is being taken to start all of the servers up again. It also states that Microsoft, who has services in the same data center, does not have the same weakness. I wonder why this is; data replication should be a high priority, especially when Amazon lacks full-scale data center disaster recovery.

On Monday, August 8th, Amazon mentioned that a software error is slowing the recovery of the data within the European data center. This points to another error, a lack of business continuity testing. This testing is necessary, because conditions like this occur rarely. It also points to the fact that complex configurations make it hard to test various scenarios. Only deploying and testing a minimum number of application configurations is realistic. Otherwise there are too many permutations to test. See a previous disaster recovery article that mentions products should have standard configurations, similar to a car engine configuration and the car model.

So, it looks as if Amazon has more cloud services weaknesses that are bubbling up due to operational stresses. How can mid-sized and small businesses that outsource their web applications to Amazon's cloud protect themselves? It's clear that Amazon supports cloud applications where profitable. I suggest that those firms create a very detailed, per application SLA (Service Level Agreement) that lists global up-time, performance, and penalties when service isn't meeting objectives.

Continue Reading

Originally published on www.csoonline.com. Click here to read the original story.
What is Tech Briefcase?
TechBriefcase is a new, free service where IT Professionals can Search, Store and Share IT white papers and content like this. Learn more
Bookmark content
Speed up your research efforts with content across the web.
Search and Store
Find the white papers you need. Create folders for any topic.
View Anywhere
Open your briefcase on your iPhone, tablet or desktop. Share with colleagues.
Don't have an account yet?
Server virtualization has transformed corporate IT -- companies have enjoyed major cost savings and have gained flexibility and efficiency. But this has also led to a proliferation of virtual machines and servers that threaten to overwhelm data movement and storage technologies. In this IDG Tech Dossier, learn how utility storage makes for massive consolidation, flexibility and scalability, so IT departments can reduce storage infrastructure and lower costs while improving their ability to respond to fast-changing needs of business units.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
Learn how HP CloudSystem Matrix and HP 3PAR Utility Storage provide a solid, flexible foundation for your cloud environment.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
HP is driving the evolution of what we call the Instant-On Enterprise. It is an enterprise that embeds technology into everything it does to better serve citizens, partners, employees, and clients. We believe that today's Instant-On Enterprises need to think differently about how they source and deliver services that are enabled by technology. They need to take advantage of a hybrid delivery model-one that truly optimizes the mix between traditional IT, private cloud, and public cloud.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
This white paper describes the major requirements for network management solutions to help the organizations become more profitable, efficient and reliable.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
Enterprises are turning to the Cloud to improve business agility, reduce expenses and accelerate business innovation. Cloud computing redefines the way IT assets are deployed and consumed and dramatically affects the way data center networks are architected and managed. Conventional hierarchical data center networks built to support traditional IT architectures can't meet the security, agility and price/performance requirements of virtualized cloud computing environments. This white paper reviews the impact of cloud computing on data center networks and describes HP's approach to building simpler, more secure and automated networks that fully meet the stringent performance, security, reliability and agility demands of the new data center in the Cloud.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
When AlertBoot switched to the cloud it needed a load balancing solution that would support its migration and prevent as much downtime as possible. The company chose Riverbed® Stingray™ Traffic Manager to use while transitioning its infrastructure to an entirely virtualized environment. The move was a complete success, at one-third the cost of comparable hardware solutions.
Second in a three-part series discussing the "4 Must Haves" in virtualization security designed to help large organizations understand the challenges of securing virtualized environments while positioning themselves to take advantage of future IT and business opportunities.

Gain insights into next generation, virtualization-optimized solutions to help you drive:

+ Faster time-to-value from your security initiatives
+ Provide corporate with visibility and enable a state of continuous compliance
+ Reduce risk via automated configuration and policy-based access and enforcement engine
Learn how to get the most from your cloud investment in our on-demand webinar from BMC and InformationWeek. You'll hear how integrating the cloud into your production workload brings critical business benefits.
Supply chains require the ability to connect and share information with vendors and partners globally. EDI networks have made this connection possible by allowing various entities to upload information for others to see.
View this on demand webcast to learn if moving business communications to the cloud is right for your business. Featured industry experts DMG Consulting LLC president, Donna Fluss, Frost & Sullivan principal analyst, Michael DeSalles, and Interactive Intelligence senior vice president, Joe Staples discuss this topic and help you answer your pressing questions at the conclusion of this web event.
Capacity management may not be dead yet, but with the adoption of private clouds it's barely recognizable. Join Andrew Hillier as he outlines best practices for gaining control over dynamic capacity supply and workload demand in large scale virtual and cloud infrastructure. Hear how leading Fortune 500 organizations increased agility, reduced risk and costs by optimizing infrastructure planning and management processes.
In this webcast, Vantage Point Performance's Michelle Vazzana will reveal how to coach your reps to better performing pipelines.
Newsletter Sign-Up »

Receive the latest news test, reviews and trends on your favorite technology topics

Choose a newsletter
  1. View all Newsletters | Privacy Policy
Sponsored Links

Master the cloud with the power of convergence from HP

Connect with IT leaders redefining mobility at the Enterprise Mobile Hub

Choose New and manage one device instead of 170

Choose New for 8x the firewall and NAT performance

Check out a smart way of mobilizing your business with enterprise-ready Samsung Mobile.

Redefine your data center with HP servers.

Enhance your business with Windstream IT Solutions. Speak to someone local.

BlackBerry® Mobile Fusion. Different mobile devices. One platform.

Click to see how Accenture has delivered high performance to clients

CYBERMARYLAND | Learn Why Maryland is the Epicenter for Cybersecurity

Get Ethernet speeds from 1 Mbps to 10 Gbps - Comcast Business Class

Cognizant. Leading in Business, Application & Technology Services

Collaboration: driving better business outcomes

Gain cutting-edge insights at MIT in 2-5 day executive programs.

Complimentary Gartner Report on BYOD: Media Tablets & Beyond. View Now

Elevate storage agility and efficiency with HP 3PAR storage.

Choose New and slash the number of devices you manage

Customized information views & Twitter events at New Fulcrum Point

Splunk translates machine data into "aha" moments for IT and the business.

ManageEngine Desktop Central - Automate and Audit Your Desktop Management! Learn More...

Cloud Readiness Starts with Intel® Technology

High performance. Delivered. Click to see Accenture's client successes

Visit the Virtually There Learning Page to learn how to use virtualization to your competitive advantage.

Free: Hunter Muller's "The Transformational CIO."

Join us for an upcoming Microsoft 365 live online demo event.

Discover your easiest path to unified communications

Virtualizing Your Infrastructure Just Got Easier

Connect with global CIOs now at Enterprise CIO Forum

Resource Center