Friday, April 29, 2011
Amazon Apologizes for EC2 Cloud Outage
Now that the dust has settled on the Amazon EC2 cloud outage, the company is offering an apology and a credit for its role in making many popular websites, such as Foursquare, Reddit, and Quora, unavailable.
Amazon Elastic Block Store (EBS) customers in the regions affected at the time of the disruption, regardless of whether their resources and applications were impacted, are getting an automatic 10-day credit equal to 100 percent of their usage.
"We know how critical our services are to our customers' businesses and we will do everything we can to learn from this event and use it to drive improvement across our services," Amazon said in a statement. "As with any significant operational issue, we will spend many hours over the coming days and weeks improving our understanding of the details of the various parts of this event and determining how to make changes to improve our services and processes."
Yes, Outages Happen
Almost anyone familiar with data center operations could sympathize with Amazon, said Charles King, principal analyst at Pund-IT. Outages, even severe ones, happen, and companies are usually judged by how quickly they respond to the problem and whether they can prevent similar events from occurring again.
However, he added, while Amazon's cloud services are generally well regarded, patience among affected companies and their clients wore thin as the days stretched on. That was hardly surprising, he said, but it also created an opportunity to consider exactly how organizations have been using the EC2 cloud, and how well they've been doing it.
"Some, including Netflix and SmugMug, were hardly affected at all, largely because they had designed their environments for high availability -- in line with Amazon's guidelines -- using EC2 as merely one of several IT resources," King said. "On the other hand, those that had depended largely or entirely on Amazon for their online presence came away badly burned."
Cloud Lessons Learned
King sees two key, albeit not exactly new, lessons in the Amazon cloud outage. First, systems that rely on single points of failure will fail at some point. Second, companies whose services depend largely or entirely on third parties can do little but complain, apologize, pray and twiddle their thumbs when things go south.
As King sees it, the inevitability of disaster is why good communications skills are so crucial for any company to develop, and why Amazon's anemic public response to the outage made a bad situation far worse than it needed to be. Although the company has been among the industry's most vocal cloud services cheerleaders, King said, it seemed essentially tone-deaf to the damage its inaction was doing to public perception of cloud computing.
"At the end of the day, we expect Amazon will use the lessons learned from the EC2 outage to significantly improve its service offerings," King said. "But if it fails to closely evaluate communications efforts around the event, the company's and its customers' suffering will be wasted."