You hear that the cloud is a resilient, redundant source of computing power that will always be there when you need it. Then you read about the outages at infrastructure-as-a-service providers.
Amazon Web Services suffered an outage in June – tough for some, though not as bad as Amazon’s “remirroring storm” in April 2011, which inconvenienced or financially damaged thousands of customers. On Feb. 29 this year, the security certificate-issuing server in Microsoft’s Azure cloud failed to make the leap: it didn’t recognize the date, causing an outage. In 2007, Rackspace lost a transformer at one of its datacenters, and servers started going down.
Why weren’t these outages preventable? Do they each reflect a single point of failure that should have been both foreseen and prevented, or is that just Monday morning quarterbacking?
In fact, cloud datacenters rely on a relatively new type of architecture, running with fewer personnel and more moving parts than conventional datacenters. The designers of these architectures take great pains to build in redundancies and prevent failures, but they don’t always foresee events that sound like long shots yet happen nevertheless.