Amazon investigating problem after S3 suffers 8-hour outage
Amazon's Simple Storage Service (S3) was down for more than eight hours over the weekend, affecting many prominent sites, and the company is still investigating the cause of the problem.
Cloud-based services such as those offered by Amazon provide cost-effective computing and storage. However, the oft-cited drawback of relying on such offerings is that customers have little or no control when something goes wrong. The only option is to wait -- in this case, for more than eight hours.
Amazon's Simple Storage Service (S3), introduced in 2006, is part of the Amazon Web Services (AWS) suite, which also includes the Elastic Compute Cloud (EC2) and SimpleDB.
On July 20, the S3 component of AWS was down for more than eight hours, affecting sites like SmugMug, Twitter, Centernetworks, and many of Amazon's own sites. The AWS Service Health Dashboard shows that the Simple Storage Service and the Simple Queue Service experienced a "service disruption."
In a communication with the company, GigaOM's Om Malik received a rather general explanation as to why the service was down: "As a distributed system, the different components of S3 need to be aware of the state of each other. For example, this awareness makes it possible for the system to decide which redundant physical storage server to route a request to."
"We experienced a problem with those internal system communications, leaving the components unable to interact properly, and customers unable to successfully process requests. After exploring several alternatives, the team determined it had to take the service offline to restore proper communication and then bring service online again."
"These are sophisticated systems and it generally takes a while to get to root cause in such a situation -- we will be providing our customers with more information when we've fully investigated the incident," the company added.
Many companies use AWS, so an outage there can affect a large number of services. Both Red Hat and Sun use EC2, which has also experienced outages of its own. Consumer-oriented services such as HP's Upline have faced numerous outages as well.
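Amazon's explanation, that components must know each other's state before a request can be routed to a redundant storage server, can be illustrated with a toy sketch. This is not Amazon's implementation, which has not been published. The `Router` class, the `report` and `route` methods, and the `storage-a`/`storage-b`/`storage-c` server names are all hypothetical, chosen only to show why requests fail once state updates stop arriving.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    # Hypothetical view of peer state: server name -> last reported health.
    known_state: dict = field(default_factory=dict)

    def report(self, server: str, healthy: bool) -> None:
        """Record a state update received from another component."""
        self.known_state[server] = healthy

    def route(self, replicas: list) -> str:
        """Return the first replica this router believes is healthy."""
        for server in replicas:
            # A server with no reported state is treated as unusable.
            if self.known_state.get(server, False):
                return server
        raise RuntimeError("no replica known to be healthy")


replicas = ["storage-a", "storage-b", "storage-c"]

# Normal operation: state updates arrive, so a healthy replica is found.
router = Router()
router.report("storage-a", False)
router.report("storage-b", True)
chosen = router.route(replicas)
print("routed to:", chosen)

# Broken internal communication: no updates arrive, so every request fails
# even though the storage servers themselves may be fine.
blind = Router()
try:
    blind.route(replicas)
    failed = False
except RuntimeError as err:
    failed = True
    print("request failed:", err)
```

In this sketch the second router fails every request even though no storage server is actually down, which is roughly the situation Amazon describes: the data was intact, but the components could not coordinate well enough to serve it.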