Amazon's two-day cloud computing nightmare nears an end
Amazon appeared to finally have the issues with its Web Services cloud platform under control, saying late Friday afternoon that all but its most "time consuming" volumes had been recovered and were back online. This seems to match up with reports that those websites that depended on Amazon's cloud were for the most part operating normally.
The partial failure, which affected Amazon's cloud servers in its Northern Virginia facility, occurred early Thursday morning. Several popular websites, including Foursquare, Reddit, and Quora, were down for much of Thursday, and those issues extended into Friday as well.
As of press time late Friday afternoon, all three websites appeared to be functioning normally or close to it. There could still be some hiccups over the next several hours as volumes continue to be restored, some of which may hold data necessary to use all functionality of those sites.
In the meantime, some sites, such as Quora, were running on databases from the day before the outage to get back to normal. That said, much of Wednesday's activity would be missing -- and it is not certain whether any new data entered into the database between then and now could be merged into the regular database when AWS is fully restored.
Either way, what is still missing is a root cause for Amazon's difficulties. The company has been silent other than what it has posted on its status page, and even there the reasons for the problem are rather vague.
Some analysts say they believe the problem may have been complex enough that even the company itself isn't completely sure what happened.
Amazon's issues highlight the drawbacks of taking IT "to the cloud" versus running one's own infrastructure. While outsourcing cuts out a considerable amount of overhead, companies and services that depend on the cloud for their operations are at the mercy of their cloud computing providers when problems arise.
"Use this Amazon failure as a warning that cloud services are not bulletproof and that they are likely to fail at any time for the same reasons any complex system can fail," Enderle Group founder Rob Enderle wrote for Datamation Friday. "They will then take many, most, or all of their customers off-line with them."