Failed update to blame for outage of Microsoft cloud services
There are plenty of benefits to living in the cloud, but some major downsides too. Nearly five months ago an Amazon cloud outage took down BetaNews' group chat service, along with Heroku, Flipboard, Foursquare and Reddit, among others. And, two days ago, Microsoft users went through a similar ordeal, which mostly affected Hotmail, Outlook.com and SkyDrive -- three of Microsoft's more essential cloud services.
Microsoft's vice president, Arthur de Haan, has chimed in on the matter in a blog post which links the outage to the upgrade process from Hotmail to the new out-of-beta email service Outlook.com. From 1:35 PM PDT on March 12 until 5:43 AM PDT on March 13, de Haan says, "a small part of the SkyDrive service, but primarily Hotmail.com and Outlook.com" suffered a service interruption caused by a firmware update which failed "in an unexpected way".
The failed firmware update occurred in one of Microsoft's datacenters, in a "core part" of its physical plant, and led to a "substantial temperature spike in the datacenter". The spike was "significant enough" to cause the "safeguards to come in to place for a large number of servers in this part of the datacenter". That area of the datacenter houses "parts of the Hotmail.com, Outlook.com, and SkyDrive infrastructure".
So what took the software giant so long to get it fixed? We may assume that real people stand behind cloud services, pulling strings like puppeteers to keep them working, but as de Haan explains, that is not the case. He says: "there was a mix of infrastructure software and human intervention that was needed to bring the core infrastructure back online. Requiring this kind of human intervention is not the norm for our services and added significant time to the restoration".
For some users the outage may have caused irreparable damage, as Hotmail, Outlook.com and SkyDrive are essential services which house emails and cloud-stored files. In the cloud-connected era whatever we store "out there" may not also be backed up on a local drive -- not a safe and sound approach -- and the outage may have left some users without crucial emails or files at very important moments.
Admittedly, Microsoft mostly fixed the issue overnight (local time), but users outside of the United States may have been caught off guard by the outage during work hours. What if someone missed a job interview or even lost their job because of it? We want to fully embrace the cloud -- often portrayed as the future and the only way -- but at what cost?