BlackBerry services return after historic global outage
Research in Motion founder Mike Lazaridis issued a somber apology on Thursday about the worldwide BlackBerry outage that has lasted the better part of a week, and followed it up with a press conference to provide a more detailed explanation of what went wrong in RIM's system.
As the company said yesterday, a single piece of hardware and its failover mechanism both failed, causing a "ripple effect" throughout the entire BlackBerry infrastructure.
Lazaridis said a "dual-redundant, high capacity core switch designed to protect the infrastructure failed, and caused outages and delays for some customers in Europe, the Middle East, Africa, India, Brazil, Chile, and Argentina. This caused a cascade failure in our system. There was a backup switch, but it didn't function as intended and this led to a backlog of data in the system. The failure in Europe in turn overloaded systems elsewhere. When we restarted the system based in Europe, the data queue processing took much longer than we had expected to restore to our standard service levels. This backlog impaired service levels."
Research in Motion is auditing its infrastructure and carrying out a root-cause analysis to ensure that this sort of catastrophic failure does not happen again. Because the root cause has not yet been identified, RIM's executives declined to name their hardware vendors or to point fingers at whose hardware might have caused the massive outage.
The company says service levels have returned to normal, but it has not said whether customers will be compensated for the outage. The effect on BlackBerry's reputation is difficult to quantify at present, but given the smartphone maker's recent loss of market share, it is safe to say the outage came at a very bad time for the company.