Microsoft: Patch Tuesday Didn't Cause Thursday Skype Outage
Yesterday afternoon, Microsoft's security engineers formally ruled out the possibility that the company's regular monthly "Patch Tuesday" update sequence triggered the worldwide outage of the Skype VoIP communication service that started Thursday and lasted for about two days.
In a company blog post yesterday morning, Skype engineers disclosed their suspicion that Patch Tuesday had triggered a wave of client-side reboots at roughly the same time. The temporary reduction in P2P traffic capacity that followed, Skype's Villu Arak said, triggered a failure of the Skype VoIP network. This came after Arak, during the outage itself, had blamed an internal server software glitch.
An official explanation written for Microsoft's Security Response Center blog by Christopher Budd was phrased as though it were eliminating what didn't cause the outage, in order to reveal what might have done so. "First, we checked to see if there were any issues introduced by the security updates that could have caused the situation," Budd wrote, "and we found that there were no issues introduced by the security updates themselves."
That sounded a bit shifty, at least taken by itself. But then, Budd continued, Microsoft checked to see if there was anything unusual about the size of the patches, the duration of the reboots, or the timing of the distribution that could have led to Skype's problems. "We confirmed that there is nothing unusual in this month's release that could have contributed to this situation," he added.
Public response to Skype's altered explanation, coupled with Budd's comments about Microsoft's response, led Skype's Villu Arak this morning to adopt a more gracious tone. "Some reactions to the explanation," he wrote, "have reminded us of one of the basic tenets of communication: It's not what you say. It's what they hear."
It was a mea culpa on Skype's part, acknowledging that the Windows updates were merely the catalyst for a problem that could have conceivably been triggered by something else, if that something else were a massive, widely installed operating system that needed security patches every month. Arak's description of the cause this time around more closely approximated his original reports of last week: "a previously unseen fault in the P2P network resource allocation algorithm Skype used."
Arak acknowledged Skype's use of supernodes - algorithmically chosen clients that are promoted by Skype's servers, and charged with extra duties to marshal VoIP traffic. Normally, being able to promote and distribute supernodes on the fly helps Skype respond quickly to variations in service loads. During previous Patch Tuesdays, this system has been able to respond to and tune itself for load disruptions caused by reboots. Not this time, he said, since there had never before been such a high usage load during the time supernodes were rebooting.
So it wasn't the fact that Skype clients everywhere were rebooting - just the supernodes. By virtue of their having been selected in the first place, they may be high-performance systems, raising the likelihood that they'd be set for automatic patching during the week.
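To illustrate the reasoning, here is a toy capacity model in Python. It is only a sketch under invented assumptions: Skype has not published its resource allocation algorithm, and the function names and every number below are hypothetical. It shows how the same wave of supernode reboots can be harmless under a light load and fatal under a heavy one.

```python
# Toy model only: not Skype's algorithm. All names and numbers are hypothetical.

def surviving_capacity(total_supernodes, rebooting, calls_per_supernode):
    """Calls the supernodes still online can carry while others reboot."""
    online = total_supernodes - rebooting
    return online * calls_per_supernode


def network_copes(total_supernodes, rebooting, calls_per_supernode, active_calls):
    """True if the remaining supernodes can carry the current load."""
    capacity = surviving_capacity(total_supernodes, rebooting, calls_per_supernode)
    return capacity >= active_calls


# Same reboot wave (300 of 1,000 supernodes), two different loads.
light_load = network_copes(1000, 300, 100, active_calls=50_000)
heavy_load = network_copes(1000, 300, 100, active_calls=90_000)
print(light_load, heavy_load)
```

In this model the reboot wave is identical in both cases; only the usage load changes, which is consistent with Arak's account of why earlier Patch Tuesdays passed without incident.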
In a presentation for the Recon security conference last year, engineers Fabrice Desclaux and Kostya Kortchinsky presented data they had extrapolated from a professional reverse engineering of Skype traffic (PDF available here). There, they said they discovered that one of the factors Skype servers use for determining the viability of a node for promotion to a supernode, besides a good connection and high bandwidth, is the absence of a firewall.
This lends credence to John Bambenek's theory, which we presented yesterday: that supernodes affected by system reboots may not be highly customized systems. Rather, they could be set to system defaults, which would make their behaviors when compared to one another more uniform, and more likely to cause a problem all at once.
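The three promotion factors reported by Desclaux and Kortchinsky can be expressed as a simple eligibility check. The Python sketch below is an assumption-laden illustration, not Skype's code: the `Node` fields, the `eligible_for_supernode` function, and the bandwidth threshold are all hypothetical, since the presentation as described here gives no specific values.

```python
# Illustrative only: field names and the threshold are hypothetical.
from dataclasses import dataclass

MIN_BANDWIDTH_KBPS = 512  # invented cutoff for "high bandwidth"


@dataclass
class Node:
    bandwidth_kbps: int
    connection_ok: bool      # stands in for "a good connection"
    behind_firewall: bool


def eligible_for_supernode(node):
    """Apply the three reported factors: connection, bandwidth, no firewall."""
    return (
        node.connection_ok
        and node.bandwidth_kbps >= MIN_BANDWIDTH_KBPS
        and not node.behind_firewall
    )


default_desktop = Node(bandwidth_kbps=2000, connection_ok=True, behind_firewall=False)
locked_down_pc = Node(bandwidth_kbps=2000, connection_ok=True, behind_firewall=True)
print(eligible_for_supernode(default_desktop), eligible_for_supernode(locked_down_pc))
```

Under these assumptions, a fast machine without a firewall qualifies while an otherwise identical firewalled one does not, which is the pattern Bambenek's theory relies on.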
[Image: A portion of an e-mail to Skype users, which appeared in their inboxes on August 13. Clicking on the attached link begins a software upgrade process to version 220.127.116.11.]
But one factor which Arak didn't mention is that on Monday, August 13, Skype users received e-mails notifying them of software upgrades to version 18.104.22.168. The e-mail promised improved sound quality and an updated front end. In BetaNews' tests, we saw the upgrade procedure trigger a separate process - not unlike what happens when upgrading Firefox - which polls Skype's servers separately to check for new add-ons. This is not a P2P operation at all, but a centralized function.
Certainly, this wasn't the first Skype upgrade to receive an e-mail blast to users, some of whom would undoubtedly have been supernodes, at least from time to time. Arak's explanation this morning did not mention the client software upgrade, though it did reference a "perfect storm" of events to which Patch Tuesday was but a contributor.