Today Skype is explaining what happened last week when their service went offline. The reason they give begins with a high number of user restarts:
“On Thursday, 16th August 2007, the Skype peer-to-peer network became unstable and suffered a critical disruption. The disruption was triggered by a massive restart of our users’ computers across the globe within a very short timeframe as they re-booted after receiving a routine set of patches through Windows Update.
The high number of restarts affected Skype’s network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact.”
The restarts then revealed a flaw in the healing algorithm of Skype’s peer-to-peer network, so resources could not be allocated properly. Skype is calling this a “software bug.”
So, at its heart, the outage was caused by a software bug.
The Windows Update restarts were the precipitating event, supposedly. I wonder, though, whether we’re talking about restarts of end users’ machines or maybe restarts of supernodes. I thought that Windows Updates weren’t all done at once; I doubt Microsoft would want the load either. I don’t doubt Skype’s word that there was a significant load from their app being launched and users then signing on, but it sure would be interesting to hear from Microsoft whether there was any unusual traffic from their perspective that Patch Tuesday. Or maybe this was just one of those events where everything had to go wrong in just the right way? Skype’s posting doesn’t make this clear. In fact, it doesn’t dig very deep in explaining what their bug was.