http://www.wired.com/epicenter/2009/09/google-explains-why-you-didnt-have-gmail/
Epicenter: The Business of Tech
Google Explains Why You Didn’t Have Gmail
By Ryan Singel
September 2, 2009, 1:04 pm
Want to know why you couldn’t use your Gmail account Tuesday?
Blame the outage on overworked routers that decided to go on strike. At least that’s what Google engineer (and VP) Ben Treynor wrote in a blog post Tuesday night, explaining in relatively clear engineer-ese why Gmail went down for about 100 minutes earlier that afternoon.
At about 12:30 p.m. Pacific a few of the request routers became overloaded and in effect told the rest of the system, “stop sending us traffic, we’re too slow!” This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded. As a result, people couldn’t access Gmail via the web interface because their requests couldn’t be routed to a Gmail server. IMAP/POP access and mail processing continued to work normally because these requests don’t use the same routers.

That’s good to know. Not because I know anything about configuring request routers to keep them from acting like French factory workers, but because Google is telling me clearly what happened, what they did about it and what steps they are taking to keep it from happening again.
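The cascade Treynor describes can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Google's actual routing logic: each router is assumed to have a fixed capacity, the total load is assumed to be spread evenly across the routers still accepting traffic, and any router whose share exceeds its capacity is assumed to drop out, the way the first overloaded routers asked for no more traffic. The function name `simulate_cascade` and all the numbers are hypothetical.

```python
def simulate_cascade(capacities, total_load, initially_overloaded):
    """Toy model of an overload cascade among request routers (hypothetical).

    capacities: requests/sec each router can handle (assumed values).
    total_load: total requests/sec that must be routed somewhere.
    initially_overloaded: indices of routers that already refuse traffic.
    Returns (routers still accepting traffic, rounds until the system settles).
    """
    healthy = set(range(len(capacities))) - set(initially_overloaded)
    rounds = 0
    while healthy:
        # Assumption: load is spread evenly over the routers still accepting it.
        share = total_load / len(healthy)
        newly_overloaded = {r for r in healthy if share > capacities[r]}
        if not newly_overloaded:
            break  # every remaining router can absorb its share: stable
        # Overloaded routers shed their traffic, raising every survivor's share.
        healthy -= newly_overloaded
        rounds += 1
    return sorted(healthy), rounds


# Hypothetical capacities (requests per second) for ten request routers.
caps = [95, 100, 105, 110, 115, 120, 125, 130, 135, 140]
print(simulate_cascade(caps, 1000, [0, 1]))  # ([], 2): every router ends up overloaded
print(simulate_cascade(caps, 700, [0, 1]))   # ([2, 3, 4, 5, 6, 7, 8, 9], 0): no cascade
```

In this model, losing routers 0 and 1 under a load of 1,000 pushes four more routers over their limit in the first round and the last four in the second, while at a load of 700 the same initial loss is absorbed with no further failures. The even-split assumption is what makes each lost router raise every survivor's share at once, which is why a small miscalculation of load can tip the whole pool over.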
[snip]
Update: In response to a reader’s question, Wired.com asked Google whether a hack or an attack was involved. According to a Google spokesman, “There was no attack or hack. They got overloaded because we had slightly underestimated the load which some recent updates placed on them.” And if you are one of the Enterprise or Education customers who pay Google for its Apps service, you just got three free days of service added to your account, because Gmail failed to deliver its promised 99.9 percent monthly uptime. At least that’s what a Google spokesperson told Wired.com.
(For those of you doing math at home: September has 43,200 minutes, and a 99.9 percent monthly target leaves room for only about 43 minutes of downtime. Google’s 100-minute outage means that even if Gmail is perfect for the rest of September, the best it can manage is about 99.77 percent, or roughly 99.8 percent. That falls below the uptime Google guarantees its paying customers in its Service Level Agreement.)

Gmail’s last major outage was in May, and it had a number of outages in 2008. But Google says its service is still far more reliable than corporate networks, many of which run Microsoft Exchange servers.
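The uptime arithmetic above can be checked with a short sketch. It assumes a 30-day month in which the 100-minute outage is the only downtime, and it models only the one SLA outcome reported here (three free days when monthly uptime falls below 99.9 percent); the real SLA terms may define other tiers. The helper names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def best_case_uptime(days_in_month, outage_minutes):
    """Monthly uptime (percent) if this outage is the month's only downtime."""
    total_minutes = days_in_month * MINUTES_PER_DAY
    return 100 * (total_minutes - outage_minutes) / total_minutes


def allowed_downtime(days_in_month, sla_percent):
    """Minutes of downtime a monthly uptime target leaves room for."""
    return days_in_month * MINUTES_PER_DAY * (100 - sla_percent) / 100


def credit_days(uptime_percent, sla_percent=99.9):
    """Hypothetical: only models the case reported above (3 days when uptime misses the target)."""
    return 3 if uptime_percent < sla_percent else 0


september = 30  # 30 days, i.e. 43,200 minutes
uptime = best_case_uptime(september, 100)
print(round(uptime, 2))                             # 99.77
print(round(allowed_downtime(september, 99.9), 1))  # 43.2
print(credit_days(uptime))                          # 3
```

Under these assumptions, the 99.9 percent target allows roughly 43 minutes of downtime in a 30-day month, so a single 100-minute outage uses up that allowance more than twice over.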