December 08, 2005
Writing about web page http://37signals.com/svn/archives2/dont_scale_99999_uptime_is_for_walmart.php
I've linked to an article on the 37signals blog about uptime for web applications. It argues that you only need to worry about 99.999% uptime once you're doing big business.
Wright correctly states that those final last percent are incredibly expensive. To go from 98% to 99% can cost thousands of dollars. To go from 99% to 99.9% tens of thousands more. Now contrast that with the value. What kind of service are you providing? Does the world end if you’re down for 30 minutes?
If you’re Wal-Mart and your credit card processing pipeline stops for 30 minutes during prime time, yes, the world does end. Someone might very well be fired. The business loses millions of dollars. Wal-Mart gets in the news and loses millions more on the goodwill account.
Now what if Delicious, Feedster, or Technorati goes down for 30 minutes? How big is the inconvenience of not being able to get to your tagged bookmarks or do yet another ego-search with Feedster or Technorati for 30 minutes? Not that high. The world does not come to an end. Nobody gets fired.
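To put the percentages in the quoted passage into concrete terms, here is a minimal sketch that converts an uptime figure into the downtime it allows per year. The function name is made up for illustration, and it assumes a plain 365-day year.

```python
# Allowed downtime per year for a given availability percentage.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return how many minutes per year a service may be down at this uptime."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


for pct in (98.0, 99.0, 99.9, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% uptime allows about {minutes / 60:.1f} hours "
          f"({minutes:.0f} minutes) of downtime per year")
```

On these assumptions, 99% leaves roughly 87.6 hours a year, while 99.999% leaves a little over five minutes, which gives a sense of why each extra nine gets so much more expensive.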
A quick look at our wonderful IPCheck software gives these values for the last 3 months:
- BlogBuilder: 99.70% (5h40m downtime)
- SiteBuilder: 99.93% (24m downtime)
- Forums: 98.97% (27h downtime)
- Single Sign On: 99.89% (1h43m downtime)
It doesn't matter whose fault that 0.30%, 0.07%, 1.03% and 0.11% is. Sometimes things are just slow rather than down, sometimes things just break, sometimes it's the network, and sometimes it's human error during a redeploy. All our users see is that the service is down for some small period of time. In many cases the system is not actually down; a single request from the monitoring server simply failed. To be fair, though, if that happens to the monitor, the chances are that it will occasionally happen to a user without the monitor noticing.
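As a rough illustration of how one failed monitoring request turns into reported downtime, here is a sketch of check-based uptime accounting. This is not how IPCheck necessarily works; the function, the one-minute interval and the sample data are all assumptions made up for the example.

```python
from datetime import timedelta


def uptime_from_checks(results: list[bool], interval: timedelta) -> tuple[float, timedelta]:
    """Estimate uptime from periodic probe results (True = request succeeded).

    Each failed probe is charged as one full interval of downtime, which is
    why a single dropped request can show up as minutes of "downtime".
    """
    if not results:
        raise ValueError("need at least one check result")
    failed = results.count(False)
    uptime_percent = 100 * (len(results) - failed) / len(results)
    return uptime_percent, failed * interval


# Hypothetical day of one-minute checks with a single failed request.
checks = [True] * 1440
checks[600] = False
percent, downtime = uptime_from_checks(checks, timedelta(minutes=1))
print(f"{percent:.2f}% uptime, {downtime} counted as downtime")
```

The same accounting cuts both ways: a failure that falls between two probes is never counted at all, even though a user may have hit it.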
This is just a small selection (though it covers the most commonly used systems we monitor), but you can see that we have good uptime. Would it matter if we were a couple of percentage points lower? As always…it depends.
If Single Sign On were down for an hour on a single Monday morning and that was the only downtime that month, it would still look like a fantastic month of 99.9% uptime. Unfortunately, many systems rely on SSO, so that hour would at least degrade, if not completely bring down, all those other systems, adding up to a very nasty bit of downtime.
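The arithmetic behind that scenario can be sketched as follows. The 30-day month, the count of three dependent systems and the worst-case assumption that each of them is fully down for the whole outage are all illustrative, not measured figures.

```python
HOURS_IN_MONTH = 30 * 24  # assumes a 30-day month


def monthly_uptime_percent(downtime_hours: float) -> float:
    """Uptime percentage for one system over an assumed 30-day month."""
    return 100 * (1 - downtime_hours / HOURS_IN_MONTH)


def affected_service_hours(outage_hours: float, dependent_systems: int) -> float:
    """Outage hours summed over the failing system and everything relying on it.

    Worst case: every dependent system is unusable for the whole outage.
    """
    return outage_hours * (1 + dependent_systems)


print(f"SSO alone: {monthly_uptime_percent(1):.2f}% uptime for the month")
print(f"Service-hours lost with 3 dependent systems: {affected_service_hours(1, 3)}")
```

One hour out of 720 works out to roughly 99.86%, which rounds to the 99.9% headline figure, while the hours lost across everything that depends on SSO are several times larger.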
The 37signals article is correct that you do have to spend quite a bit of money to get that extra percentage point, but in the environment we work in, where so many people come to rely on our services, that extra point is important.
If, however, you need the occasional planned downtime and can let everyone know in advance, that is fine, as people can make other plans. Pure uptime is not always what matters; keeping unplanned downtime to a minimum is what counts.