What "Uptime" Actually Means (And Why Status Pages Lie)

Every monitoring tool, every SLA, every "five nines" claim is built on a number that sounds precise but is mostly fiction. Uptime percentages are useful for comparing things, but as an absolute measure of reliability they're a bit of a con.

Let me explain what uptime actually measures, why your hosting's uptime claim probably overstates reality, and what numbers you should actually pay attention to.

What Uptime Percentage Really Measures

When a hosting provider says "99.9% uptime", they're claiming the server is reachable 99.9% of the time. Sounds great. In a 30-day month, 99.9% means you can have up to 43 minutes of downtime and still meet the SLA. That's not nothing. That's roughly equivalent to your site being completely unavailable for an entire morning meeting.

99.99% (the famous "four nines") allows about 4 minutes of downtime per month. 99.999% allows about 26 seconds. Most hosting providers claim 99.9% because going higher is genuinely difficult and expensive.
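If you want to check those budgets yourself, the arithmetic is short. Here is a minimal Python sketch, assuming a 30-day month as above. The name `downtime_budget` is made up for illustration and isn't part of any tool.

```python
from datetime import timedelta


def downtime_budget(uptime_percent: float, days: int = 30) -> timedelta:
    """Maximum downtime allowed in the period while still meeting the target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    period = timedelta(days=days)
    # The allowed downtime is the fraction of the period NOT covered by the target.
    return period * (1 - uptime_percent / 100)


for target in (99.9, 99.99, 99.999):
    budget = downtime_budget(target)
    print(f"{target}% over 30 days allows {budget.total_seconds() / 60:.1f} minutes")
```

Change `days` to 365 to see the yearly budget for the same target.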

The bigger problem is what counts as "up". A server can respond with 200 OK while the actual website is broken. Most uptime SLAs only count "the server returned a response in under X seconds" and ignore whether that response was useful.

Why Most Status Pages Are Optimistic Fiction

Public status pages are a mix of useful communication and PR damage control. Here's how they often go wrong:

Manual updates lag the actual outage. Engineers have to acknowledge the issue, the comms team has to draft an update, and only then does it appear on the status page. By the time you see "we're investigating an issue", the outage may already have been going for fifteen minutes.

Severity gets downgraded. "Major outage" becomes "service degradation" becomes "investigating reports of slowness". The same incident reads very differently in real time vs in the postmortem.

Single-point monitoring. A lot of status pages are based on the provider's own internal monitoring from their own data centres. If your traffic comes through a CDN edge node having issues, your customers experience an outage that the status page never shows.

Backfilled incidents. Some providers retroactively edit incident history to look better. The status page you check next month might not reflect what actually happened today.

Excluded categories. Read the SLA fine print. Scheduled maintenance, third-party dependencies, network issues "outside our control", and DDoS attacks often don't count as downtime. You experienced an outage but the SLA wasn't breached.
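To see how much exclusions can move the number, here is a rough Python sketch that computes uptime twice for the same month: once counting every outage, once skipping the excluded categories. The cause labels and the outage figures are invented for illustration; a real SLA defines its own exclusions.

```python
from dataclasses import dataclass

# Hypothetical cause labels; a real SLA defines its own exclusions.
EXCLUDED_CAUSES = frozenset({"scheduled_maintenance", "third_party", "ddos"})

MONTH_MINUTES = 30 * 24 * 60  # 43,200 minutes in a 30-day month


@dataclass(frozen=True)
class Outage:
    minutes: float
    cause: str


def uptime_percent(outages, period_minutes, excluded=frozenset()):
    """Uptime over the period, ignoring outages whose cause is excluded."""
    counted = sum(o.minutes for o in outages if o.cause not in excluded)
    return 100 * (1 - counted / period_minutes)


# Made-up month: one hardware fault, one maintenance window, one DDoS.
outages = [
    Outage(30, "hardware"),
    Outage(120, "scheduled_maintenance"),
    Outage(45, "ddos"),
]

experienced = uptime_percent(outages, MONTH_MINUTES)
sla_view = uptime_percent(outages, MONTH_MINUTES, EXCLUDED_CAUSES)
print(f"what users saw:      {experienced:.2f}%")
print(f"what the SLA counts: {sla_view:.2f}%")
```

With these invented figures, users lived through a month below 99.9% while the SLA-counted number stays above it.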

What to Actually Measure

If you want to know how reliable a service really is, your own monitoring tells you more than the provider's status page. A few rules:

Measure from where your users are. If you're in Europe and your monitoring is from a US data centre, you're not getting useful data. Distributed monitoring, even if it's just two regions, catches issues that single-region monitoring misses.
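One simple way to combine checks from several regions is to separate "down everywhere" from "down for some users". The sketch below assumes you already have one pass/fail result per region; the region names and the `classify` function are hypothetical.

```python
def classify(region_results: dict[str, bool]) -> str:
    """Summarise per-region check results as 'up', 'partial', or 'down'."""
    if not region_results:
        raise ValueError("need at least one region result")
    failing = [region for region, ok in region_results.items() if not ok]
    if not failing:
        return "up"
    if len(failing) == len(region_results):
        return "down"
    # Some regions fail while others pass: a single-region monitor
    # sitting in a passing region would report this as fully up.
    return "partial"


print(classify({"eu-west": False, "us-east": True}))
```

A monitor that only ran from the hypothetical `us-east` location would have reported this site as fine.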

Check the actual content, not just the response code. A 200 OK with a "database error" page on it is still an outage. Keyword monitoring or content scanning catches this. Pure HTTP monitoring does not.
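A content check doesn't need to be clever. Here is a minimal sketch, assuming you have already fetched the page and have its status code and body as a string. The error markers and the required phrase are examples to replace with text from your own site.

```python
def is_healthy(
    status_code: int,
    body: str,
    required_text: str,
    error_markers: tuple[str, ...] = ("database error", "service unavailable"),
) -> bool:
    """Treat a response as healthy only if the page content looks right."""
    if status_code != 200:
        return False
    text = body.lower()
    # A 200 that carries an error page is still an outage.
    if any(marker in text for marker in error_markers):
        return False
    # Require a phrase that only appears when the page rendered properly.
    return required_text.lower() in text


print(is_healthy(200, "<h1>Database error</h1>", "Add to basket"))
print(is_healthy(200, "<button>Add to basket</button>", "Add to basket"))
```

Requiring a known-good phrase also catches blank pages and half-rendered templates, which a list of error strings alone would miss.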

Look at the slow responses, not just the failures. Response time degradation is usually the early warning before a full outage. A site slowly creeping from 200ms to 3000ms over an hour is heading for downtime, even if it never quite reaches the failure threshold.
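One rough way to spot that kind of creep is to compare the median of the most recent checks against the median of everything before them. This is a sketch under assumptions: the window size and the 2x factor are arbitrary starting points, not recommended values.

```python
from statistics import median


def is_degrading(samples_ms, window: int = 5, factor: float = 2.0) -> bool:
    """True if recent response times are well above the earlier baseline."""
    if len(samples_ms) < 2 * window:
        return False  # not enough history to compare against
    baseline = median(samples_ms[:-window])
    recent = median(samples_ms[-window:])
    return recent > factor * baseline


# Response times in milliseconds, oldest first.
steady = [200, 210, 190, 205, 195] * 3
creeping = [200] * 10 + [400, 800, 1500, 2200, 3000]

print(is_degrading(steady))
print(is_degrading(creeping))
```

Medians are used rather than means so that one slow request doesn't trigger an alert on its own.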

Track time-to-recovery, not just total downtime. Two five-minute outages and one ten-minute outage have the same uptime percentage. They have wildly different impact. The two short ones probably cost you nothing. The long one might cost you significant revenue.
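The comparison above is easy to reproduce. The sketch below assumes each incident is recorded as a (start, end) pair of timestamps; the dates are arbitrary and `recovery_stats` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def recovery_stats(incidents):
    """Total downtime, longest incident, and mean time to recovery."""
    durations = [end - start for start, end in incidents]
    total = sum(durations, timedelta())
    return {
        "total_downtime": total,
        "longest": max(durations, default=timedelta()),
        "mean_time_to_recovery": total / len(durations) if durations else timedelta(),
    }


day = datetime(2024, 1, 1)  # arbitrary example date
two_short = [
    (day.replace(hour=9), day.replace(hour=9, minute=5)),
    (day.replace(hour=15), day.replace(hour=15, minute=5)),
]
one_long = [(day.replace(hour=9), day.replace(hour=9, minute=10))]

for name, incidents in (("two short", two_short), ("one long", one_long)):
    print(name, recovery_stats(incidents))
```

Both histories produce the same total downtime, so the same uptime percentage, but the longest incident and the mean time to recovery differ.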

Honest Numbers for Honest Comparisons

When I look at a monitor in WebMon, I care about three things:

How many incidents in the last 30 days. Frequency matters more than total minutes for user trust.
The longest single incident. This tells you the worst case experience for users.
The pattern of slow responses. This is the leading indicator of bigger problems.
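If you only have raw check results, those three numbers can be derived from them directly. This is an illustrative sketch, not how WebMon computes anything: the `Check` record, the one-minute interval, and the 1000ms "slow" threshold are all assumptions.

```python
from itertools import groupby
from typing import NamedTuple


class Check(NamedTuple):
    ok: bool
    response_ms: float


def summarise(checks, interval_minutes: int = 1, slow_ms: float = 1000):
    """Incident count, longest incident, and number of slow successful checks."""
    # Each run of consecutive failed checks counts as one incident.
    failed_runs = [
        len(list(run))
        for ok, run in groupby(checks, key=lambda c: c.ok)
        if not ok
    ]
    slow = sum(1 for c in checks if c.ok and c.response_ms >= slow_ms)
    return {
        "incidents": len(failed_runs),
        "longest_incident_minutes": max(failed_runs, default=0) * interval_minutes,
        "slow_responses": slow,
    }


checks = (
    [Check(True, 200)] * 5
    + [Check(False, 0)] * 3
    + [Check(True, 1500)] * 2
    + [Check(False, 0)]
    + [Check(True, 250)]
)
print(summarise(checks))
```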

The headline uptime percentage is the least useful number on the page. It rolls up everything into a single figure that's easy to misinterpret. A site at 99.5% uptime with a seven-minute glitch every day is operationally very different from a site at 99.5% with a single three-and-a-half-hour outage once a month, but the percentage doesn't tell you which is which.

What I Actually Tell People to Care About

Don't agonise over getting from 99.9% to 99.95%. In a 30-day month that's the difference between about 43 and about 22 minutes of allowed downtime, which is mostly philosophical for most people's actual workloads.

Care about whether your monitoring is honest. Care about how fast you find out when something breaks. Care about how long your worst outage was and whether anything could have prevented it.

Uptime is a number. Reliability is a feeling your users have about your service. They're related but they're not the same thing, and your monitoring should help you see both.

WebMon shows uptime percentage, recent incidents, and slow response patterns side by side so you can see the full picture. Set up a monitor.
