Postmortem: Why We Migrated to a Paid Cloudflare Plan

WebMon runs its monitoring checks through a Cloudflare Worker. It made sense at the start: edge compute, distributed by default, free for normal use. The Worker fetches your monitored URL from a Cloudflare data centre, checks the response, and reports back to the WebMon server.

For the first few months, the free Worker tier was plenty. We had a handful of users, a few hundred monitors total, and the daily request volume was nowhere near the free tier limit.

Then we grew, and things started getting interesting.

What Actually Broke

The Cloudflare free tier has a few limits that aren't obvious until you hit them. The relevant one for us is the subrequest limit: a single Worker invocation can make a maximum of 50 subrequests (HTTP fetches to other URLs).

Our setup batches monitor checks. The Worker fires every minute, looks up which monitors are due for a check, and fetches them in parallel. This is far more efficient than one Worker invocation per monitor.
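To make the batching concrete, here is a minimal sketch of a parallel check. It is not WebMon's actual Worker code: the `Monitor` and `CheckResult` shapes, the `Fetcher` type, and the function names are all hypothetical, and the fetch function is injected so the sketch can run outside a Worker.

```typescript
// Hypothetical monitor record; the real WebMon schema is not shown in this post.
interface Monitor {
  id: string;
  url: string;
}

interface CheckResult {
  id: string;
  ok: boolean;
  status: number | null; // null when the fetch itself failed
  ms: number; // elapsed time for the check
}

// A fetch-like function. Inside a Worker this could wrap the global fetch.
type Fetcher = (url: string) => Promise<{ status: number }>;

// Check one monitor and never reject, so one bad URL cannot sink the batch.
async function checkOne(m: Monitor, fetcher: Fetcher): Promise<CheckResult> {
  const started = Date.now();
  try {
    const res = await fetcher(m.url);
    return { id: m.id, ok: res.status < 400, status: res.status, ms: Date.now() - started };
  } catch {
    return { id: m.id, ok: false, status: null, ms: Date.now() - started };
  }
}

// Check every due monitor in parallel. Each fetch counts as one subrequest.
async function checkBatch(due: Monitor[], fetcher: Fetcher): Promise<CheckResult[]> {
  return Promise.all(due.map((m) => checkOne(m, fetcher)));
}
```

In a real Worker, something like `checkBatch(due, (url) => fetch(url))` would be called from the scheduled handler. One invocation handling N monitors makes N subrequests, which is why the batch size matters.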

It's also where the limit bites. As we grew, our minute-by-minute batches started exceeding 50 monitors. The Worker would happily check the first 50 and then fail silently on the rest. Users with monitors at the back of the batch would see gaps in their check history that we couldn't explain at first.
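A small simulation shows why the same monitors kept losing out. It assumes the batch is processed in a fixed sorted order and that everything past the limit is simply not checked; the monitor names and the `splitAtLimit` helper are made up for illustration.

```typescript
// Free-plan subrequest ceiling per invocation, as quoted in this post.
const FREE_SUBREQUEST_LIMIT = 50;

// Split a sorted batch into the part that fits under the limit and the part that does not.
function splitAtLimit<T>(batch: T[], limit: number): { checked: T[]; dropped: T[] } {
  return { checked: batch.slice(0, limit), dropped: batch.slice(limit) };
}

// Hypothetical batch of 60 monitor names, zero-padded so they sort alphabetically.
const names: string[] = [];
for (let i = 0; i < 60; i++) {
  names.push("monitor-" + (i < 10 ? "0" + i : "" + i));
}
names.sort();

const { checked, dropped } = splitAtLimit(names, FREE_SUBREQUEST_LIMIT);
console.log(checked.length, dropped.length); // 50 10
console.log(dropped[0]); // monitor-50
```

Because the order is the same every minute, the dropped set is the same every minute too. That is what makes the gaps look systematic rather than random once you sort by name.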

The Symptoms

The first sign was a few support messages along the lines of "my monitor showed up but then nothing for ten minutes". When we looked at the actual check data, sure enough, monitors at the end of the alphabetical sort were getting fewer checks than monitors at the start.

Other symptoms followed:

Slow response alerts firing late because the check that would have caught them got dropped
Apparent uptime improvements (you can't fail a check that doesn't run)
Sporadic gaps in response time history that looked like random noise

This isn't a great experience. Monitoring tools should monitor reliably or not at all.

The Short-Term Fix

Before doing anything dramatic, we shrank the batch size. Instead of trying to check all due monitors in one Worker invocation, we capped each batch at 25, then later 10. That kept us inside the subrequest limit but meant some monitors had to wait until the next minute to be checked.
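The capping can be sketched roughly like this. The `dueAt` field and the oldest-first ordering are assumptions for the sake of the example, not a description of how WebMon actually orders its queue; the point is only that anything beyond the cap is carried over rather than dropped.

```typescript
// Hypothetical record for a monitor that is due for a check.
interface DueMonitor {
  id: string;
  dueAt: number; // epoch milliseconds at which the check became due (assumed field)
}

// Take at most `cap` monitors for this run, oldest-due first.
// The rest are returned as `deferred` and wait for the next minute.
function takeBatch(
  due: DueMonitor[],
  cap: number
): { now: DueMonitor[]; deferred: DueMonitor[] } {
  if (cap <= 0) throw new RangeError("cap must be positive");
  // Copy before sorting so the caller's array is left untouched.
  const ordered = due.slice().sort((a, b) => a.dueAt - b.dueAt);
  return { now: ordered.slice(0, cap), deferred: ordered.slice(cap) };
}
```

Ordering by how long a monitor has been waiting is one way to stop the same monitors being deferred every minute, which a fixed alphabetical order would not do.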

This was a workaround, not a fix. It traded one problem (dropped checks) for another (delayed checks). For monitors on a 1-minute interval, having to wait 2 minutes for a check is noticeable. For monitors on a 10-minute interval, it didn't really matter.
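As a rough illustration of the delay, assume a one-off backlog that is drained at `cap` monitors per one-minute run, and that a monitor's place in the queue is known. Both the queue model and the `extraWaitMinutes` helper are simplifications introduced here.

```typescript
// Extra whole minutes a monitor waits when a backlog is drained `cap` monitors per minute.
// `position` is the monitor's 0-based place in the queue (hypothetical ordering).
function extraWaitMinutes(position: number, cap: number): number {
  if (cap <= 0) throw new RangeError("cap must be positive");
  return Math.floor(position / cap);
}

console.log(extraWaitMinutes(9, 10)); // 0, still in the first run
console.log(extraWaitMinutes(10, 10)); // 1, pushed to the next minute
console.log(extraWaitMinutes(34, 10)); // 3
```

With a cap of 10, the 35th monitor in the queue waits three extra minutes. That is invisible on a 10-minute interval and very visible on a 1-minute one.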

Going Paid

The Cloudflare Workers Paid plan is $5/month plus usage. That sounds like nothing for a SaaS, but small SaaS budgets are real. Five dollars a month is a coffee, but it's also five dollars I'd been hoping to put off paying for as long as possible.

The deciding factor wasn't the price. It was that the paid plan raises the subrequest limit from 50 to 1000 per invocation. With 1000 subrequests per invocation, a single Worker run can cover a whole minute's batch, as long as that batch stays under 1000 monitors.

The actual cost we're seeing: about 9p per day. £2.70 a month, or roughly $3.40. Cheaper than the headline price suggests, because we don't actually use that much Worker compute.
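The arithmetic behind those figures, as a quick check. The 30-day month and the exchange rate of about 1.26 dollars to the pound are assumptions chosen to match the numbers above, not quoted rates.

```typescript
const pencePerDay = 9; // observed daily cost from this post
const daysPerMonth = 30; // assumption: a 30-day month
const usdPerGbp = 1.26; // assumption: approximate exchange rate

const gbpPerMonth = (pencePerDay * daysPerMonth) / 100;
const usdPerMonth = gbpPerMonth * usdPerGbp;

console.log(gbpPerMonth.toFixed(2)); // 2.70
console.log(usdPerMonth.toFixed(2)); // 3.40
```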

What This Means For Users

Nothing visible changed. That's the goal. The migration was a Tuesday evening copy-paste of a couple of config values. No deployment, no downtime, no announcement.

But the monitoring is now boringly reliable, which is what monitoring should be. No more dropped checks, no more skewed uptime numbers, no more "why is there a gap in my graph at 14:32".

If you're a free WebMon user with a few monitors, you almost certainly weren't affected by any of this. The dropped checks were happening in the longer batches, and most free users were never near the back of the queue. But the change benefits everyone equally, because a monitor's position in the batch no longer decides whether it gets checked.

What I Took Away From This

Two things, really:

Free tier limits are real. "Free" usually means "free up to a usage level we won't tell you about until you hit it." That's fine, but plan for it. Have a rough idea of what your usage will look like at 10x your current scale, because you'll hit it sooner than you think.

Don't over-engineer around limits. Our batch-shrinking workaround was clever and entirely the wrong solution. The right fix was paying $5 to raise the limit. The time I spent deferring that was worth more than the subscription costs I saved. Make the call when you see it coming.

The Worker is now on the paid plan, the monitoring is reliable, and the daily cost is the price of half a coffee. Sometimes the right answer to "how can I scale around this" is "by paying for the next tier and getting on with it."

WebMon's monitoring checks come from Cloudflare's edge network. Set up your monitors on your dashboard.
