Monit vs Healthchecks.io for cron heartbeats: the case for a ping-based service over a daemon you have to maintain yourself

For about three years I ran monit on every server I owned. It watched processes, restarted them, and emailed me when something fell over. Then I hit a problem monit doesn’t really solve: how do I know when a cron job didn’t run at all? A daemon that’s stopped is loud, and monit screams about it. A cron job whose 4 AM run silently did nothing, because cron’s minimal $PATH couldn’t find the script, is invisible. No process monit was watching went down; the job simply never did its work. There’s nothing for monit to react to.

That’s the gap Healthchecks.io fills, and once I switched, I retired most of my monit config. This is the case for the swap.

How they think differently

monit is a sensor. It runs as a daemon, polls things you tell it about (processes, file mtimes, ports, HTTP endpoints), and alerts when an observable falls outside an expected range. The model is “watch this; tell me when it’s wrong.” It’s pull-based: monit reaches out to the thing it’s monitoring on a schedule.

Healthchecks.io is a dead-man switch. Each cron job sends an HTTP ping when it runs successfully. The Healthchecks server expects pings on a schedule you defined; if a ping doesn’t arrive within the grace period, you get an alert. The model is “I expected to hear from you by 4:05 AM; I didn’t; something’s wrong.” It’s push-based, and the alert is triggered by silence, not by an error.

This is the whole difference: monit can’t tell you about a cron job that didn’t fire. The cron job has to fire, do something monit can observe, and then stop doing it for the alert to trigger. Healthchecks doesn’t need the cron to fire; the absence of a ping is the signal.
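To make the dead-man-switch model concrete, here is a minimal sketch of the decision the server has to make for each check. It is not Healthchecks’ actual implementation: the function name hc_status and the state names up, late, and down are illustrative assumptions, and all times are plain seconds.

```shell
#!/bin/sh
# Hypothetical sketch of a dead-man-switch decision.
# Usage: hc_status LAST_PING PERIOD GRACE NOW
#   LAST_PING  time of the last successful ping (seconds)
#   PERIOD     expected interval between pings (seconds)
#   GRACE      extra time allowed before alerting (seconds)
#   NOW        current time (seconds)
hc_status() {
    last_ping=$1
    period=$2
    grace=$3
    now=$4
    elapsed=$((now - last_ping))

    if [ "$elapsed" -le "$period" ]; then
        echo up      # a ping arrived within the expected period
    elif [ "$elapsed" -le $((period + grace)) ]; then
        echo late    # overdue, but still inside the grace window
    else
        echo down    # silence lasted past the grace window: alert
    fi
}
```

With a daily job (period 86400) and an assumed five-minute grace (300), hc_status 0 86400 300 86500 reports late, and hc_status 0 86400 300 86701 reports down. Nothing in this logic depends on the job reporting an error; the only input is how long it has been quiet.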

The wiring

Healthchecks integration is one curl call at the end of your cron job:

# A crontab entry must stay on one line; cron does not support backslash continuation.
0 4 * * * /usr/local/bin/nightly-backup.sh && curl -fsS -m 10 --retry 5 -o /dev/null https://hc-ping.com/your-uuid-here

If nightly-backup.sh succeeds, the curl runs and Healthchecks sees a green ping. If it fails (non-zero exit), the curl is skipped, no ping arrives, and Healthchecks alerts you when the grace period expires.

For a richer signal, wrap the job in a script that pings /start at the beginning of the run, then either the plain URL on success or /fail with the job’s captured output on failure:

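#!/bin/sh
UUID="your-uuid-here"

# Signal that the run has started, so Healthchecks can measure duration.
curl -fsS -m 10 -o /dev/null "https://hc-ping.com/$UUID/start"

# Run the job, capturing stdout and stderr together, and keep its exit code.
LOG=$(/usr/local/bin/nightly-backup.sh 2>&1)
RC=$?

if [ "$RC" -eq 0 ]; then
    # Success ping.
    curl -fsS -m 10 -o /dev/null "https://hc-ping.com/$UUID"
else
    # Failure ping, with the captured output as the request body.
    curl -fsS -m 10 --data-raw "$LOG" -o /dev/null "https://hc-ping.com/$UUID/fail"
fi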

Now Healthchecks shows the run duration, stores the job’s output on failure (stdout and stderr together, because of the 2>&1), and pages you immediately rather than waiting for the grace period.
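If you have more than a couple of jobs, the same pattern can be factored into a small function. The following is a sketch under assumptions, not an official client: the name hc_run, the choice to send only the last 50 lines of output, and the retry count are all mine. It uses only the /start, plain, and /fail ping URLs shown above.

```shell
#!/bin/sh
# Hypothetical helper: wrap any command with start/success/fail pings.
# Usage: hc_run <uuid> <command> [args...]
hc_run() {
    uuid=$1
    shift
    url="https://hc-ping.com/$uuid"

    # Signal the start. "|| true" keeps a failed ping from affecting the job.
    curl -fsS -m 10 --retry 3 -o /dev/null "$url/start" || true

    # Run the job, capturing stdout and stderr together.
    log=$("$@" 2>&1)
    rc=$?

    if [ "$rc" -eq 0 ]; then
        curl -fsS -m 10 --retry 3 -o /dev/null "$url" || true
    else
        # Send the tail of the output as the request body (50 lines is an
        # arbitrary choice to keep the payload small).
        printf '%s\n' "$log" | tail -n 50 |
            curl -fsS -m 10 --retry 3 -o /dev/null --data-binary @- "$url/fail" || true
    fi

    # Preserve the job's own exit code for whatever called us.
    return "$rc"
}
```

Saved somewhere like /usr/local/lib/hc-run.sh (a hypothetical path) and sourced with ". /usr/local/lib/hc-run.sh", this lets a job be wrapped as hc_run your-uuid-here /usr/local/bin/nightly-backup.sh. Returning the job’s exit code means cron’s own mail-on-failure behaviour, if you use it, keeps working.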

What you give up

  • Process restart. monit can run a restart command such as /etc/init.d/nginx restart when nginx falls over. Healthchecks alerts you; it doesn’t fix anything.
  • Local-only operation. The hosted service needs egress to hc-ping.com. If your server is air-gapped or behind a strict firewall, pinging the hosted service doesn’t work.
  • Resource thresholds. “Alert when load average > 5 for 10 minutes” is a monit thing. Healthchecks is purely about whether a job ran on schedule.

What you get

  • Coverage of the failure mode that bites you most. A silent cron failure is the worst kind — by the time you notice, you have a week of missing backups. Healthchecks catches that within the grace window.
  • Zero local maintenance. No daemon to keep alive, no monitrc to keep in sync across servers. The server just needs to make outbound HTTPS calls, which it already does.
  • Run history with stderr. When a job has been failing for three days, you see exactly when it started and what stderr looked like at the time. Debugging is “open the run log” rather than “tail journalctl and hope”.
  • Self-hostable. If you don’t want to depend on hc-ping.com, the Healthchecks server is open source and runs in a small Docker container. Same protocol; you control the endpoint.
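Because the protocol is the same, switching between the hosted service and your own instance can come down to one variable. A minimal sketch, assuming a hypothetical HC_PING_BASE variable and a hypothetical internal hostname; the exact ping path of a self-hosted instance depends on how it is configured, so treat the /ping/ prefix in the example as an assumption to check against your own install.

```shell
#!/bin/sh
# Hypothetical helper: build a ping URL from a configurable base.
# HC_PING_BASE defaults to the hosted service; override it for self-hosting.
HC_PING_BASE=${HC_PING_BASE:-https://hc-ping.com}

# Usage: hc_ping_url UUID [SUFFIX]   (SUFFIX is e.g. "start" or "fail")
hc_ping_url() {
    base=${HC_PING_BASE%/}    # drop one trailing slash, if present
    if [ -n "${2:-}" ]; then
        printf '%s/%s/%s\n' "$base" "$1" "$2"
    else
        printf '%s/%s\n' "$base" "$1"
    fi
}
```

With the default base, hc_ping_url your-uuid-here fail yields https://hc-ping.com/your-uuid-here/fail. Setting HC_PING_BASE=https://hc.example.internal/ping/ (a made-up host) makes the same call yield https://hc.example.internal/ping/your-uuid-here/fail, so the cron scripts themselves don’t change.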

The hybrid I actually run

I didn’t fully retire monit. I kept it for two specific things: process restart on the small set of long-running daemons where I want auto-recovery, and disk-space alerting where Healthchecks doesn’t help. Everything else — every cron, every backup, every nightly maintenance task, every certbot renewal — moved to Healthchecks.

If you’re picking one tool for a single small box: pick Healthchecks. It eliminates the failure mode you’ll actually experience. monit is for the failure modes that have already bitten you badly enough to write rules about, and the honest truth is most personal-infra setups never get there.

Cover photo: Rezende Luan on Pexels.
