The first time I filled / on a single-drive ZFS box was a Sunday at 2 AM. zfs list said the pool had 800 GB free; df said / was at 100%; the system was unhappy in that quiet, refuses-to-write-but-doesn’t-crash way that ZFS specialises in. The cause: I’d been creating hourly snapshots for six months and never deleted any of them, and the snapshot metadata plus copy-on-write blocks ate the headroom I thought I had.
This post is the snapshot retention recipe I now run on a 4 TB single-drive ZFS-on-root box, the zfs send | receive over SSH setup that mirrors it to a remote host, and the three traps that bit me getting there.
The retention shape that works
A 4 TB drive isn’t big enough for “keep everything forever” with copy-on-write. You need to rotate. The shape I run:
- 24 hourly snapshots
- 14 daily snapshots
- 8 weekly snapshots
- 6 monthly snapshots
That gives you about 50 snapshots active at any one time, which sounds like a lot but is bounded — and bounds matter when copy-on-write is in play. The thing that filled my disk wasn’t the count; it was that I had 4,300 snapshots and each one was holding a unique reference to blocks the live filesystem had since rewritten.
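To see which snapshots are actually pinning space, you can sort them by their `used` column. A small audit sketch (dataset name follows this post; adjust to yours) — note that a snapshot's `used` is only the space unique to it, so the sum undercounts blocks shared between snapshots; the dataset's `usedbysnapshots` property is the authoritative total:

```shell
# Sum per-snapshot byte counts, as printed one-per-line by
# `zfs list -Hp -t snapshot -o used tank/root`. Treat the total as a floor;
# `zfs get usedbysnapshots tank/root` is the authoritative figure.
sum_snapshot_bytes() {
  awk '{ total += $1 } END { printf "%d\n", total }'
}

# Worst-offenders view against a live pool (shown for reference):
#   zfs list -Hp -t snapshot -o name,used -s used tank/root | tail -5

# Demo with sample byte counts:
printf '1048576\n2097152\n' | sum_snapshot_bytes   # → 3145728
```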
I use sanoid for the rotation, configured in /etc/sanoid/sanoid.conf:
[tank/root]
use_template = production
recursive = yes
[template_production]
hourly = 24
daily = 14
weekly = 8
monthly = 6
autosnap = yes
autoprune = yes
Cron entry: */15 * * * * /usr/sbin/sanoid --cron. Sanoid handles both creation and pruning; it’s the boring, reliable choice and I haven’t touched the config in a year.
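A quick way to sanity-check that autoprune is holding these bounds is to count snapshots per retention class. Sanoid names its snapshots roughly `autosnap_<date>_<time>_<class>` (verify against your own pool); the class is the last underscore-separated field:

```shell
# Count snapshots per sanoid retention class (hourly/daily/weekly/monthly).
count_by_class() {
  awk -F_ '{ c[$NF]++ } END { for (k in c) print k, c[k] }' | sort
}

# Against a live pool:
#   zfs list -H -t snapshot -o name tank/root | count_by_class

# Demo with sample names:
printf 'tank/root@autosnap_2026-05-04_11:00:00_hourly\ntank/root@autosnap_2026-05-04_12:00:00_hourly\ntank/root@autosnap_2026-05-04_00:00:00_daily\n' | count_by_class
```

If `hourly` ever reads far above 24, pruning has stopped and you are back on the road to the 2 AM disk-full.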
The send-receive setup
Sending snapshots to a remote box is what makes ZFS worth running. The receiving side gets a byte-for-byte clone of every snapshot, can roll back to any of them, and can be promoted to live if the source dies. The naïve invocation:
zfs send -R tank/root@auto-2026-05-04_12.00.00 \
| ssh backup@remote-host zfs receive -F backup/source-mirror
That works for the first send. For incrementals you need -i against the previous snapshot:
zfs send -R -i @auto-2026-05-04_11.00.00 tank/root@auto-2026-05-04_12.00.00 \
| ssh backup@remote-host zfs receive backup/source-mirror
Doing this by hand is a recipe for missed sends. I use syncoid (sanoid’s sibling tool):
syncoid --recursive --no-sync-snap \
tank/root \
backup@remote-host:backup/source-mirror
Run it from cron every 15 minutes. It figures out the most recent shared snapshot, sends only what’s new, and uses mbuffer over SSH to keep the pipe full. --no-sync-snap is the flag that bit me — without it, syncoid creates a new snapshot every run and your retention plan goes out the window.
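The core of what syncoid automates can be sketched by hand: given the snapshot lists from both sides, the newest name present on both is the incremental base. This sketch relies on the timestamped naming scheme above sorting chronologically as text; the live-pool commands in the comments are a hypothetical manual invocation, not syncoid's actual internals:

```shell
# Find the newest snapshot name common to both newline-separated lists.
latest_common() {
  comm -12 <(printf '%s\n' "$1" | sort) <(printf '%s\n' "$2" | sort) | tail -n 1
}

# Manual equivalent on a live pair (hypothetical):
#   src=$(zfs list -H -t snapshot -o name tank/root | sed 's/.*@/@/')
#   dst=$(ssh backup@remote-host zfs list -H -t snapshot -o name backup/source-mirror | sed 's/.*@/@/')
#   base=$(latest_common "$src" "$dst")
#   ...then use "$base" as the -i argument to zfs send.
```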
The three traps
- Receiving end runs out of free space first. Your remote backup pool needs more headroom than the source, not less, because incremental sends include all snapshots the source holds. If the source has 50 snapshots and the destination has 50 snapshots, the destination has the same retention burden — and any wasted space (different recordsize, different compression) compounds. Size the destination pool 1.5–2× the source’s used space.
- SSH timeouts kill long initial sends. The first zfs send -R on a 2 TB pool takes hours over a home connection. Add ServerAliveInterval 30 in ~/.ssh/config on the sending side, or use autossh. mbuffer through syncoid masks short hiccups; long ones still drop the session.
- Forgetting to enable compression on the receive side. If the source is compression=zstd and the destination is compression=off, the destination eats 1.5–2× more disk than expected. Set compression=zstd on the destination dataset before the first receive — afterwards is harder.
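Trap three is cheap to guard against with a pre-flight check before the first receive. The dataset names follow this post; the `zfs get` invocation itself is standard:

```shell
# Fail unless the destination's compression property is zstd (any level).
require_zstd() {
  case "$1" in
    zstd|zstd-*) return 0 ;;   # plain zstd or a zstd-N level
    *) echo "destination compression is '$1', expected zstd" >&2; return 1 ;;
  esac
}

# Live usage — check the remote dataset before sending anything:
#   require_zstd "$(ssh backup@remote-host zfs get -H -o value compression backup/source-mirror)"
```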
The “doesn’t fill /” guarantee
The actual constraint that prevents the 2-AM disaster is zfs set refreservation. I reserve 50 GB on tank for the live filesystem:
zfs set refreservation=50G tank/root
refreservation guarantees the live dataset has 50 GB available for writes regardless of how many snapshots are pinning blocks. If snapshots try to grow into that 50 GB, ZFS will refuse to take new ones and emit a clear error rather than silently filling the disk and surprising you at 2 AM.
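If you want warning before even the reservation is tested, a small monitor to pair with it is easy to cron (a sketch; the 50 GiB floor matches the reservation above, and the live `zfs list` call is in the comment):

```shell
# Warn when a dataset's available space drops below a floor, in bytes.
floor_bytes=$((50 * 1024 * 1024 * 1024))   # 50 GiB, matching the refreservation

check_avail() {  # $1 = available bytes, $2 = floor in bytes
  if [ "$1" -lt "$2" ]; then
    echo "WARN: ${1} bytes available, below floor of ${2}"
    return 1
  fi
  echo "OK: ${1} bytes available"
}

# Live usage:
#   check_avail "$(zfs list -Hp -o avail tank/root)" "$floor_bytes"
```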
Combined with the bounded retention from sanoid, this is the safety net I wish I’d had on day one. The disk is still mostly full of useful data; the difference is that “full” now fails loudly during a snapshot, not silently during a critical write.
If you’re running ZFS on a single drive: bound your snapshot retention, syncoid the snapshots off-box, set a refreservation. The whole stack takes 20 minutes to configure and removes the entire class of “snapshots ate my disk” failures.
Cover photo: wwarby on Pexels.
