Docker eats disk. Not “uses a lot of disk” — eats. A few months of building images, running containers, mounting volumes, and pulling base layers, and /var/lib/docker is sitting at 80 GB on what was supposed to be a 100 GB disk. The instinct is to rm -rf something. The fear is breaking the daemon. Here’s the actual map of what’s safe to nuke and what isn’t.
The directory layout
Inside /var/lib/docker you’ll see something like:
/var/lib/docker/
├── buildkit/ # build cache (huge, safe-ish)
├── containers/ # running + stopped container state
├── image/ # image metadata
├── network/ # network state (small)
├── overlay2/ # the actual image + container layers (HUGE)
├── plugins/ # docker plugins
├── runtimes/ # OCI runtime configs
├── swarm/ # swarm mode state (if used)
├── tmp/ # transient (safe to clear when daemon is stopped)
├── trust/ # content trust state
└── volumes/ # named volumes (data lives here — DO NOT NUKE)
The two big consumers are nearly always overlay2/ and buildkit/. The dangerous one is volumes/. The rest is small and rarely matters.
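Before deleting anything, it's worth mapping that layout onto actual numbers for your own box. A read-only check (adjust the path if your daemon runs with a non-default data-root):
# Raw on-disk usage per top-level directory (needs root to read overlay2)
sudo du -sh /var/lib/docker/* | sort -rh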
The “use docker, don’t rm -rf” rule
Almost all the cleanup you actually want is achievable through Docker’s own commands, which keep the daemon’s metadata in sync. Direct deletion always carries a risk of leaving the daemon’s state-DB referencing things that no longer exist on disk. Sometimes that’s harmless; sometimes docker pull starts failing in weird ways.
Run these in order, from cheapest to most aggressive:
# 1. Stopped containers — usually safe to drop
docker container prune -f
# 2. Dangling images (untagged layers from rebuilds)
docker image prune -f
# 3. Unused images (any image with no container referencing it)
docker image prune -a -f
# 4. Unused volumes (volumes not attached to any container)
# WARNING: read the next section before running this
docker volume prune -f
# 5. Build cache (BuildKit's huge cache)
docker builder prune -af
# 6. Everything in one nuke. With --volumes, unused anonymous volumes go too;
#    named volumes are kept on Docker 23+ (older daemons delete them as well)
docker system prune -af --volumes
On a typical neglected box, docker builder prune -af alone reclaims 10-30 GB, and docker image prune -a -f another 5-15 GB. The two together usually fix the problem without touching anything risky.
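Your numbers will differ, so measure rather than guess: docker system df reports what the daemon itself considers reclaimable, and running it before and after each prune step shows exactly what you got back.
# Summary per category (images, containers, local volumes, build cache),
# with a RECLAIMABLE column for each
docker system df
# Per-image / per-volume breakdown, handy for spotting the one huge
# image nobody remembers building
docker system df -v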
The volumes pitfall
docker volume prune deletes unused volumes: any volume not currently attached to a container. (On Docker 23 and later the default pass only removes anonymous volumes; add -a to include named ones, which is what older daemons do unconditionally.) The catch: “currently attached” means right now. If you stop a database container before pruning to “clean up,” the named volume holding your database files is unattached at the moment of the prune. Gone.
Always check what’s about to disappear:
docker volume ls -f dangling=true
Read every line. If any of them sound like data you care about (anything with “postgres”, “mysql”, “redis”, “data” in the name), start the container that owns it before pruning, or label the volumes you want to keep and prune with docker volume prune --filter "label!=keep" so only unlabelled volumes are candidates.
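A minimal sketch of the label approach, using a hypothetical pgdata volume (the keep label key is arbitrary; any key works as long as the filter matches it):
# Create (or re-create) important volumes with a marker label
docker volume create --label keep=true pgdata
# Prune only volumes that do NOT carry the keep label
# (add -a on Docker 23+ if unused named volumes should be considered too)
docker volume prune --filter "label!=keep"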
What’s actually safe to rm -rf
If for some reason Docker’s prune commands aren’t enough — say, the daemon is so confused you can’t run them — these directories can be deleted manually with the daemon stopped:
/var/lib/docker/tmp/ — transient build/pull workspace. Always safe to clear.
/var/lib/docker/buildkit/ — build cache only. The daemon will rebuild from scratch, but there’s no data loss.
/var/lib/docker/overlay2/<hash> — only for hashes not referenced anywhere, which is extremely cumbersome to identify safely; almost always better to use docker system prune instead.
Manual nuke procedure when prune isn’t working:
sudo systemctl stop docker docker.socket
sudo rm -rf /var/lib/docker/tmp/*
sudo rm -rf /var/lib/docker/buildkit/*
sudo systemctl start docker
docker system df
Don’t touch /var/lib/docker/overlay2/ wholesale unless you’re prepared to rm -rf /var/lib/docker entirely (which works — the daemon will rebuild a clean state, but every image, container, and volume is gone).
The nuclear option, done right
Sometimes the daemon’s state is so messed up that the cleanest fix is to start over:
# 1. Back up named volumes you care about
for vol in postgres-data redis-data; do
  docker run --rm -v ${vol}:/src -v $(pwd):/dst alpine \
    tar czf /dst/${vol}-$(date +%F).tar.gz -C /src .
done
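# 1b. Optional sanity check before anything gets deleted: list each tarball
#     to prove it's readable. Assumes this runs the same day as the backup
#     loop above, since the filenames embed $(date +%F).
for vol in postgres-data redis-data; do
  tar tzf ${vol}-$(date +%F).tar.gz > /dev/null && echo "${vol}: backup looks OK"
done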
# 2. Stop daemon, nuke everything
sudo systemctl stop docker docker.socket
sudo rm -rf /var/lib/docker
# 3. Restart, restore
sudo systemctl start docker
for vol in postgres-data redis-data; do
  docker volume create ${vol}
  docker run --rm -v ${vol}:/dst -v $(pwd):/src alpine \
    tar xzf /src/${vol}-$(date +%F).tar.gz -C /dst
done
# 4. Re-pull / re-build images, restart compose stacks
docker compose -f /path/to/compose.yml up -d
This takes 15-30 minutes for a small server. The result is a perfectly clean /var/lib/docker and your data preserved. I do this once a year on the box where Docker accumulates the most cruft; everything else gets the prune treatment quarterly.
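Before pointing services back at the restored volumes, a quick spot check confirms the restore actually landed. A minimal look inside one of them, reusing the postgres-data example name from the loops above:
# You should recognise the database's own directory layout, not an empty mount
docker run --rm -v postgres-data:/data alpine ls -la /data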
Prevention going forward
Add to /etc/docker/daemon.json to cap how big the cache and logs can grow:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "20GB"
    }
  }
}
Restart Docker after editing. Now build cache caps at 20 GB and per-container logs cap at 30 MB (three 10 MB files). Note that the log options only apply to containers created after the change; existing containers keep the log config they were started with until they’re recreated. Stay inside that budget and the disk-eating problem goes away.
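One caution on the mechanics: a syntax error in daemon.json prevents the daemon from starting at all, so validate the file before restarting. A minimal check, assuming python3 is on the host (any JSON linter works):
# Fails loudly on a JSON syntax error instead of taking the daemon down
python3 -m json.tool /etc/docker/daemon.json > /dev/null && echo "daemon.json OK"
sudo systemctl restart docker
# "json-file" here confirms the daemon restarted and parsed the config
docker info --format '{{.LoggingDriver}}'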
tl;dr: docker system prune -af covers 95% of cleanup needs. docker volume prune needs careful checking first. Direct rm -rf inside /var/lib/docker is rarely the right answer; if you reach for it, stop the daemon first and only touch tmp/ or buildkit/.
