Every shell-scripting tutorial recommends find ... -exec cmd {} \; for “do this command on every matching file.” It’s the classic, obvious pattern. It’s also dramatically slower than the alternatives once your file count goes past a few hundred — and most people never measure, so they ship the slow version forever.
Here's an actual benchmark on a real workload, the threshold where -exec stops being good enough, and the two replacements that win: -exec ... + for batch-friendly commands and xargs -P for parallelism.
Why -exec ... \; is slow
The semicolon-terminated form of -exec spawns one subprocess per matching file. find /home -name '*.tmp' -exec rm {} \; over 50,000 files is 50,000 forks plus 50,000 execs of /bin/rm. Each fork/exec round trip costs a couple of milliseconds; multiplied by 50k, that's roughly two minutes of pure process-spawn overhead before rm does any actual work.
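You don't have to take that on faith. A quick way to count the execs yourself, assuming strace is available, with true standing in for rm so nothing gets deleted:
# Trace every execve in find and its children (run inside the test tree built below)
strace -f -qq -e trace=execve \
    find . -name '*.tmp' -exec true {} \; 2>&1 | grep -c execve
# prints ~50,001: one execve for find itself, one per matching file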
I benchmarked this on a tree of 50,000 small files on an Oracle ARM box:
# Generate the test set
mkdir /tmp/bench && cd /tmp/bench
for i in {1..50000}; do touch "f$i.tmp"; done
# 1. The slow version
time find . -name '*.tmp' -exec rm {} \;
# real 1m54.327s
# Reset
for i in {1..50000}; do touch "f$i.tmp"; done
# 2. The fast version
time find . -name '*.tmp' -exec rm {} +
# real 0m1.842s
# Reset again
for i in {1..50000}; do touch "f$i.tmp"; done
# 3. xargs with parallelism
time find . -name '*.tmp' -print0 | xargs -0 -P4 -L500 rm
# real 0m0.812s
62× faster going from \; to +. Another 2× from xargs -P4. The \; form was burning over 98% of its wall time on fork/exec overhead.
-exec ... + — the cheap upgrade
Replacing the trailing \; with + tells find to batch arguments into as few exec calls as possible. The batch size is bounded by the kernel's ARG_MAX (2 MB on most modern Linux systems; check with getconf ARG_MAX), though GNU find typically builds command lines of roughly 128 KB at a time. For our 50k-file rm, that's a handful of exec calls instead of 50,000.
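You can watch the batching happen directly. This sketch leans on the sh -c wrapper idiom, where $# reports how many files each single exec call received:
getconf ARG_MAX    # the kernel's per-exec limit, e.g. 2097152
find . -name '*.tmp' -exec sh -c 'echo "one exec call, $# files"' sh {} +
# a handful of lines total, not 50,000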
The catch: + only works when the command can take the file list as its final arguments, because find requires {} to sit immediately before the +. rm a b c d works; chmod 644 a b c d works; cp a b doesn't, since cp wants the destination after the sources. For commands shaped like that, \; is the fallback, though GNU coreutils offers an escape hatch, shown below.
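The escape hatch: GNU cp and mv accept -t to name the target directory up front, which frees the file list to sit at the end where + requires it. A sketch, assuming GNU coreutils and a hypothetical /backup destination:
# -t moves the destination ahead of the sources, so {} can come last
find /etc -name '*.conf' -exec cp -t /backup {} +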
For the cases that do work, this is the cheapest single change in shell scripting. Probably 30% of legacy scripts on any given server have \; in spots where + would work — including yours.
xargs -P — when you want parallelism
The argument-batching of + still runs on a single CPU. If your command is CPU-bound (image conversion, encryption, compression), running it across all your cores is the next jump:
find /photos -name '*.png' -print0 \
| xargs -0 -P "$(nproc)" -I{} cwebp -q 85 {} -o {}.webp
# output lands beside the source as photo.png.webp; the full script below builds cleaner names
Flag-by-flag:
- find -print0 + xargs -0: separates filenames with NUL instead of newline. Survives spaces, newlines, and quotes in filenames. Always use this pair; the bare version is broken on any filename with whitespace (demo below).
- -P $(nproc): run that many xargs children in parallel. nproc is the right default; for I/O-bound work, you can go higher (-P $(($(nproc) * 2))).
- -I{}: substitute the filename wherever {} appears, one file per invocation. Needed above because cwebp wants the input file before -o.
- -L1: pass one file per command invocation, appended at the end. Use this when each invocation operates on exactly one file and the filename can go last (ffmpeg, gzip, etc).
- -L500 (or higher): batch hundreds of files per invocation. Use this for commands that take many args (rm, chmod, mv to a single dest dir).
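Here's the whitespace demo in a scratch directory (the file names are arbitrary):
mkdir /tmp/nul-demo && cd /tmp/nul-demo
touch plain.tmp 'two words.tmp'
find . -name '*.tmp' | xargs ls -l              # breaks: looks for './two' and 'words.tmp'
find . -name '*.tmp' -print0 | xargs -0 ls -l   # handles the space correctly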
The thresholds I use
- < 100 files: -exec ... \;. The fork overhead is invisible. Don't optimise.
- 100-10,000 files, batchable command: -exec ... +. A one-character change, instant 50× win.
- > 10,000 files, batchable command: xargs -0 -L500 cmd. Plus parallelism if cmd is CPU-bound.
- Per-file CPU-heavy work (image conversion, transcoding, hashing): xargs -0 -P$(nproc) -L1 cmd. The parallelism dwarfs everything else.
- Long-running per-file work where you want progress and retries: GNU parallel. Heavier dependency, but worth it for jobs that run hours (sketch below).
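A minimal sketch of that last tier, assuming GNU parallel is installed: --bar draws a live progress bar, --retries re-runs failed jobs, and {.} is parallel's strip-the-extension placeholder:
find ~/photos -name '*.png' -print0 \
    | parallel -0 --bar --retries 3 cwebp -q 85 {} -o {.}.webp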
What this looks like in real scripts
The “convert every new PNG to WebP” task I run weekly to compress my photo library:
find ~/photos -name '*.png' -newer /tmp/.last-run -print0 \
| xargs -0 -P "$(nproc)" -I{} bash -c '
out="${1%.png}.webp"
[ -f "$out" ] && exit 0
cwebp -q 85 -mt "$1" -o "$out"
' _ {}
touch /tmp/.last-run
The inline bash -c wrapper handles the per-file logic (skip if already converted, build the output path). xargs handles the parallelism. find handles the matching. Three tools, each doing one thing, the way Unix wants.
One footgun
xargs -P doesn’t preserve output ordering. If you’re piping the output to tee or sorting it later, the lines from different children will interleave. For commands whose stdout you care about, either accept the mess, route each child’s output to a separate file, or use GNU parallel with --keep-order.
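Two sketches of those workarounds (the .csv pattern and .out suffix are arbitrary): give each child its own output file, or let GNU parallel buffer and serialise the output:
# Per-child output files: nothing can interleave
find . -name '*.csv' -print0 \
    | xargs -0 -P "$(nproc)" -I{} sh -c 'wc -l "$1" > "$1.out" 2>&1' _ {}
# GNU parallel buffers each job and emits output in input order
find . -name '*.csv' -print0 | parallel -0 --keep-order wc -l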
For commands that just modify files in place — rm, chmod, cwebp — interleaved stdout is irrelevant and xargs -P is the right answer.
Don’t keep using -exec \; out of habit. The replacement is one or two characters, the gain is 50-200×, and your scripts will get out of your way faster.
Cover photo: uppsychic on Pexels.
