You SSH into your server and run the same five commands every Sunday morning: df -h, free -h, systemctl --failed, apt list --upgradable, lastb. Five minutes of looking around to confirm nothing’s quietly broken. Multiply by N servers and it stops being a casual habit. The fix is a 60-line shell script that runs weekly, gathers the same five questions’ worth of output, formats it into a single email, and drops it in your inbox at 8 AM Monday.
Below is the actual script I run. It’s deliberately boring — no Prometheus, no Grafana, no Telegraf agent. Just the basic commands you’d run by hand, formatted into one email per server per week.
The script
#!/usr/bin/env bash
# /usr/local/sbin/weekly-health-report
# Cron: 30 7 * * 1 /usr/local/sbin/weekly-health-report
set -uo pipefail  # deliberately no -e: one failing check shouldn't abort the whole report
REPORT=$(mktemp)
HOST=$(hostname -f)
DATE=$(date -u +"%Y-%m-%d %H:%M UTC")
{
  echo "==== Weekly health: $HOST ===="
  echo "$DATE"
  echo
  echo "--- Disk usage ---"
  df -h --output=source,size,used,avail,pcent,target | grep -vE '^(tmpfs|devtmpfs|/run|udev)'
  echo
  echo "--- Memory ---"
  free -h
  echo
  echo "--- CPU load ---"
  uptime
  echo
  echo "--- Failed systemd units ---"
  FAILED=$(systemctl --failed --no-legend --no-pager)
  if [ -z "$FAILED" ]; then
    echo "(none)"
  else
    echo "$FAILED"
  fi
  echo
  echo "--- Pending package updates ---"
  if command -v apt >/dev/null; then
    apt list --upgradable 2>/dev/null | tail -n +2 | head -30
    TOTAL=$(apt list --upgradable 2>/dev/null | tail -n +2 | wc -l)
    echo "($TOTAL total)"
  elif command -v dnf >/dev/null; then
    dnf check-update --quiet 2>/dev/null
  fi
  echo
  echo "--- Reboot required ---"
  if [ -f /var/run/reboot-required ]; then
    echo "YES — see /var/run/reboot-required.pkgs"
    head /var/run/reboot-required.pkgs 2>/dev/null
  else
    echo "no"
  fi
  echo
  echo "--- Last 5 successful logins ---"
  last -F | head -5
  echo
  echo "--- Last 5 failed login attempts ---"
  lastb -F 2>/dev/null | head -5 || echo "(/var/log/btmp not readable)"
  echo
  echo "--- Top 5 by RAM ---"
  ps -eo rss,pid,user,comm --sort=-rss | head -6 \
    | awk 'NR==1 {printf "%-12s %-6s %-12s %s\n", $1, $2, $3, $4; next}
           {printf "%-12s %-6s %-12s %s\n", $1/1024" MB", $2, $3, $4}'
  echo
  echo "--- Top 5 by CPU ---"
  ps -eo pcpu,pid,user,comm --sort=-pcpu | head -6
  echo
  echo "--- Disks SMART status (if smartctl present) ---"
  if command -v smartctl >/dev/null; then
    for d in /dev/sd? /dev/nvme?n?; do
      [ -e "$d" ] || continue
      s=$(smartctl -H "$d" 2>/dev/null | awk '/SMART overall-health|SMART Health/ {print $NF; exit}')
      printf "%-15s %s\n" "$d" "${s:-unknown}"
    done
  else
    echo "smartctl not installed"
  fi
  echo
  echo "==== end ===="
} > "$REPORT"
# Send via msmtp / mailx / a webhook
if command -v msmtp >/dev/null; then
  {
    echo "From: server@$HOST"
    echo "To: you@example.com"
    echo "Subject: [$HOST] Weekly health $DATE"
    echo "Content-Type: text/plain; charset=utf-8"
    echo
    cat "$REPORT"
  } | msmtp -t
else
  # Fallback: post to a webhook
  curl -sS -X POST 'https://your-webhook' \
    -H 'Content-Type: text/plain' \
    --data-binary @"$REPORT"
fi
rm -f "$REPORT"Why these specific checks
- Disk usage. The single most common cause of “the server stopped working overnight” is a full disk. Better to see “94% used” on Monday morning than to find out at 3 AM Wednesday. A threshold sketch follows this list.
- Memory. Useful for spotting a leak that’s slowly consuming swap.
- Failed systemd units. A unit that crash-looped during the night is silent if you don’t ask.
- Pending updates. Lets you see “37 packages to upgrade, including kernel” and plan a maintenance window.
- Reboot required. Ubuntu’s /var/run/reboot-required flag gets set after kernel/libc updates. Easy to forget.
- Logins. Both successful and failed. Failed logins climbing into the thousands is a fail2ban/CrowdSec config check.
- Top processes. Calibrates your sense of what’s normal — if mysqld suddenly used 2x more RAM than last week, you’d notice.
- SMART. Drives often signal failure for weeks before they actually die. The weekly check catches it.
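If you’d rather have problems jump out than eyeball percentages, a few extra lines can flag any filesystem past a threshold. A sketch you could append to the disk section of the script (the 90% cutoff and the WARNING wording are my own, not part of the script above):
echo "--- Disk usage warnings ---"
df --output=pcent,source,target | tail -n +2 \
  | grep -vE ' (tmpfs|devtmpfs|udev) ' \
  | while read -r pcent source target; do
      used=${pcent%\%}   # strip the trailing % for numeric comparison
      if [ "$used" -ge 90 ]; then
        echo "WARNING: $source at $target is $pcent full"
      fi
    done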
The mail-delivery layer
Don’t try to send via system cron MAILTO= — that path is broken on most modern servers (covered in a previous post). Two reliable paths:
- msmtp + a real SMTP relay. Install msmtp-mta, configure ~/.msmtprc with credentials for SES / Mailgun / Brevo / Fastmail. Outbound on 587 with TLS. Authenticated, deliverable.
- Webhook to a chat tool. Slack, Telegram (via bot), Discord (via webhook), Pushover, ntfy.sh. Curl POST the report as a code block; a minimal sketch follows this list. The advantage is unified visibility — all your servers’ weekly reports show up in one channel.
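For the webhook route, delivery is a single curl call. A minimal sketch against ntfy.sh (the topic name is a placeholder; $HOST, $DATE, and $REPORT come from the script above):
# Push the report as a phone notification via ntfy.sh
curl -sS \
  -H "Title: [$HOST] Weekly health $DATE" \
  --data-binary @"$REPORT" \
  "https://ntfy.sh/your-health-topic"
The SMTP route needs a little more setup: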
# ~/.msmtprc — minimal config for AWS SES
defaults
auth on
tls on
tls_starttls on
account ses
host email-smtp.us-east-1.amazonaws.com
port 587
from server@example.com
user AKIA... # SES SMTP username
password BXX... # SES SMTP password (NOT your AWS key)
account default : ses
Cron it for Monday morning
# /etc/crontab
30 7 * * 1 root /usr/local/sbin/weekly-health-report
07:30 UTC Monday is 8:30 AM in the UK and 9:30 AM in Berlin during summer time (an hour earlier in winter), and 1:00 PM IST year-round — pick the offset that lands the report on your phone right after breakfast. The report itself takes ~3 seconds to generate; the email arrives within the minute.
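If you’d rather schedule with a systemd timer than cron, a roughly equivalent pair of units (unit names are my own; Persistent=true replays a run missed while the box was down, which cron won’t do):
# /etc/systemd/system/weekly-health-report.service
[Unit]
Description=Weekly server health report

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/weekly-health-report

# /etc/systemd/system/weekly-health-report.timer
[Unit]
Description=Schedule the weekly health report

[Timer]
OnCalendar=Mon *-*-* 07:30:00 UTC
Persistent=true

[Install]
WantedBy=timers.target
Enable with systemctl enable --now weekly-health-report.timer and delete the crontab entry.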
Fleet-scale variant
If you have 5+ servers, getting 5 separate emails is noisy. Modify the script to POST the report to a central endpoint (a webhook, an S3 bucket, a Discord channel, an SQS queue), and read all the reports together on Monday morning. The relevant chunk:
# Replace the email block with:
curl -sS -X POST "https://discord.com/api/webhooks/.../..." \
-H 'Content-Type: application/json' \
-d "$(jq -n --arg c "\`\`\`$(cat $REPORT)\`\`\`" '{content:$c}')"End-state: every Monday at 7:30 UTC, every server in your fleet self-reports in one Discord channel. You scroll through them with coffee. Anything alarming jumps out (red disk, failed unit, SMART fail). Five minutes of attention covers the entire week of “is everything OK.” Compare to the alternative of “log in to each box, run the same commands, hope nothing was off” — this is one cron entry plus 60 lines of shell.
