Automated Linux backups with cron and rsync: daily local + offsite routine
Set up automated backups on Linux servers using cron and rsync, with retention, logging and offsite copies over SSH. Step-by-step technical guide.
Automated backups on a Linux server aren’t a luxury — they’re the only thing between you and a bad night when a disk fails, the filesystem corrupts, or someone runs rm -rf in the wrong directory. Despite that, many sysadmins still push this decision to “later” and discover during the incident that the last snapshot is six months old.
This tutorial builds a daily local backup routine with rsync (using hardlink snapshots to save space) and offsite replication over SSH, scheduled with cron. Everything lives in idempotent scripts with timestamped logs and configurable retention. The focus is on a generic Linux server (Debian/Ubuntu/Rocky/Alma) running as a VPS: no exotic dependencies, no proprietary agent.
Estimated execution time: 30–40 minutes to set it up the first time, after which it runs on its own. Restoring a single file takes seconds; a full restore depends on data volume.
Prerequisites
A Linux server with root access (or full sudo), rsync installed, cron active, and a second server (or remote storage) accessible over SSH for the offsite copy. The local destination needs free disk space of at least 2x the dataset size to accommodate the snapshot history.
Reference environment:
- Operating system: Ubuntu 24.04 / Debian 12
- rsync: 3.2.x
- Local disk: >= 2x dataset
- Offsite access: SSH with key
Verify versions before proceeding:
rsync --version | head -n 1
systemctl status cron # debian/ubuntu
systemctl status crond # rhel/rocky/alma
If rsync isn’t installed: sudo apt install rsync (Debian/Ubuntu) or sudo dnf install rsync (Rocky/Alma).
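The 2x disk-space requirement can be checked before the first run. A minimal sketch, assuming du and df with the POSIX -k and -P options; the function names (dataset_kb, avail_kb, has_room) are illustrative and not part of the scripts in this guide:

```bash
# Sum the on-disk size (KiB) of one or more source directories.
dataset_kb() {
    du -sk "$@" 2>/dev/null | awk '{ sum += $1 } END { print sum + 0 }'
}

# Free space (KiB) on the filesystem that holds the given path.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_room <dataset_kb> <avail_kb>
# Succeeds when the available space is at least twice the dataset size.
has_room() {
    [ "$2" -ge $(( $1 * 2 )) ]
}
```

A possible call, run as root so du can read every directory: has_room "$(dataset_kb /etc /home /var/www)" "$(avail_kb /backup)" || echo "less than 2x the dataset is free" >&2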
Directory layout and strategy
Before writing the script, define the layout. The recipe uses three tiers (the script in this guide implements the daily tier; weekly and monthly promotion is listed under Next steps):
- /backup/local/daily/YYYY-MM-DD/: daily snapshots (kept for 7 days)
- /backup/local/weekly/YYYY-WW/: weekly snapshots (kept for 4 weeks)
- /backup/local/monthly/YYYY-MM/: monthly snapshots (kept for 6 months)
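Each daily snapshot looks like a full copy, but rsync --link-dest stores files that are unchanged since the previous day as hardlinks to the existing data instead of new copies. The deduplication is per file: a file that changed by even one byte is stored again in full. The result: 30 days of backups consume roughly one full copy plus the daily deltas.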
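To see the mechanism in isolation, the sketch below builds two tiny "snapshots" in a temporary directory. It uses GNU cp -al as a stand-in for what rsync --link-dest does with unchanged files (an assumption made so the example runs without rsync), and the file names are made up:

```bash
# Demo in a throwaway directory; nothing under /backup is touched.
demo=$(mktemp -d)
mkdir "$demo/day1"
echo "unchanged" > "$demo/day1/a.txt"
echo "version 1" > "$demo/day1/b.txt"

# "Day 2": hardlink every file from day1 (stand-in for --link-dest)...
cp -al "$demo/day1" "$demo/day2"
# ...then replace the one file that changed. Removing it first breaks the
# link, so day1 keeps its old content while day2 gets a new file.
rm "$demo/day2/b.txt"
echo "version 2" > "$demo/day2/b.txt"

# Columns: inode number, hardlink count, path.
stat -c '%i %h %n' "$demo"/day*/a.txt   # same inode twice, link count 2
stat -c '%i %h %n' "$demo"/day*/b.txt   # two different inodes, link count 1
```

Both directories list a.txt, yet its data exists once on disk; only b.txt is stored twice. Remove the demo directory with rm -rf "$demo" when done.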
Offsite replication only copies the latest snapshot (current) to a remote server, where you can recreate the same history structure if desired (or simply replace the latest copy).
Local backup script
This section builds the main script that cron will trigger every day. It performs an incremental snapshot with hardlinks, rotates retention and produces a log file.
Create the backup and log directories, owned by root (the script runs as root, so no dedicated user is needed on the source):
sudo mkdir -p /backup/local/{daily,weekly,monthly}
sudo mkdir -p /var/log/backup
sudo chown -R root:root /backup /var/log/backup
sudo chmod 700 /backup
Permission 700 ensures only root can read the backup contents, which is basic protection against unauthorized reads if another account is compromised.
Create the script /usr/local/sbin/backup-daily.sh with the content below. It’s idempotent: running it twice on the same day doesn’t duplicate anything.
#!/bin/bash
set -euo pipefail

# Configuration
SOURCE_DIRS=(/etc /home /var/www /var/lib/mysql-dumps)
BACKUP_ROOT="/backup/local"
LOG_FILE="/var/log/backup/backup-$(date +%Y-%m).log"
RETENTION_DAILY=7
RETENTION_WEEKLY=4    # reserved for weekly promotion (not implemented here)
RETENTION_MONTHLY=6   # reserved for monthly promotion (not implemented here)
TODAY=$(date +%Y-%m-%d)
DEST="$BACKUP_ROOT/daily/$TODAY"
LATEST_LINK="$BACKUP_ROOT/daily/current"

# Logging helper with timestamp
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" >> "$LOG_FILE"
}

log "===== Backup start $TODAY ====="

# Detect previous snapshot for --link-dest (none exists on the very first run).
# Arrays keep paths with spaces intact; an empty array under set -u needs bash 4.4+.
LINK_DEST_ARG=()
if [ -d "$LATEST_LINK" ]; then
    LINK_DEST_ARG=(--link-dest="$(readlink -f "$LATEST_LINK")")
    log "Using link-dest: ${LINK_DEST_ARG[*]}"
fi

mkdir -p "$DEST"

# Sync with rsync: -a archive mode, -H hardlinks, -A ACLs, -X extended attributes.
# Each source lands under its last path component (/var/www becomes $DEST/www).
rsync -aHAX --delete \
    --numeric-ids \
    "${LINK_DEST_ARG[@]}" \
    "${SOURCE_DIRS[@]}" \
    "$DEST/" >> "$LOG_FILE" 2>&1

# Update 'current' symlink (only reached if rsync succeeded, because of set -e)
ln -snf "$DEST" "$LATEST_LINK"
log "Local backup finished. Snapshot size:"
du -sh "$DEST" >> "$LOG_FILE" 2>&1

# Retention: drop daily snapshots last modified more than RETENTION_DAILY days ago
find "$BACKUP_ROOT/daily" -mindepth 1 -maxdepth 1 -type d -name "20*" \
    -mtime +"$RETENTION_DAILY" -exec rm -rf {} +
log "Daily retention applied (keeping $RETENTION_DAILY days)"
log "===== Backup end $TODAY ====="
Save and make it executable:
sudo chmod 700 /usr/local/sbin/backup-daily.sh
Test the script manually before scheduling:
sudo /usr/local/sbin/backup-daily.sh
Check the generated log:
tail -n 50 /var/log/backup/backup-$(date +%Y-%m).log
Confirm the snapshot exists and has content (sudo is required because /backup is readable by root only):
sudo ls -lah /backup/local/daily/
sudo du -sh /backup/local/daily/current/
Run it again immediately. The second run should be much faster: rsync only transfers what changed and leaves everything else in place.
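The find -mtime rule in the script depends on directory timestamps, and it keeps deleting old snapshots even if new backups have stopped arriving. An alternative sketch (not the method the script uses) keeps the newest N snapshots by name instead. It assumes the YYYY-MM-DD directory naming from this guide, and prune_snapshots is a hypothetical helper name:

```bash
# prune_snapshots <dir> <keep>
# Deletes all but the <keep> newest YYYY-MM-DD directories inside <dir>.
prune_snapshots() {
    local dir="$1" keep="$2" total=0 excess snap
    # ISO dates sort chronologically as plain text, and the glob expands
    # in sorted order, so the oldest snapshot comes first.
    for snap in "$dir"/20??-??-??; do
        if [ -d "$snap" ] && [ ! -L "$snap" ]; then
            total=$((total + 1))
        fi
    done
    excess=$((total - keep))
    for snap in "$dir"/20??-??-??; do
        [ "$excess" -gt 0 ] || break
        if [ -d "$snap" ] && [ ! -L "$snap" ]; then
            rm -rf -- "$snap"
            excess=$((excess - 1))
        fi
    done
    return 0
}
```

Calling prune_snapshots /backup/local/daily 7 would then always leave the seven most recent snapshots on disk, however old they are. The current symlink is never touched because it does not match the date pattern.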
Active databases (MySQL, Postgres) should not be copied directly from their data directory — you risk corrupting the snapshot. Generate a consistent dump before the rsync (e.g. mysqldump --single-transaction) into a directory that rsync then copies. The example above uses /var/lib/mysql-dumps precisely for this flow.
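One way to wire the dump into this flow is to write it under a temporary name and rename it only on success, so rsync never picks up a half-written file. A minimal sketch; dump_to is a hypothetical helper, and the mysqldump call in the usage note assumes credentials are already configured for root:

```bash
# dump_to <target-file> <command...>
# Runs the command, captures its stdout in <target-file>.tmp, and renames
# the result to <target-file> only if the command succeeded. A failed dump
# leaves the previous good file in place.
dump_to() {
    local target="$1" rc
    shift
    if "$@" > "$target.tmp"; then
        mv -f "$target.tmp" "$target"
    else
        rc=$?
        rm -f "$target.tmp"
        return "$rc"
    fi
}
```

In backup-daily.sh this could run just before the rsync step, for example: dump_to /var/lib/mysql-dumps/all.sql mysqldump --single-transaction --all-databases. Note that --single-transaction only gives a consistent snapshot for transactional tables such as InnoDB.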
Offsite replication over SSH
Local backups protect against application failure or human error. They don’t protect against total loss of the server (fire, corrupted hypervisor, compromised access). For that, replicate to another server — ideally in a different region.
On the source server, generate a key pair dedicated to backups:
sudo ssh-keygen -t ed25519 -f /root/.ssh/backup_key -N "" -C "backup@$(hostname)"
sudo chmod 600 /root/.ssh/backup_key
No passphrase, because cron can’t type one. The protection comes from the authorized_keys restriction on the destination, shown in the next step.
On the destination server, create a dedicated user (e.g. backup-receiver) and add the public key with a command restriction:
# On the destination:
sudo useradd -m -s /bin/bash backup-receiver
sudo mkdir -p /home/backup-receiver/.ssh /backup/remote
sudo chown -R backup-receiver: /home/backup-receiver /backup/remote
Edit /home/backup-receiver/.ssh/authorized_keys, adding the public key generated on the source (/root/.ssh/backup_key.pub) prefixed with the restriction:
command="rsync --server -logDtprRe.iLsfxC --delete . /backup/remote/",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-ed25519 AAAA... backup@source-server
The command="..." directive forces the destination to run only that specific rsync invocation, whatever the client asks for, so even if the key leaks the attacker can’t get a shell. The option string after --server must match what your local rsync sends. The flags shown here are an example; the -aHAX --numeric-ids options used in this guide produce a different string. Capture the exact line by running rsync with -e "ssh -v" and copying the command it reports.
Create the script /usr/local/sbin/backup-offsite.sh on the source:
#!/bin/bash
set -euo pipefail

REMOTE_USER="backup-receiver"
REMOTE_HOST="backup.example.com"
# Trailing slash: copy the contents of the snapshot that 'current' points to
LOCAL_CURRENT="/backup/local/daily/current/"
LOG_FILE="/var/log/backup/offsite-$(date +%Y-%m).log"
SSH_KEY="/root/.ssh/backup_key"

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" >> "$LOG_FILE"
}

log "===== Offsite replication start ====="

rsync -aHAX --delete \
    --numeric-ids \
    -e "ssh -i $SSH_KEY -o StrictHostKeyChecking=accept-new" \
    "$LOCAL_CURRENT" \
    "$REMOTE_USER@$REMOTE_HOST:/backup/remote/" >> "$LOG_FILE" 2>&1

log "===== Offsite replication end ====="
Set permissions and test:
sudo chmod 700 /usr/local/sbin/backup-offsite.sh
sudo /usr/local/sbin/backup-offsite.sh
The first run copies everything and can take hours on large datasets. Consider doing the initial load outside peak hours or via physical media if the destination is on the same premises. Subsequent runs only transfer the delta, typically in seconds to minutes.
Scheduling with cron
With the scripts validated, schedule them to run automatically every day.
Edit root’s crontab:
sudo crontab -e
Add the following (adjust the time to match your server’s lowest-traffic window):
# Ensures a consistent PATH (cron uses a minimal PATH by default)
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Address that receives any output the jobs print (placeholder, replace with your own)
MAILTO=admin@example.com
# Local backup every day at 03:00
0 3 * * * /usr/local/sbin/backup-daily.sh
# Offsite replication every day at 04:30 (after the local one)
30 4 * * * /usr/local/sbin/backup-offsite.sh
Save and exit. cron picks up the change automatically, so there is no need to restart anything.
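The fixed 04:30 slot assumes the local backup always finishes within 90 minutes. A hedged alternative is a single cron entry that calls a small wrapper, which starts the offsite copy only after the local backup has succeeded. run_chain is a hypothetical helper, not part of the scripts in this guide:

```bash
# run_chain <command> [<command>...]
# Runs each command in order and stops at the first one that fails.
run_chain() {
    local step
    for step in "$@"; do
        if ! "$step"; then
            # stderr from a cron job is what MAILTO delivers
            echo "backup chain stopped: $step failed" >&2
            return 1
        fi
    done
    return 0
}
```

A wrapper script (say /usr/local/sbin/backup-all.sh, a made-up name) would define this function and end with run_chain /usr/local/sbin/backup-daily.sh /usr/local/sbin/backup-offsite.sh; the crontab then needs only one line pointing at the wrapper.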
Verification
After 24–48 hours, confirm that everything is running:
# Daily snapshots accumulating
sudo ls -lah /backup/local/daily/
# Log from the latest run
tail -n 30 /var/log/backup/backup-$(date +%Y-%m).log
# Offsite log
tail -n 30 /var/log/backup/offsite-$(date +%Y-%m).log
# On the destination server
ls -lah /backup/remote/
You should see dated directories (2026-05-28, 2026-05-29, etc.). Measured on its own, each snapshot is about the size of the full dataset, but the space actually consumed by all of them together is much smaller thanks to the hardlinks. Confirm with:
sudo du -sh /backup/local/daily/2026-05-29 # one snapshot measured alone
sudo du -sh /backup/local/daily/ # real space used by all snapshots (shared files counted once)
If the second number is far below the first multiplied by the number of snapshots, the hardlinks are working.
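Manual checks like these are easy to forget. Below is a small sketch of a freshness check that could run from its own cron entry. The 25-hour threshold is the one suggested for alerting under Next steps, log_is_fresh is a hypothetical name, and the check relies on find -mmin, which GNU find provides:

```bash
# log_is_fresh <file> <max-age-hours>
# Succeeds if <file> exists and was modified within the last <max-age-hours>.
log_is_fresh() {
    local file="$1" max_minutes
    max_minutes=$(( $2 * 60 ))
    [ -f "$file" ] || return 1
    # find prints the path only when the mtime is newer than the threshold
    [ -n "$(find "$file" -mmin "-$max_minutes")" ]
}
```

For example: log_is_fresh "/var/log/backup/backup-$(date +%Y-%m).log" 25 || echo "backup log is stale" >&2. Because the log file name changes every month, schedule the check after the backup window so the new month's file already exists on the 1st.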
Troubleshooting
Script works manually but fails in cron
Almost always PATH-related. cron runs with PATH=/usr/bin:/bin by default. Solution: either use absolute paths in the script (/usr/bin/rsync, /usr/bin/ssh) or set PATH= at the top of the crontab as shown above. Also check that MAILTO is configured to receive stderr — without it, failures turn into absolute silence.
rsync returns code 23 or 24
Code 23 means “partial transfer due to error” (typically files in use or with restricted permissions); 24 means “some source files vanished during the transfer” (common in /var/log or /tmp). These codes aren’t necessarily fatal, but because the script runs under set -e, either one aborts it. If they’re expected in your scenario, tolerate them explicitly. Save the exit status before testing it, since every [ ... ] test overwrites $?: rsync ... || { rc=$?; [ "$rc" -eq 23 ] || [ "$rc" -eq 24 ]; }
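The same tolerance can be written as a reusable wrapper. This is a sketch, and run_tolerating_partial is a hypothetical name. The exit status is captured in a variable first, because testing $? directly changes it:

```bash
# run_tolerating_partial <command...>
# Treats exit codes 23 (partial transfer) and 24 (source files vanished)
# as warnings; any other non-zero code is passed through as a failure.
run_tolerating_partial() {
    local rc=0
    "$@" || rc=$?
    case "$rc" in
        0)
            return 0 ;;
        23|24)
            echo "warning: command exited with $rc, continuing" >&2
            return 0 ;;
        *)
            return "$rc" ;;
    esac
}
```

In backup-daily.sh the rsync call would then start with run_tolerating_partial rsync -aHAX ... and the script would still abort on any other error.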
Permission denied on the offsite destination
The command= directive in authorized_keys is literal: the destination always runs that exact string, so any change to the local rsync flags makes the two sides disagree. Solution: run the rsync command from backup-offsite.sh manually with -v added to its ssh options (-e "ssh -v ..."), look for the line starting with "debug1: Sending command:" in the output, and update command= on the destination with that string.
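In OpenSSH's client debug output, the line of interest starts with "debug1: Sending command:". A small sketch that pulls it out of captured output; extract_server_cmd is a hypothetical helper:

```bash
# extract_server_cmd
# Reads ssh -v output on stdin and prints only the remote command that the
# client asked the server to run.
extract_server_cmd() {
    sed -n 's/^debug1: Sending command: //p'
}
```

Usage would be along the lines of: rsync -aHAX --numeric-ids -e "ssh -v -i /root/.ssh/backup_key" /backup/local/daily/current/ backup-receiver@backup.example.com:/backup/remote/ 2>&1 | extract_server_cmd. The client logs this line when it sends the request, so it should appear even when the transfer itself then fails.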
Disk space growing uncontrollably
Likely --link-dest isn’t picking up the previous snapshot — some change broke the current symlink, or the snapshots are on different filesystems (hardlinks can’t cross filesystems). Check with stat -c '%i' file on two dates: if the inode is the same, the hardlink is active; if it differs, you’re duplicating data.
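The inode comparison can be wrapped in a helper that takes two snapshot directories and a relative path. A sketch assuming GNU stat; is_hardlinked is a hypothetical name:

```bash
# is_hardlinked <snapshot-a> <snapshot-b> <relative-path>
# Returns 0 when the file is the same inode on the same device in both
# snapshots, 1 when it is stored twice, 2 when it cannot be read.
is_hardlinked() {
    local a b
    a=$(stat -c '%d:%i' "$1/$3") || return 2
    b=$(stat -c '%d:%i' "$2/$3") || return 2
    [ "$a" = "$b" ]
}
```

Pick a file that rarely changes, for example: sudo bash -c '...; is_hardlinked /backup/local/daily/2026-05-28 /backup/local/daily/2026-05-29 etc/hostname && echo shared' with the function definition in place of the dots. The device number is included because hardlinks cannot span filesystems.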
Next steps
This routine covers production basics. To evolve:
- Add automatic weekly/monthly promotion: a copy (not a hardlink) of the daily snapshot into weekly/ every Monday and into monthly/ on the 1st of each month.
- Integrate active alerting: if the latest log is more than 25 hours old, fire a notification (Discord, email, Healthchecks.io). A silently stopped backup is the worst scenario.
- Test monthly restores automatically: restore a random file from the oldest backup, compare its checksum against production, and log the result.
- Evaluate in-transit and at-rest encryption for the offsite destination (already in transit via SSH; for at-rest, consider LUKS on the destination volume).
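For the promotion item in the list above, the date logic is the only subtle part. The sketch below prints which tier directories a given day's snapshot should be copied into. It assumes GNU date, and it reads YYYY-WW as ISO year and ISO week (%G-%V), which is an interpretation the layout in this guide does not spell out. promotion_targets is a hypothetical name:

```bash
# promotion_targets <YYYY-MM-DD>
# Prints "weekly/<ISO year>-<ISO week>" on Mondays and "monthly/<YYYY-MM>"
# on the 1st of the month; prints nothing on other days.
promotion_targets() {
    local day="$1"
    # %u is the ISO weekday: 1 = Monday ... 7 = Sunday
    if [ "$(date -d "$day" +%u)" = "1" ]; then
        echo "weekly/$(date -d "$day" +%G-%V)"
    fi
    if [ "$(date -d "$day" +%d)" = "01" ]; then
        echo "monthly/$(date -d "$day" +%Y-%m)"
    fi
    return 0
}
```

For 2024-01-01, which was both a Monday and the 1st, it prints weekly/2024-01 and monthly/2024-01. At the end of backup-daily.sh, each printed tier could be filled with cp -a "$DEST" "$BACKUP_ROOT/<tier>", which makes a real copy rather than hardlinks.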
Frequently asked questions
What is the difference between rsync with --link-dest and incremental tar?
rsync with --link-dest creates full snapshots using hardlinks for unchanged files — each backup directory looks like a full copy but only consumes the delta on disk. Incremental tar generates different archives on each run and requires restoring the base plus every incremental in order. For fast restores and simple management, --link-dest is usually preferable.
Why is cron not running my backup script?
Common causes: PATH not set in the crontab (cron uses a minimal PATH, so commands like rsync or ssh may not be found; use absolute paths or set PATH), missing execute permission on the script, or missing output redirection, which hides the real error. Add MAILTO at the top of the crontab or redirect with >> /var/log/backup.log 2>&1 to capture output.
Does offsite backup over SSH require a password on every run?
No — and it shouldn't. Generate a dedicated key pair for the backup with ssh-keygen -t ed25519 -f /root/.ssh/backup_key with no passphrase, add the public key to the destination's authorized_keys restricting the allowed command (command="rsync --server ...",no-pty), and use ssh -i /root/.ssh/backup_key in rsync. Interactive passwords simply hang the job under cron.
How much retention is reasonable for daily backups?
A healthy default: 7 daily, 4 weekly, 6 monthly — adjust based on dataset size and storage cost. With --link-dest the marginal cost of keeping 30 daily snapshots is low (only the delta of changes), so don't be afraid to retain more than your initial intuition suggests.
How do you verify that a backup actually works without restoring everything?
Restoring is the only real test, so automate it: once a month, randomly restore a file from the oldest backup (reverse rsync or direct cp from the snapshot) to a temporary directory and compare its checksum to production via sha256sum. Log the result and alert on mismatch. An untested backup is hope, not a strategy.
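A minimal sketch of the checksum comparison, assuming sha256sum from GNU coreutils. verify_restore is a hypothetical helper; picking the random file and locating the oldest snapshot are left out:

```bash
# verify_restore <snapshot-file> <production-file>
# Copies the snapshot file to a temp dir (the "restore") and compares its
# SHA-256 with the production file. Succeeds only when both match.
verify_restore() {
    local tmp want got
    tmp=$(mktemp -d) || return 2
    cp -p -- "$1" "$tmp/restored" || { rm -rf "$tmp"; return 2; }
    want=$(sha256sum < "$2" | cut -d' ' -f1)
    got=$(sha256sum < "$tmp/restored" | cut -d' ' -f1)
    rm -rf "$tmp"
    [ -n "$want" ] && [ "$want" = "$got" ]
}
```

A mismatch does not always mean a bad backup: the production file may simply have changed since the snapshot was taken, so this check works best on files that rarely change.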
rsync or borg/restic for Linux server backups?
rsync is simple, transparent and ideal for full copies with hardlinks (--link-dest) or syncing between machines. borg and restic offer block-level deduplication, built-in encryption and compression — real advantages for large datasets (TB) or offsite backups over bandwidth-limited internet. For small/medium servers, rsync covers 90% of cases with no extra dependencies.