Hourly and daily rotating directory snapshots

Mon 11 October 2010

I have a bad habit of accidentally deleting things. I am also extremely paranoid about my laptop getting lost, broken, or stolen and taking pretty much my entire work dataset with it. To combat that, I run a daily backup, originally using duplicity. Duplicity is fantastic, but when it needs to do a full backup, pushing 12G of data over a slow link gets messy. I'd also like to be able to look "back in time" a couple of hours and see a file as it was before I mangled it. Since I don't want to put my whole home directory under version control (which some people do!) and I want the snapshotting to happen automatically, I came up with this rsync script:

#!/bin/bash

# Snapshot /home each hour, cycling through 24 hourly slots, to limit
# the damage if I delete something.
# You could then tar/back up the last hour's snapshot to run
# a daily offsite backup if you wanted.

excludedFile="${HOME}/.run-snapshot-rsync-excludes"

dailyServer="my.backup.host.com"
dailyShots="backups/$(hostname -s)/daily"
hourlyShots="/home/.snapshots/${USER}"

if [ "$2" == "--throttle" ]; then
    rsyncThrottle="--bwlimit=30"
else
    rsyncThrottle=""
fi

case "$1" in
    hourly)
    shots="$hourlyShots"
    prevShot="${shots}/latest"
    curShot="${shots}/$(date +%H)"

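    # Run at the lowest CPU and I/O priority; unchanged files are
    # hard-linked against the previous snapshot via --link-dest.
    nice -n 19 ionice -c 3 rsync -avrP --delete-excluded \
       --exclude-from="$excludedFile" \
       --delete \
       --link-dest="$prevShot" \
       /home/rmonk/ \
       "$curShot"

    # Mark the snapshot time, then repoint "latest" at this snapshot.
    touch "$curShot"
    rm -f "$prevShot"
    ln -s "$(basename "$curShot")" "$prevShot"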

    ;;
    daily)
    server="$dailyServer"
    shots="$dailyShots"
    prevShot="${shots}/latest"
    curShot="${shots}/$(date +%u-%A)"

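    # $rsyncThrottle is left unquoted on purpose so that an empty value
    # expands to nothing. A relative --link-dest is resolved against the
    # destination directory, so "../latest" points at the sibling symlink.
    nice -n 19 ionice -c 3 rsync -avzrP --delete-excluded \
       --exclude-from="$excludedFile" \
       --delete $rsyncThrottle \
       --link-dest="../latest" \
       /home/rmonk/ \
       "${server}:${curShot}"
    # Repoint "latest" on the remote host at the snapshot just written.
    ssh "$server" "rm -f ${prevShot}; ln -s $(basename "$curShot") ${prevShot}"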
    ;;
esac

Hourly backups

  • Create a /home/.snapshots/$USER directory, owned by the user
  • Run

    run-snapshot.sh hourly

    from a cron job
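
As a rough sketch of the cron side, the snippet below prints a crontab line that runs the hourly snapshot at minute 0 of every hour. The install location in snapshotScript and the helper name hourly_cron_line are assumptions; adjust the path to wherever run-snapshot.sh actually lives.

```bash
#!/bin/bash
# Assumed install location for the script; not stated in the post.
snapshotScript="${HOME}/bin/run-snapshot.sh"

# Print a crontab entry: minute 0, every hour, every day.
hourly_cron_line() {
    printf '0 * * * * %s hourly\n' "$1"
}

hourly_cron_line "$snapshotScript"

# To install it, append the line to the current crontab:
#   (crontab -l 2>/dev/null; hourly_cron_line "$snapshotScript") | crontab -
```

Because the snapshot directory is named after the hour (date +%H), running once per hour gives 24 slots that are overwritten the next day.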

Daily backups

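  • Create a ~/backups/$HOSTNAME/daily directory on the remote host, where $HOSTNAME is the short name printed by hostname -s
  • Set up your ssh keys so you can log in to the remote host (without a password prompt if the backup runs unattended)
  • Run

    run-snapshot.sh daily

    or

    run-snapshot.sh daily --throttle

  • The --throttle option passes --bwlimit=30 to rsync, capping upstream bandwidth to be kinder to slow or shared network connections, at the cost of a longer run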
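
The one-time setup for the daily backup could be sketched as below. The host name is the placeholder from the script; the ed25519 key type, the key path, and the function name setup_daily_backup are assumptions, so treat this as a starting point rather than the exact steps used here. The function is only defined, not called, since it talks to the remote host.

```bash
#!/bin/bash
dailyServer="my.backup.host.com"            # placeholder host from the script
dailyShots="backups/$(hostname -s)/daily"   # same path the script builds

setup_daily_backup() {
    local key="$HOME/.ssh/id_ed25519"       # assumed key type and location

    # Create a key pair only if one does not exist yet.
    [ -f "$key" ] || ssh-keygen -t ed25519 -f "$key"

    # Install the public key on the backup host (asks for the password once).
    ssh-copy-id -i "${key}.pub" "$dailyServer"

    # Create the snapshot directory, relative to the remote home directory.
    ssh "$dailyServer" "mkdir -p '$dailyShots'"
}

# Run once by hand:
#   setup_daily_backup
```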

What do I end up with?

Directory listings that look like this:

[~/backups/adjutant/daily]$ ls -l
total 72
drwx--x---. 143 rmonk rmonk 12288 Oct 11 15:24 1-Monday
drwx--x---. 141 rmonk rmonk 12288 Oct  5 07:00 2-Tuesday
drwx--x---. 141 rmonk rmonk 12288 Oct  6 07:05 3-Wednesday
drwx--x---. 141 rmonk rmonk 12288 Oct  7 07:09 4-Thursday
drwx--x---. 143 rmonk rmonk 12288 Oct  8 07:15 5-Friday
drwx--x---. 143 rmonk rmonk 12288 Oct  9 09:57 6-Saturday
lrwxrwxrwx.   1 rmonk rmonk     8 Oct 11 15:27 latest -> 1-Monday

What about disk usage?

[~/backups/adjutant/daily]$ du -sh *
11G     1-Monday
522M    2-Tuesday
304M    3-Wednesday
316M    4-Thursday
352M    5-Friday
226M    6-Saturday

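Because rsync runs with --link-dest, files that have not changed since the previous snapshot are stored as hard links rather than copies, so each additional snapshot only costs the space of what changed. du counts a hard-linked file once, which is why only the first directory listed shows the full 11G. Each of these directories is still a complete tree that could be rsynced or copied back to the system for a full restore, sans the material in the exclude file.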

Category: Linux Tagged: backup bash linux scripting shell
