Mar 5, 2011

Secure Snapshotted Backups with python, encfs, and rsync

I recently made a very dumb mistake and wiped out /home/greg/ on my personal desktop.  It wasn't a hardware failure; it was user error.  I had some manual backups, so it wasn't catastrophic.  I had never really set up a good system, though, and there were some annoying losses.  After restoring my sanity I decided it was time to set up better backups.  I run a NAS with ~3 TB of usable disk space, and part of the reason for running it was backups.  An actual backup system was the only thing missing.

Obviously lots of people have solved this problem before, but my situation was unique enough that many options were off the table.  I had a few requirements:
  • My home directory (primaries) is encrypted with Ubuntu's TrueCrypt setup, and the documents I most want to back up are financial in nature, so I wanted my backups always encrypted on disk.
  • I wanted snapshotted backups so that I was resilient against hardware failures, but also against rm -rf stupidity.
  • However, I did not want to simply store a series of diffs, as that would make recovery more complex, and I wanted recovery to be simple.
  • Still, I wanted efficiency and speed so I wasn't choking my internal network at various points in the day.
Most importantly, though, I wanted to understand exactly how my backup system works and what it's doing.  Rather than trust some other code that I didn't understand and couldn't tweak, I wanted to roll my own.  Most folks do this with shell scripts, cron, and rsync.  I wanted to do something similar, but since my shell-fu is abysmal, I decided on Python.

If this is useful to anyone else, I've shared my code.  The script has two modes, selected by an argument: backup and snapshot.  In backup mode, the script:

  1. Optionally tries to mount a path which should be set up in /etc/fstab.  In my case, this is NFS.
  2. Mounts an encrypted filesystem at /mnt/.../current/
  3. Rsyncs a series of files and paths to /mnt/.../current/
  4. Optionally unmounts the encrypted filesystem.
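The four backup steps can be sketched roughly as follows. This is a minimal illustration, not the shared script itself: the function names build_backup_commands and run_backup are hypothetical, the code targets Python 3 rather than the Python 2.6 I run on the NAS, and it assumes encfs is given the password on stdin via its -S flag and that the FUSE mount is released with fusermount -u.

```python
import subprocess


def build_backup_commands(encrypted_dir, decrypted_dir, rsync_flags, paths,
                          pre_mount=None, unmount=True):
    """Return the argv lists for one backup run, in execution order.

    All parameter names are illustrative; they loosely mirror the
    .backuprc settings rather than the real script's internals.
    """
    steps = []
    if pre_mount:
        # Step 1: "mount <path>" relies on /etc/fstab for the details.
        steps.append(["mount", pre_mount])
    # Step 2: encfs -S reads the password from stdin instead of a tty.
    steps.append(["encfs", "-S", encrypted_dir, decrypted_dir])
    # Step 3: rsync writes into the decrypted view; encfs stores the
    # files encrypted underneath it.
    steps.append(["rsync"] + rsync_flags.split() + list(paths) + [decrypted_dir])
    if unmount:
        # Step 4: detach the FUSE mount again.
        steps.append(["fusermount", "-u", decrypted_dir])
    return steps


def run_backup(steps, password):
    """Run each step in turn; check=True stops at the first failure."""
    for argv in steps:
        stdin = password + "\n" if argv[0] == "encfs" else None
        subprocess.run(argv, input=stdin, text=True, check=True)
```

Building the command lists separately from running them makes it easy to print what a run would do before letting it loose on real data.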
In snapshot mode, the script makes up to N periodic snapshots of the encrypted files at each of several frequencies.  For example, it might be configured to keep 24 hourly snapshots, 7 daily snapshots, 4 weekly snapshots, and 3 monthly snapshots.  Any number of snapshots can be kept at any frequency.

The snapshots are taken of the encrypted files, not of the decrypted filesystem. As a result, you can run the snapshots directly on the remote backup system; I run them on my NAS. It works just fine if you run them locally as well.
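The retention rule ("keep up to N snapshots per frequency") might look something like the sketch below. It only covers the bookkeeping: deciding when a new snapshot is due and which old ones to drop. The INTERVALS table, the 30-day approximation of a month, and both function names are assumptions for illustration, and how the snapshot copy itself is made is left out.

```python
from datetime import datetime, timedelta

# Assumed spacing between snapshots of each type; "monthly" is
# approximated as 30 days for simplicity.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
}


def snapshot_due(frequency, newest, now):
    """True if no snapshot exists yet or the newest one is old enough."""
    return newest is None or now - newest >= INTERVALS[frequency]


def snapshots_to_prune(taken, keep):
    """Given the timestamps of existing snapshots for one frequency,
    return the ones beyond the newest `keep`, newest first."""
    newest_first = sorted(taken, reverse=True)
    return newest_first[keep:]
```

With hourly=12, for example, the thirteenth-newest hourly snapshot and anything older would be returned for deletion.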

Both modes are managed using a .backuprc file in the user's home directory.  For example, mine looks something like this:

# Optional, log all events
LOG_FILE /home/greg/logs/backup.log

# Optional, we try to mount this path first.  Failures halt execution.
PRE_MOUNT /mnt/backup/

# Required, password and mount point for encrypted/decrypted file 
# systems. The password can be in plaintext since this file is stored
# on an encrypted filesystem anyway.  We aren't going for paranoid.
ENCFS_PASSWORD AddYourOwnPasswordHere
ENCRYPTED_MOUNTPOINT /mnt/backup/desktop/
# This is where we will write files unencrypted.  Must be empty, must
# not be mounted already.
DECRYPTED_MOUNTPOINT /mnt/encryptedbackup/

# Required, rsync flags.
RSYNC_FLAGS -CRa --delete

# Number of snapshots.  Format: type=count[,...] e.g. hourly=12,daily=7
SNAPSHOTS hourly=12,daily=7,weekly=2,monthly=1

# List of file paths to rsync.  Any line that doesn't contain a space is
# a file path.  Paths can be filenames or directories.  This is simply
# the argument passed to rsync.  As a result, you can use rsync features
# like adding a "./" directory to tell rsync which components of the
# path to sync over.
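
A file in this format could be read along these lines. This is a sketch under the rules stated in the comments above (lines starting with # are comments, a line containing a space is a KEY value setting, and any other non-empty line is a path for rsync); parse_backuprc and parse_snapshots are hypothetical names, not necessarily what the shared script uses.

```python
def parse_backuprc(text):
    """Split .backuprc contents into (settings dict, list of paths)."""
    settings, paths = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if " " in line:
            # "KEY value": everything after the first run of
            # whitespace is the value, e.g. "-CRa --delete".
            key, value = line.split(None, 1)
            settings[key] = value.strip()
        else:
            # No space, so the line is a path handed to rsync as-is.
            paths.append(line)
    return settings, paths


def parse_snapshots(value):
    """Turn "hourly=12,daily=7" into {"hourly": 12, "daily": 7}."""
    counts = {}
    for item in value.split(","):
        name, _, count = item.partition("=")
        counts[name.strip()] = int(count)
    return counts
```

One consequence of the "no space means path" rule is that paths containing spaces cannot be listed, which is a trade-off of keeping the format this simple.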

The Python source, an example .backuprc, and an example crontab are all available on GitHub.

Some other helpful resources I came across while putting this together:
  • ReadyNas Root Access Add On
  • Install Python2.6 on a ReadyNas NV
  • RBackup - Diff based backups with python
  • The ultimate guide to rsync backups
  • How to set up encfs for use with rsync