Mar 5, 2011

Secure Snapshotted Backups with python, encfs, and rsync

I recently made a very dumb mistake and wiped out /home/greg/ on my personal desktop.  It wasn't a hardware failure; it was user error.  I had some manual backups, so it wasn't catastrophic, but I had never really set up a good system and there were some annoying losses.  After restoring my sanity I decided it was time to set up better backups.  I run a NAS with ~3 TB of usable disk space, and backups were part of the reason for running it.  The only thing missing was an actual backup system.

Obviously lots of people have solved this problem before, but my situation was unique enough that many options were off the table.  I had a few requirements:
  • My home directory (the primary copy) is encrypted with Ubuntu's TrueCrypt setup, and the documents I most want to back up are financial in nature, so I wanted my backups always encrypted on disk.
  • I wanted snapshotted backups so that I was resilient against hardware failures, but also against rm -rf stupidity.
  • However, I did not want to simply store a series of diffs, as that would make recovery more complex, and I wanted recovery to be simple.
  • Still, I wanted efficiency and speed so I wasn't choking my internal network at various points in the day.
Most importantly though, I wanted to understand exactly how my backup system works and what it's doing.  Rather than trust some other code that I didn't understand and couldn't tweak, I wanted to roll my own.  Most folks do this with shell scripts, cron, and rsync.  I wanted to do something similar, but since my shell-fu is abysmal, I decided on Python.

If this is useful to anyone else, I've shared my code.  The script has two modes, selected by command-line argument: backup and snapshot.

Backup:
  1. Optionally tries to mount a path which should be set up in /etc/fstab.  In my case, this is NFS.
  2. Mounts an encrypted filesystem at /mnt/.../current/
  3. Rsyncs a series of files and paths to /mnt/.../current/
  4. Optionally unmounts the encrypted filesystem.
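
The four backup steps above can be sketched roughly as follows. This is not the script from the post, only a minimal illustration in current Python 3. The function names, the use of `fusermount -u` for unmounting, and passing the password through `encfs -S` (read password from stdin) are assumptions made for the example.

```python
import subprocess


def build_backup_commands(encrypted_dir, decrypted_dir, paths,
                          rsync_flags=("-CRa", "--delete"),
                          pre_mount=None, unmount=True):
    """Return the ordered list of commands for one backup run (hypothetical helper)."""
    commands = []
    if pre_mount:
        # Step 1: mount a path that is already described in /etc/fstab.
        commands.append(["mount", pre_mount])
    # Step 2: mount the encfs volume; -S makes encfs read the password from stdin.
    commands.append(["encfs", "-S", encrypted_dir, decrypted_dir])
    # Step 3: a single rsync call that covers every configured path.
    commands.append(["rsync", *rsync_flags, *paths, decrypted_dir])
    if unmount:
        # Step 4: unmount the FUSE filesystem again.
        commands.append(["fusermount", "-u", decrypted_dir])
    return commands


def run_backup(commands, password):
    """Run each command in order; check=True stops at the first failure."""
    for cmd in commands:
        stdin = password + "\n" if cmd[0] == "encfs" else None
        subprocess.run(cmd, input=stdin, text=True, check=True)
```

Keeping command construction separate from execution makes it easy to print the plan as a dry run before letting rsync with --delete loose on real data.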
Snapshot:
Makes up to N periodic snapshots of the encrypted files at one of several frequencies.  For example, it might be configured to keep 24 hourly snapshots, 7 daily snapshots, 4 weekly snapshots, and 3 monthly snapshots.  Any number of snapshots can be kept at any frequency. 

The snapshots are taken of the encrypted files, not of the decrypted filesystem. As a result, you can run the snapshots directly on the remote backup system; I run them on my NAS. It works just fine if you run them locally as well.
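
One common way to keep N full snapshots without storing diffs is to rotate hard-linked copies of the encrypted directory. The post does not say how its snapshots are stored, so the sketch below is an assumption throughout: the `<kind>.<n>` directory naming, the `current` directory name, and the function name are all hypothetical.

```python
import os
import shutil


def rotate_snapshots(backup_root, kind, keep, current="current"):
    """Keep at most `keep` snapshots named <kind>.0 (newest) to <kind>.<keep-1>."""
    if keep < 1:
        raise ValueError("keep must be at least 1")

    def snap(i):
        return os.path.join(backup_root, f"{kind}.{i}")

    # Drop the oldest snapshot if the set is already full.
    oldest = snap(keep - 1)
    if os.path.isdir(oldest):
        shutil.rmtree(oldest)
    # Shift the remaining snapshots up by one: <kind>.0 -> <kind>.1, and so on.
    for i in range(keep - 2, -1, -1):
        if os.path.isdir(snap(i)):
            os.rename(snap(i), snap(i + 1))
    # Hard-link every file from the live copy into the new <kind>.0.
    shutil.copytree(os.path.join(backup_root, current), snap(0),
                    symlinks=True, copy_function=os.link)
```

Each snapshot looks like a complete copy, so recovery is a plain file copy, while unchanged files cost no extra disk space. This relies on rsync's default behaviour of writing a changed file to a temporary name and renaming it into place, which leaves older hard links pointing at the old contents.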

Both modes are managed using a .backuprc file in the user's home directory.  For example, mine looks something like this:

# Optional, log all events
LOG_FILE /home/greg/logs/backup.log

# Optional, we try to mount this path first.  Failures halt execution.
PRE_MOUNT /mnt/backup/

# Required, password and mount point for encrypted/decrypted file 
# systems. The password can be in plaintext since this file is stored
# on an encrypted filesystem anyway.  We aren't going for paranoid.
ENCFS_PASSWORD AddYourOwnPasswordHere
ENCRYPTED_MOUNTPOINT /mnt/backup/desktop/
# This is where we will write files unencrypted.  Must be empty, must
# not be mounted already.
DECRYPTED_MOUNTPOINT /mnt/encryptedbackup/

# Required, rsync flags.
RSYNC_FLAGS -CRa --delete

# Number of snapshots.  Format: [type=,...] e.g. hourly=12,daily=7
SNAPSHOTS hourly=12,daily=7,weekly=2,monthly=1

# List of file paths to rsync.  Any line that doesn't contain a space is
# a file path.  Paths can be filenames or directories.  This is simply
# the argument passed to rsync.  As a result, you can use rsync features
# like adding a "./" directory to tell rsync which components of the
# path to sync over.
/home/greg/./.heartbeat
/home/greg/./src/
/home/greg/./financial/
/home/greg/./picasa/
The Python source, an example .backuprc, and an example crontab are all available on GitHub.
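
For readers who want to see how a file in this format might be read, here is a minimal sketch based only on the rules stated in the example's comments: `#` starts a comment, a line without a space is a path, and anything else is a key followed by its value. It is not the parser from the linked source, and the function names are hypothetical.

```python
def parse_backuprc(text):
    """Split .backuprc text into a settings dict and a list of rsync paths."""
    settings, paths = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if " " not in line:
            paths.append(line)  # no space: this line is a path for rsync
            continue
        key, value = line.split(None, 1)  # first word is the setting name
        settings[key] = value
    return settings, paths


def parse_snapshots(value):
    """Turn 'hourly=12,daily=7' into {'hourly': 12, 'daily': 7}."""
    counts = {}
    for item in value.split(","):
        kind, _, count = item.partition("=")
        counts[kind.strip()] = int(count)
    return counts
```

A side effect of the "no space means path" rule is that paths containing spaces cannot be expressed in this format, which is worth knowing before adding something like a "My Documents" directory.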

Some other helpful resources I came across while putting this together:
ReadyNas Root Access Add On
Install Python2.6 on a ReadyNas NV
RBackup - Diff based backups with python
The ultimate guide to rsync backups
How to set up encfs for use with rsync

4 comments:

Felix said...

I had much the same problem. 1) I wanted to sync folders and settings between my home desktop PC, notebook, home server, and office PC. 2) I wanted a clean backup solution.

1)

Syncing folders: I used unison to sync folders between my desktop PC, notebook, home server, and office PC. In my local network I used Samba and mounted the network folders as local folders (with Samba + cifs). My office PC creates an SSH tunnel to my home server and syncs the files with unison. The sync runs automatically via cron.

Syncing browser: I used the Google Chrome syncing feature.

Syncing eMails: Just using IMAP with Evolution as eMail client on each PC.

2)

Backing up: As my important folders are synced between 3 PCs anyway, I already have some backups. Additionally, I connected an external hard drive to my home server, which periodically makes backups (snapshots) as well (with rsync and cron jobs). My home folder on my office PC is also encrypted with TrueCrypt (just to be safe).

Bye,
Felix

My resources (in German):
http://wiki.ubuntuusers.de/Unison
http://wiki.ubuntuusers.de/Samba_Client_GNOME
http://wiki.ubuntuusers.de/Samba_Client_cifs
http://wiki.ubuntuusers.de/fstab
http://wiki.ubuntuusers.de/rsync
http://wiki.ubuntuusers.de/Skripte/Backup_mit_RSYNC

Felix said...

P.S.: I love http://wiki.ubuntuusers.de/GNOME_Schedule
for cron jobs

Greg said...

Unison rocks, but it seems more suited to syncing than backing up. I wanted backups in one direction only in this case, so I selected rsync as it seemed a little simpler, although really both roads seem like reasonable choices in my case.