Implementation

1. Requirements

Thus, obnam. Things that might not be obvious to anyone setting it up are noted below.

2. Configuration

There is an encrypted obnam repository at path on rsync.net. We are only using one repository for now, and only backing up afs.

The non-human user hcoopbackup has a gpg key and an ssh key that allow it to access the obnam repository and rsync.net, respectively. Admins should generate individual gpg keys for accessing the backup.
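For illustration, the hcoopbackup user's obnam configuration might look roughly like the sketch below; the repository URL, key id, and log path are placeholders rather than the real values (the ssh key comes into play implicitly, since obnam reaches rsync.net over sftp).

{{{
# Sketch of ~/.obnam.conf for the hcoopbackup user.
# The URL, key id, and log path are placeholders, not the real values.
[config]
repository = sftp://hcoopbackup@usw-s001.rsync.net/~/obnam-repo
encrypt-with = DEADBEEF
log = /home/hcoopbackup/obnam.log
}}}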

The script /afs/hcoop.net/common/etc/scripts/hcoop-obnam-backup is run on an openafs file server machine from cron.daily as root (it must be a file server, since we use -localauth when dumping). The backup node has a keytab /etc/keytabs/hcoopbackup that allows the backup script to become the backup user.
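The identity switch is ordinary Kerberos keytab usage; roughly the following, where the principal name and realm are assumptions:

{{{
# Sketch: obtain credentials for the backup user from its keytab, then run
# the push step as that user. Principal name and realm are assumptions.
kinit -k -t /etc/keytabs/hcoopbackup hcoopbackup@HCOOP.NET
}}}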

First, backup clones of every volume are created with vos backupsys. Then all backup volumes are dumped to /backups/hcoop-backups/dumps. Finally, the backup user takes over and runs an incremental obnam backup to the remote repository at rsync.net.
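Put together, the nightly flow is roughly the following. This is a simplified sketch, not the actual script; the volume names are placeholders, and the real script enumerates volumes itself.

{{{
#!/bin/sh
# Simplified sketch of the nightly backup flow; not the real
# hcoop-obnam-backup script.
set -e
dumpdir=/backups/hcoop-backups/dumps

# 1. Refresh the .backup clone of every volume. Needs -localauth, which is
#    why this runs as root on a file server.
vos backupsys -localauth

# 2. Dump each backup volume to local disk. Placeholder volume names;
#    full vs incremental selection is described in the next paragraph.
for vol in user.example mail.example; do
    vos dump -id "$vol.backup" -time 0 -file "$dumpdir/$vol.dump" -localauth
done

# 3. Hand over to the backup user for the incremental push to rsync.net.
su - hcoopbackup -c "obnam backup $dumpdir"
}}}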

On the first day of each week, make a full volume dump, afterward recording the time of the dump (or the creation time of the backup volume, whichever afs needs). On the following days of the week, make an incremental dump covering the period from the weekly full dump to the current date. Obnam can then be used to pull a dump for any given date, and reconstructing a volume never needs more than two dump files (the weekly full plus that day's incremental).
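In vos terms, the per-volume decision might look something like this sketch; the timestamp file and variable names are illustrative, and the real script may use the backup volume's creation time instead, as noted above.

{{{
# Sketch of weekly-full / daily-incremental dumping for one volume.
# $vol, $dumpdir and the stamp file are illustrative names.
stamp="$dumpdir/$vol.full-dump-time"

if [ "$(date +%u)" -eq 1 ]; then
    # First day of the week: full dump (-time 0), remembering when it was made.
    date "+%m/%d/%Y %H:%M" > "$stamp"
    vos dump -id "$vol.backup" -time 0 \
        -file "$dumpdir/$vol.full" -localauth
else
    # Remaining days: incremental dump of changes since the weekly full.
    vos dump -id "$vol.backup" -time "$(cat "$stamp")" \
        -file "$dumpdir/$vol.incr" -localauth
fi
}}}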

3. Future Plans

To make life simpler, individual machines would be backed up to afs volumes ({recovery|backup}.$machine.hcoop.net?), probably as uncompressed tarballs, using an adaptation of the current system backup scripts to copy only the files that cannot be restored automatically by restoring the dpkg status file and running apt. Should each machine have a unique backup user? (What happens if you have a $user/$host.hcoop.net key? Can it authenticate on all nodes via pam_krb5, or just on $host?)
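If that plan were implemented, the per-machine step might be as small as the following (purely hypothetical; the target volume mount point and the list of paths are guesses):

{{{
# Hypothetical sketch of the proposed per-machine backup: record enough
# package state for apt to rebuild the machine, then tar up only the local
# files apt cannot recreate. Mount point and path list are guesses.
dest=/afs/hcoop.net/backup/$(hostname).hcoop.net
dpkg --get-selections > "$dest/dpkg-selections"
tar -C / -cf "$dest/local-state.tar" etc usr/local var/lib/dpkg/status
}}}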

Databases could be backed up similarly, just by rsyncing over the /srv/databases directory. The same problem as with the previous backups could arise, though: rsyncing live database files can capture them in an inconsistent state. If it is at all feasible, we should instead do a proper dump of each database separately. Perhaps we could "resurrect" $user.db, but as $user.dbbackup, containing the last snapshot of the user's database dumps? Possibly not worth the effort; it might also be better to keep the dumps unexposed to the world at large (exposing them would violate the database acls).
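For the per-database-dump option, something along these lines would be a starting point (a sketch only; the destination path, and the assumption that the databases are PostgreSQL, are mine):

{{{
# Hypothetical sketch: dump each PostgreSQL database to its own file instead
# of rsyncing raw files out of /srv/databases. Destination path is a guess.
dest=/srv/databases/dumps
mkdir -p "$dest"
for db in $(sudo -u postgres psql -At -c \
        "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
    sudo -u postgres pg_dump "$db" | gzip > "$dest/$db.sql.gz"
done
}}}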

spamassassin should probably be writing its bayes database directly to afs anyway, so we can punt on that. There should not be any other local state.

In a world where every child gets a free pony for their tenth birthday, we could teach obnam about afs acls, mount the $user.backup volumes directly, and back up from those without the intermediate dump (i.e. not double the local space requirements for volumes!). This would also allow users to control what data gets backed up via ACLs. However, reality bites.


CategorySystemAdministration CategoryNeedsWork