This page describes the procedure for accessing and using our off-site backups. Only admins can do this -- if you want to get some file or directory back from the dead and are not an admin, please open a Bugzilla bug.
The backup/restore procedure below is being replaced with obnam, a backup manager that can perform incremental backups while simultaneously keeping the backup encrypted.
Contents
- Managing Backups
  - Navigating the available backups
  - Retrieving a backup
  - Restoring the volume dump to a volume with a new name
  - Mounting the newly restored volume onto the filesystem
  - Restoring a particular file
  - Unmounting the restored volume
  - Renaming the restored volume so it takes the place of the damaged/corrupted/erased volume
  - Removing the restored volume
- Database Backups
- Implementation
1. Managing Backups
The backup manager script is currently broken and needs to be rewritten on top of obnam.
1.1. Navigating the available backups
Using backup-manager:
backup-manager list
backup-manager list YYYY.MM.DD
1.2. Retrieving a backup
(NOTE: $VOLNAME is not simply the username; it has the form <db|mail|user>.USERNAME)
Using backup-manager:
backup-manager get YYYY.MM.DD $VOLNAME.dump.gz.aescrypt
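Since the volume name is easy to get wrong, a small helper can build the dump file name from the volume type and the username. This is only a sketch: `dump_name` is a hypothetical function, not part of backup-manager, and it assumes the `<db|mail|user>.USERNAME.dump.gz.aescrypt` pattern described above.

```shell
# Hypothetical helper (not part of backup-manager): build the dump file
# name for a volume from its type (db, mail or user) and the username.
dump_name() {
  kind=$1
  user=$2
  case "$kind" in
    db|mail|user) ;;
    *) echo "unknown volume type: $kind" >&2; return 1 ;;
  esac
  printf '%s.%s.dump.gz.aescrypt\n' "$kind" "$user"
}

# Example: name of the dump holding the home volume of a user "alice".
dump_name user alice
```

The result can then be passed as the last argument to `backup-manager get YYYY.MM.DD`.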
1.3. Restoring the volume dump to a volume with a new name
Using backup-manager:
backup-manager restore YYYY.MM.DD $VOLNAME.dump.gz.aescrypt $VOLNAME.restored
Manually:
cat /vicepa/hcoop-backups/restored/YYYY.MM.DD-$VOLNAME.dump.gz.aescrypt | \
  ccrypt -cdk /etc/backup-encryption-key | \
  gunzip | \
  vos restore deleuze /vicepa $VOLNAME.restored
1.4. Mounting the newly restored volume onto the filesystem
fs mkm /afs/hcoop.net/.old/tmp-mount $VOLNAME.restored
vos release old
1.5. Restoring a particular file
# examine /afs/hcoop.net/.old/tmp-mount
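Once the file has been located under the temporary mount point, it can be copied back into place. The following is a minimal sketch, not an established procedure: `restore_file` is a hypothetical helper, and the destination path in the usage line is only an example. Note that AFS ACLs are set per directory, so a copied file takes on the ACL of the directory it lands in.

```shell
# Hypothetical helper: copy one file from the mounted backup back into a
# live directory tree, recreating parent directories as needed.
restore_file() {
  src_root=$1   # where the restored volume is mounted
  rel=$2        # path of the file relative to the volume root
  dest_root=$3  # root of the live volume to copy into
  if [ ! -f "$src_root/$rel" ]; then
    echo "not a regular file in backup: $rel" >&2
    return 1
  fi
  mkdir -p "$dest_root/$(dirname "$rel")" &&
    cp -p "$src_root/$rel" "$dest_root/$rel"   # -p keeps mode and timestamps
}

# Example (destination path is illustrative only):
# restore_file /afs/hcoop.net/.old/tmp-mount public_html/index.html /afs/hcoop.net/user/U/US/USERNAME
```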
1.6. Unmounting the restored volume
fs rm /afs/hcoop.net/.old/tmp-mount
vos release old
1.7. Renaming the restored volume so it takes the place of the damaged/corrupted/erased volume
Do this if you want to restore an entire volume. This deletes the old volume and replaces it with the backup.
vos remove -id $VOLNAME
vos rename $VOLNAME.restored $VOLNAME
1.8. Removing the restored volume
If you only wanted to restore a few files from the volume, you should remove the local copy of the backup volume when done.
vos remove -id $VOLNAME.restored
2. Database Backups
cd /vicepa/hcoop-backups/restored
mkdir YYYY.MM.DD-db
cd YYYY.MM.DD-db
cat ../YYYY.MM.DD-databases.tar.gz.aescrypt | \
  ccrypt -cdk /etc/backup-encryption-key | \
  gunzip | \
  tar -xvf -
3. Implementation
3.1. Requirements
- Encrypted backups
- Incremental backups
- Plays nicely with AFS
Thus, obnam. Things that might not be obvious to anyone setting it up:
- afs backup volumes should be dumped with vos dump (despite the local space this wastes) and backed up as a whole unit so that ACLs are preserved when restoring
3.2. Configuration
There is an encrypted obnam repository at path on rsync.net. We are only using one repository for now, and only backing up afs.
Non-human user hcoopbackup has a gpg key and ssh key that allows it to access the obnam repository and rsync.net, respectively. Admins should generate individual gpg keys for accessing the backup.
The script /afs/hcoop.net/common/etc/scripts/hcoop-obnam-backup is run on an openafs file server machine from cron.daily as root (it must be a file server, since we use -localauth when dumping). The backup node has a keytab /etc/keytabs/hcoopbackup that allows the backup script to become the backup user.
First the entire system is backed up using vos backupsys. Then all backup volumes are dumped to /backups/hcoop-backups/dumps. Then the backup user takes over, and does an incremental backup to the remote repository at rsync.net.
On the first day of each week, make a full volume dump, afterward recording the time of the dump (or creation of the backup volume, whichever afs needs). On the following days of the week, make an incremental dump from the weekly full dump to the current date. Then, obnam can be used to access a dump for any given date and we only need to mount two dump files.
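The weekly schedule can be sketched as a function that prints the `vos dump` command for a given day. This is an illustration under stated assumptions, not the contents of `hcoop-obnam-backup`: the function name, the `.full`/`.incr` file naming, and the choice of Monday as the first day of the week are all hypothetical, and the time of the last full dump is supplied by the caller.

```shell
# Sketch only: print the vos dump command implied by the schedule above.
# Assumptions: Monday (date +%u = 1) is the first day of the week, and
# the caller passes in the recorded time of the last full dump.
dump_command() {
  vol=$1        # volume name, e.g. user.alice
  dow=$2        # day of week, 1 (Monday) to 7 (Sunday)
  last_full=$3  # time of the weekly full dump, "mm/dd/yyyy hh:MM"
  outdir=$4     # directory that receives the dump files
  if [ "$dow" -eq 1 ]; then
    since=0             # -time 0 requests a full dump
    suffix=full
  else
    since=$last_full    # incremental relative to the weekly full dump
    suffix=incr
  fi
  printf 'vos dump -id %s.backup -time "%s" -file %s/%s.%s.dump -localauth\n' \
    "$vol" "$since" "$outdir" "$vol" "$suffix"
}

# Print today's command for one volume (the recorded time is an example).
dump_command user.alice "$(date +%u)" "01/06/2025 03:00" /backups/hcoop-backups/dumps
```

Because every incremental dump is taken relative to the weekly full dump rather than the previous day, restoring any date needs only the full dump plus that day's incremental.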
3.3. Future Plans
To make life simpler, individual machines could be backed up to afs volumes ({recovery|backup}.$machine.hcoop.net?), probably as uncompressed tarballs, using an adaptation of the current system backup scripts that copies only files that cannot be restored automatically by setting the status file and running apt. Should each machine have a unique backup user? What happens if you have a $user/$host.hcoop.net key: can it authenticate on all nodes via pam_krb5, or just on $host?
Databases could be backed up similarly, just by rsyncing over the /srv/databases directory. However, the same issue as with previous backups could arise, namely a possibly bad file system state; if it's not impossible, we should do a proper database dump for each database separately. Perhaps we could "resurrect" $user.db, but instead as $user.dbbackup, containing the last snapshot of the user's database dumps? This is possibly not worth the effort, and it might be better to keep them unexposed to the world at large (violation of database acls).
spamassassin should probably be writing its bayes database directly to afs anyway, so we can punt on that. There should not be any other local state.
In a world where every child gets a free pony for their tenth birthday, we could teach obnam about afs acls and just mount the $user.backup volumes (i.e. not double the local space requirements for volumes!) and back up from those without the intermediate dump. This would also allow users to control what data gets backed up via ACLs. However, reality bites.
