Diff for "BackupInfo"

Differences between revisions 26 and 27
Revision 26 as of 2013-01-09 08:09:37
Size: 6197
Editor: ClintonEbadi
Comment: updated plans for new backup system, somewhat simpler
Revision 27 as of 2013-01-11 22:50:22
Size: 5309
Editor: ClintonEbadi
Comment: working on the backup system

This page describes the procedure for accessing and using our off-site backups. Only admins can do this -- if you want to get some file or directory back from the dead and are not an admin, please open a Bugzilla bug.

The backup/restore procedure below is being replaced with obnam, a backup manager that can perform incremental backups while simultaneously keeping the backup encrypted.

1. Managing Backups

The backup manager script is currently broken and needs to be rewritten on top of obnam.

1.1. Listing available backups

Using backup-manager:

backup-manager list
backup-manager list YYYY.MM.DD

1.2. Retrieving a backup

(NOTE: $VOLNAME is not simply the username; it is <db|mail|user>.USERNAME)

Using backup-manager:

backup-manager get YYYY.MM.DD $VOLNAME.dump.gz.aescrypt

1.3. Restoring the volume dump to a volume with a new name

Using backup-manager:

backup-manager restore YYYY.MM.DD $VOLNAME.dump.gz.aescrypt $VOLNAME.restored

Manually:

cat /vicepa/hcoop-backups/restored/YYYY.MM.DD-$VOLNAME.dump.gz.aescrypt | \
  ccrypt -cdk /etc/backup-encryption-key | \
  gunzip | \
  vos restore deleuze /vicepa $VOLNAME.restored

1.4. Mounting the newly restored volume onto the filesystem

fs mkm /afs/hcoop.net/.old/tmp-mount $VOLNAME.restored
vos release old

1.5. Restoring a particular file

# examine /afs/hcoop.net/.old/tmp-mount

1.6. Unmounting the restored volume

fs rm /afs/hcoop.net/.old/tmp-mount
vos release old

1.7. Renaming the restored volume so it takes the place of the damaged/corrupted/erased volume

Do this if you want to restore an entire volume. This deletes the old volume and replaces it with the backup.

vos remove $VOLNAME
vos rename $VOLNAME.restored $VOLNAME

1.8. Removing the restored volume

If you only wanted to restore a few files from the volume, you should remove the local copy of the backup volume when done.

vos remove -id $VOLNAME.restored

2. Database Backups

cd /vicepa/hcoop-backups/restored
mkdir YYYY.MM.DD-db
cd YYYY.MM.DD-db
cat ../YYYY.MM.DD-databases.tar.gz.aescrypt | \
  ccrypt -cdk /etc/backup-encryption-key | \
  gunzip | \
  tar -xvf -
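The pipeline shape can be sanity-checked without the encryption key by exercising just the decompress-and-unpack stages on a scratch archive (all file names here are made up for the demonstration):

```shell
# Build a scratch tarball, then unpack it through the same
# gunzip-then-untar pipeline used for the database backups.
mkdir -p /tmp/bk-demo/src
echo "hello" > /tmp/bk-demo/src/file.txt
tar -C /tmp/bk-demo -czf /tmp/bk-demo/demo.tar.gz src
mkdir -p /tmp/bk-demo/restore
cd /tmp/bk-demo/restore
cat ../demo.tar.gz | gunzip | tar -xvf -
cat src/file.txt   # prints "hello"
```

Note that since gunzip has already decompressed the stream, tar must not be given -z a second time.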

3. Implementation

3.1. Requirements

  • Encrypted backups
  • Incremental backups
  • Plays nicely with AFS

Thus, obnam. Things that might not be obvious to anyone setting it up:

  • afs backup volumes should be dumped with vos dump (despite the local space cost) and backed up as whole units, so that ACLs are preserved if a volume ever has to be restored

3.2. Configuration

There is an encrypted obnam repository at path on rsync.net. We are only using one repository for now, and only backing up afs.
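An obnam client is normally driven by a small ini-style configuration file; a sketch of what ours might contain is below. The repository URL, key id, and retention policy are placeholders, not our real values (the actual repository path is deliberately not recorded on this page).

```
[config]
# Placeholder values -- substitute the real rsync.net path and key id.
repository = sftp://hcoopbackup@rsync.net/REPO-PATH
encrypt-with = 0123ABCD
keep = 30d,8w
log = /var/log/obnam.log
```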

Non-human user hcoopbackup has a gpg key and an ssh key, which allow it to access the obnam repository and rsync.net, respectively. Admins should generate individual gpg keys for accessing the backup.
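obnam has built-in key management for encrypted repositories, so adding and revoking an admin's individual key should roughly follow the pattern below. The key id is a placeholder, and the exact subcommand flags should be checked against the installed obnam version.

```shell
# Add an admin's public key so they can read the repository (keyid is a placeholder)
obnam add-key --keyid=0123ABCD
# List the keys that currently have access
obnam list-keys
# Revoke a departing admin's key
obnam remove-key --keyid=0123ABCD
```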

The script /afs/hcoop.net/common/etc/scripts/hcoop-obnam-backup is run as root from cron.daily on an OpenAFS file server machine (it must be a file server, since we use -localauth when dumping). The backup node has a keytab, /etc/keytabs/hcoopbackup, that allows the backup script to become the backup user.
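Becoming the backup user from the keytab presumably amounts to something like the following (the principal name and cell are assumptions based on the paths above):

```shell
# Get Kerberos credentials for the backup user from its keytab,
# then convert them into an AFS token before touching /afs paths.
kinit -k -t /etc/keytabs/hcoopbackup hcoopbackup
aklog hcoop.net
```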

First, backup clones of every volume are created with vos backupsys. Next, all backup volumes are dumped to /vicepa/hcoop-backups/dumps. Finally, the backup user takes over and performs an incremental backup to the remote repository at rsync.net.
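The sequence described above might be sketched as follows. This is an illustrative outline, not the contents of the real hcoop-obnam-backup script; the volume-name parsing and the obnam invocation in particular are assumptions.

```shell
#!/bin/sh
# Illustrative outline of the daily backup run (not the real script).
set -e

# 1. Refresh the .backup clone of every volume in the cell.
vos backupsys -localauth

# 2. Dump each backup volume into the local staging area.
mkdir -p /vicepa/hcoop-backups/dumps
vos listvldb | awk 'NF == 1 && /^[a-z]/ {print $1}' |
while read -r vol; do
    vos dump -id "$vol.backup" \
        -file "/vicepa/hcoop-backups/dumps/$vol.dump" -localauth
done

# 3. Hand off to the backup user for the incremental push to rsync.net.
su hcoopbackup -c 'obnam backup /vicepa/hcoop-backups/dumps'
```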

3.3. Future Plans

To make life simpler, individual machines would be backed up to afs volumes ({recovery|backup}.$machine.hcoop.net?), probably as uncompressed tarballs, using an adaptation of the current system backup scripts that copies only those files which cannot be restored automatically by setting the status file and running apt. An open question is whether each machine should have a unique backup user (what happens if you have a $user/$host.hcoop.net key? Can it authenticate on all nodes via pam_krb5, or just on $host?).

Databases could be backed up similarly, just by rsyncing over the /srv/databases directory. The same issue as with the previous backups could arise, however: the on-disk files may be in an inconsistent state. If it is at all feasible, we should instead do a proper dump of each database separately. Perhaps we could resurrect $user.db, or instead have $user.dbbackup contain the last snapshot of the user's database dumps? Possibly not worth the effort, and it might be better to keep the dumps unexposed to the world at large (publishing them would violate database ACLs).

spamassassin should probably be writing its bayes database directly to afs anyway, so we can punt on that. There should not be any other local state.

In a world where every child gets a free pony for their tenth birthday, we could teach obnam about afs ACLs, mount the $user.backup volumes directly (i.e. not double the local space requirements for volumes!), and back up from those without the intermediate dump. This would also allow users to control what data gets backed up via ACLs. However, reality bites.


CategorySystemAdministration

BackupInfo (last edited 2019-03-31 19:34:13 by ClintonEbadi)