These steps are listed in approximately the order in which they should be performed, after performing all of the "generic" steps in SetupNewMachines.
Update Existing Machines
Update AFSDB DNS Records
You'll want to add a new AFSDB record for the new server. Note that the numeric field in an AFSDB record must always be "1"; it is not a priority as in MX records. The order of the records determines their priority (unlike SRV records, which carry an explicit priority field).
Update CellServDB on AFS Servers
On all existing AFS servers, add the IP address for the new machine to /etc/openafs/server/CellServDB (this should be a symlink to /etc/openafs/CellServDB but not vice-versa). The format of this file is very strange, and often confuses people:
- A line starting with a ">" (greater-than sign) indicates the start of the declaration of the servers for a cell. The name of the cell comes after the greater-than sign.
- All lines between that line and the next line starting with a greater-than sign are servers for the previously mentioned cell. Each of these lines consists of an IP address, one or more tabs, a hash mark, and the hostname of the server.
Here is an example:
>hcoop.net
1.1.1.1	#afs1.hcoop.net
2.2.2.2	#afs2.hcoop.net
>whitehouse.gov
0.0.0.0	#ovaloffice.whitehouse.gov
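The format rules above can be checked mechanically. The following is a minimal sketch, not part of our tooling: the function name cell_servers is hypothetical, and it assumes the file follows the layout just described (a ">" line per cell, then one "IP, whitespace, #hostname" line per server).

```shell
#!/bin/sh
# List the servers declared for one cell in a CellServDB-format file.
# Usage: cell_servers CELL FILE   (cell_servers is a hypothetical helper name)
cell_servers() {
    awk -v cell="$1" '
        /^>/ {
            # A ">" line opens a new cell; the name is the first field
            # with the leading ">" stripped.
            name = substr($1, 2)
            in_cell = (name == cell)
            next
        }
        in_cell && NF {
            # Server line: IP address, whitespace, "#hostname".
            host = $2
            sub(/^#/, "", host)
            print $1, host
        }
    ' "$2"
}
```

For example, `cell_servers hcoop.net /etc/openafs/CellServDB` prints one "IP hostname" pair per line for our cell, which is a quick way to confirm that an edit landed under the right ">" line.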
Restart All AFS Servers
Now, restart each of the existing AFS servers, one at a time, so they reload their CellServDB files. To completely ensure continuity of service, always wait a full five minutes after restarting one server before restarting the next one (five minutes is the worst-case time needed for AFS peer servers to "recognize" each other and rejoin the cluster; in practice the time required is usually much, much shorter).
Unfortunately this really is necessary.
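The one-at-a-time restart with a pause in between can be sketched as a small loop. This is only an illustration under assumptions: it uses bos restart with -all, which may not be how the servers were actually restarted here (the init script is another option), and the BOS and WAIT_SECONDS variables and the rolling_restart name are hypothetical knobs added for this sketch.

```shell
#!/bin/sh
# Restart AFS servers one at a time, pausing between them.
# BOS and WAIT_SECONDS are hypothetical overrides; set BOS=echo to print
# the commands instead of running them.
BOS=${BOS:-bos}
WAIT_SECONDS=${WAIT_SECONDS:-300}   # five minutes, the worst case cited above

rolling_restart() {
    first=1
    for host in "$@"; do
        # Wait before every server except the first.
        if [ "$first" -eq 0 ]; then
            sleep "$WAIT_SECONDS"
        fi
        first=0
        # Stop at the first failure rather than restarting further servers.
        $BOS restart -server "$host" -all || return 1
    done
}
```

Calling `rolling_restart afs1.hcoop.net afs2.hcoop.net` would restart the two example hosts in order; running it first with BOS=echo shows the commands without touching anything.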
Set Up New AFS Server
Ensure Hostname Resolves
Execute this command, and make sure it prints the machine's IP address. If it doesn't, the AFS server will fail cryptically and mysteriously.
dig +short `hostname`
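If you want the check above to fail loudly in a setup script, something like the following sketch would do. The hostname_resolves name and the RESOLVE override are hypothetical, and the check only looks for IPv4 addresses in the output.

```shell
#!/bin/sh
# Succeed only if the given name resolves to at least one IPv4 address.
# RESOLVE is a hypothetical override so the check can be tried without DNS.
RESOLVE=${RESOLVE:-dig +short}

hostname_resolves() {
    # Keep only lines that look like dotted-quad addresses, so CNAME lines
    # or resolver error text do not count as success.
    addrs=$($RESOLVE "$1" | grep -E '^[0-9]+(\.[0-9]+){3}$')
    [ -n "$addrs" ]
}
```

A typical use would be `hostname_resolves "$(hostname)" || echo "hostname does not resolve" >&2` before starting the AFS server.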
Copy CellServDB, UserList, KeyFile, BosConfig, ThisCell
Copy the CellServDB, UserList, KeyFile, and BosConfig from an existing AFS server:
mkdir -p /etc/openafs/server/
scp deleuze.hcoop.net:/etc/openafs/server/UserList /etc/openafs/server/
scp deleuze.hcoop.net:/etc/openafs/server/KeyFile /etc/openafs/server/
chown root:wheel /etc/openafs/server/KeyFile
chmod o-r /etc/openafs/server/KeyFile
scp deleuze.hcoop.net:/etc/openafs/CellServDB /etc/openafs/CellServDB
scp deleuze.hcoop.net:/etc/openafs/BosConfig /etc/openafs/BosConfig
Relink CellServDB and ThisCell
The AFS client and server (which can both be installed on the same machine) keep their CellServDB files in different places, for historical reasons. We can simplify our setup by symlinking the server's copy to the client's (the reverse will not work due to restrictive permissions on /etc/openafs/server/):
mkdir -p /etc/openafs/server/
ln -sf /etc/openafs/CellServDB /etc/openafs/server/CellServDB
ln -sf /etc/openafs/ThisCell /etc/openafs/server/ThisCell
Create /vicepa
The AFS server will store its files in /vicepa. Create that directory, ensuring it resides on whatever storage (RAID, etc.) you want to use for AFS backing. You must also let AFS know that it is safe to use it:
touch /vicepa/AlwaysAttach
Install Debian Packages
dpkg -i /afs/hcoop.net/common/debian/openafs/1.4.6/openafs-{fileserver,dbserver}*.deb
Replicate Volumes
We want most of our readonly volumes to be replicated as widely as possible. So, for each readonly volume, you should:
vos addsite newserver.hcoop.net /vicepa volname
vos release volname
Currently, the minimum set of volumes you should replicate is:
common.bin
common.databases
common.logs
old
root.afs
root.cell
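Running the two vos commands for every volume in the list can be sketched as a loop. The replicate_volumes name and the VOS override are hypothetical; the vos invocations themselves are the ones shown above.

```shell
#!/bin/sh
# Add a read-only site on the new server for each volume, then release it.
# VOS is a hypothetical override; set VOS=echo to preview the commands.
VOS=${VOS:-vos}

replicate_volumes() {
    server=$1
    partition=$2
    shift 2
    for vol in "$@"; do
        # Stop at the first failure so a half-added site is noticed.
        $VOS addsite "$server" "$partition" "$vol" || return 1
        $VOS release "$vol" || return 1
    done
}
```

For the minimum set this would be `replicate_volumes newserver.hcoop.net /vicepa common.bin common.databases common.logs old root.afs root.cell`. Previewing with VOS=echo first is a cheap sanity check.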
Remove AFS server
Here's a list of tasks that were done when we were removing Krunk:
- Run vos listvol HOST to find existing volumes on the server.
- Run vos remove -server HOST -id NAME|ID for each of them. Note: this really removes data! It is safe for replicated volumes whose read/write copy is kept elsewhere.
- Run vos changeaddr -oldaddr HOST_IP -remove
- Edit /etc/openafs/CellServDB on all machines to remove mention of HOST.
- Run bos shutdown HOST (krunk, in our case).
- Edit /afs/hcoop.net/common/etc/scripts/hcoop-kprop and remove mention of HOST; apply with DOMTOOL_USER=hcoop domtool hcoop.net
- If the cell is published with grand.central.org, mail cellservdb@central.org and tell them the new CellServDB configuration.
To Do
The information in CellServDB needs to stay in sync with the AFSDB DNS entries; both contain essentially the same data in different formats. Unfortunately AFS can't be modified to "do away with" the CellServDB file, because the AFS fileservers are supposed to be able to operate correctly even when DNS is down (clients are another story). So, it would be nice to have some way of generating the CellServDB from the AFSDB records periodically.
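One possible shape for such a generator is sketched below. It is untested against our DNS and rests on assumptions: that dig +short AFSDB prints one "subtype hostname." pair per line, that each server has an A record, and that the DIG override and the generate_cellservdb name (both hypothetical) suit whatever runs it.

```shell
#!/bin/sh
# Print a CellServDB stanza for one cell, built from its AFSDB records.
# DIG is a hypothetical override so the function can be tried without DNS.
DIG=${DIG:-dig +short}

generate_cellservdb() {
    cell=$1
    printf '>%s\n' "$cell"
    $DIG AFSDB "$cell" | while read -r subtype host; do
        # Subtype 1 is the only value used for AFS; skip anything else.
        [ "$subtype" = "1" ] || continue
        host=${host%.}    # drop the trailing dot of the fully qualified name
        # stdin is redirected so the lookup cannot consume the outer loop's input.
        $DIG A "$host" </dev/null |
            grep -E '^[0-9]+(\.[0-9]+){3}$' |
            while read -r ip; do
                printf '%s\t#%s\n' "$ip" "$host"
            done
    done
}
```

Since the whole point of CellServDB is to survive a DNS outage, anything built on this should write to a temporary file and replace the real CellServDB only when the result contains at least one server line.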