1. Core Software Choices
1.1. Virtualization
Virtualization would allow us to avoid dedicating an entire physical machine to the KDC/AFS server. It would also allow us to snapshot and migrate VM instances between machines in the future if needed. OpenVZ at least allows VM images to be suspended, migrated to another physical machine, and resumed with no apparent interruption to userspace (aside from network connections and such potentially timing out). This kind of flexibility would make future expansion a lot less painful.
2. General Setup
2.1. Core Services Machine
- Base operating system should just be Debian, set up as either a Xen or OpenVZ server
- Things which logically belong in separate machines go into VM images
- KDC/AFS (and nothing else except perhaps LDAP)
- Core Network Services
- Domtool
- Portal
- Bugzilla
- HCoop MoinMoin
- DNS
- SFTP (if we want to continue supporting it)
- Mail delivery
- Still into AFS space? At the very least users should be permitted to directly access their Maildir somehow
Note: if we continue to use procmail, users can run programs on this machine; procmail should run in a restricted shell with access to a few external programs useful for mail filtering but nothing else
- Databases
- Dedicated partition on the smaller array for database storage (potentially with its own RAID1 in the far-off future?)
2.2. User Services Machine
- Also a Xen/OpenVZ server
- VM Images
- Secondary KDC
- Do we need to have a secondary AFS server with ro copies of user volumes? Or at least some core volumes all machines need?
- Web serving
- Should we continue to use Apache? I know it would involve rewriting the domtool Apache modules, but it doesn't seem like we use Apache for more than static file serving, URL rewriting, and proxying, all of which can be done with a smaller server that will probably be easier to maintain (e.g. see our current mysterious issues that have defied all debugging)
- Should users have direct access to this image? Perhaps we could either write a small config utility or extend domtool to enable running programs automatically in the image; users could then configure their daemons on the general user image. I can see a few issues with controlling the remote daemons, but maybe we can work this out (perhaps using runit)
- General user access
- Users ssh here and run whatever
- Either just a general use shell server or combined with the web serving image
- IMAP/Jabber
- If we choose to not deliver mail into AFS space at least IMAP will need to go onto the core machine; Jabber is lightweight and does not present a security risk so it can just go wherever IMAP does
- Secondary KDC
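As a rough aid for discussing the layout above, the two machines and their VM images can be written down as data and queried. This is only a sketch of one possible reading of the plan: the host, container, and service names below are illustrative placeholders, and the open questions (SFTP, where IMAP ends up) are not settled by it.

```python
# Hypothetical encoding of the proposed layout. Every name here is a
# placeholder chosen for illustration, not a real hostname or image name.
LAYOUT = {
    "core-services": {
        "kdc-afs": ["kdc-primary", "afs-fileserver"],
        "network": ["domtool", "portal", "bugzilla", "wiki", "dns"],
        "mail": ["mail-delivery"],
        "databases": ["databases"],
    },
    "user-services": {
        "kdc-secondary": ["kdc-secondary"],
        "web": ["web"],
        "shell": ["user-shell"],
        "imap-jabber": ["imap", "jabber"],
    },
}


def find_service(layout, service):
    """Return every (host, container) pair that runs the given service."""
    return [
        (host, container)
        for host, containers in layout.items()
        for container, services in containers.items()
        if service in services
    ]


def duplicated_services(layout):
    """Return the sorted names of services placed in more than one container."""
    counts = {}
    for containers in layout.values():
        for services in containers.values():
            for service in services:
                counts[service] = counts.get(service, 0) + 1
    return sorted(name for name, n in counts.items() if n > 1)
```

Moving a service (say, IMAP onto the core machine) is then a one-line change, and `duplicated_services` gives a quick sanity check that nothing was accidentally listed twice.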
3. Actually Doing It
So then, let's get started
3.1. Force Password Change
- Get hopper online as the secondary KDC if possible
- Choose a flag day and regenerate our master krbtgt
- All services will need to be restarted
- We should probably force a password change for all admin principals immediately after this
- Set up migratepw script
- This will also mark users for volume migration to the new AFS server
- Announce to users that they should run this
- Set another flag day for the actual password changes (1 week)
- Roll Call vote to make sure everyone is paying attention
- Contact any stragglers
- If someone is utterly unresponsive we can migrate their volumes last, but freeze their accounts except for email on the new machines
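The bookkeeping for the steps above (a one-week window, then responsive users first and stragglers last) could be sketched as follows. This is a minimal illustration, not the migratepw script itself: the function names are hypothetical, and it assumes we can obtain a list of all users and a list of users who have already run the script.

```python
from datetime import date, timedelta


def password_flag_day(announced, grace_days=7):
    """Return the flag day for password changes, grace_days after the announcement."""
    return announced + timedelta(days=grace_days)


def stragglers(all_users, migrated_users):
    """Return users who have not yet run the migration script, sorted by name."""
    return sorted(set(all_users) - set(migrated_users))


def migration_order(all_users, migrated_users):
    """Order users for volume migration: responsive users first, stragglers last."""
    responsive = sorted(set(all_users) & set(migrated_users))
    return responsive + stragglers(all_users, migrated_users)
```

The straggler list doubles as the contact list for the roll call follow-up.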
3.2. Offline setup of new machines
These steps happen after the new machines have shipped and we have the info we need from the datacenter.
- Set up console server
- Set up servers
- Configure BMCs to work on the management LAN and enable serial-over-LAN
- Install new hard drives
- Install Debian and configure minimally for further setup online
- Configure drives
- Software RAID1
- Partition (need info from DavorOcelic)
- Set public IP addresses
- Create temporary local admin user with sudo privs
3.3. Initial Setup at the colo
On the first day we have access to the datacenter, RobinTempleton, ClintonEbadi, and SteveKillen will head out early and rack everything. SteveKillen will be joining immediately after migration completes; for the initial racking he has offered his lifting ability for the heavy machines.
- Set up OpenVZ on both servers
- Configure DNS bits
- Configure initial OpenAFS server + KDC container
- Join current Kerberos Realm / AFS Cell
- Configure OpenAFS and Kerberos clients on second server
- Create core services image with just a domtool server?
3.4. Migrating Users: Act I
- Set up mail delivery container
- Set up IMAP container
- Switch mail.hcoop.net over to the new mail container
- How do we handle delivery / IMAP for volumes not yet migrated?
- Gradually migrate user mail volumes (< 1 week ideally)
- Should not interfere with other setup
- For each user in turn:
- Freeze mail volume
- Clone onto new AFS server
- Make the clone on the new AFS server the master
- Magic reconfiguration bits
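One way to drive the per-user loop above is to generate the commands up front and review them before running anything. The sketch below swaps in OpenAFS's `vos move`, which relocates a read/write volume between file servers in a single command; whether that is preferable to the explicit freeze/clone/promote sequence is an open question. The `mail.` volume prefix, the server names, and the `/vicepa` partitions are assumptions for illustration only.

```python
import shlex


def mail_volume_name(user, prefix="mail."):
    """Build a mail volume name; the 'mail.' prefix is an assumed convention."""
    return prefix + user


def vos_move_command(volume, from_server, from_part, to_server, to_part):
    """Return the argv list for moving one volume with 'vos move'."""
    return [
        "vos", "move",
        "-id", volume,
        "-fromserver", from_server,
        "-frompartition", from_part,
        "-toserver", to_server,
        "-topartition", to_part,
    ]


def migration_commands(users, from_server, to_server,
                       from_part="/vicepa", to_part="/vicepa"):
    """Return one shell-quoted 'vos move' line per user, without running anything."""
    return [
        shlex.join(
            vos_move_command(
                mail_volume_name(user), from_server, from_part, to_server, to_part
            )
        )
        for user in users
    ]
```

Because the functions only build strings, the output can be pasted into a review or fed to a shell in small batches so the migration does not interfere with other setup work.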
Jabber can probably be swapped over at this point too (it is a quick 15-minute task and uses no resources; it could just live in the same container as IMAP).
3.5. Configuring Everything Else
- Build databases container
- Use local disk on new server
- Migrate core shared services
- Bugzilla
- Wiki
- Mailman
- Portal?
- Build user accessible container
- Apache and whatnot
- Ideally we would start off with fwtool enabled here
3.6. Migrating Users: Act II
- Make the new KDC the master and slave the Peer1 KDC to it
- Gradually migrate user volumes to new machines
- Disable account on mire
- Freeze user volume
- Clone onto new OpenAFS fileserver
- Make clone rw master
- Remove user from deleuze domtool
- Add user to new server domtool
- Reconfigure domains
- For anyone using the Easy_domain stuff this should be transparent
- Anyone doing something special should know they are? We can then offer help to anyone who asks with temporarily reconfiguring things to the temporary nsN names etc.
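Since each user goes through the same ordered steps, a small resumable runner might help when a step fails partway through. The following is a sketch under assumptions: the step names merely label the bullets above, and the `actions` callables (one per step, returning True on success) are hypothetical hooks that would wrap the real commands.

```python
# Step labels mirror the per-user bullets above; they are not real commands.
STEPS = [
    "disable_account_on_mire",
    "freeze_user_volume",
    "clone_to_new_fileserver",
    "make_clone_rw_master",
    "remove_from_deleuze_domtool",
    "add_to_new_domtool",
    "reconfigure_domains",
]


def run_migration(user, actions, done=None):
    """Run the remaining steps for one user in order, stopping at the first failure.

    actions maps each step name to a callable taking the user name and
    returning True on success. done lists steps completed on an earlier run.
    Returns the list of all completed steps so the caller can persist it.
    """
    completed = list(done or [])
    for step in STEPS:
        if step in completed:
            continue  # already finished on a previous attempt
        if not actions[step](user):
            break  # stop here; later steps depend on this one
        completed.append(step)
    return completed
```

Persisting the returned list per user means a failed clone can be retried later without disabling the account or freezing the volume a second time.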
3.7. Parting is Such Sweet Sorrow
- Forcibly migrate and freeze anyone who has not responded to notices and direct email or other contact
- Leave email enabled and make unfreeze automatic (ssh to new server and have the login shell run a pwchange + unfreeze?)
- If someone doesn't notice that something major just happened at this point ...
- Ensure that we can take the Peer1 stuff offline
- Turn off remaining services at Peer1
- Flip any final DNS bits etc.
- After ensuring that we are functioning without the Peer1 machines power them down and remove them
- Figure out what to do with the old machines
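Before powering anything down, it may be worth checking that no hostname we care about still resolves to an address at Peer1. A minimal sketch, assuming we keep a list of hostnames and a set of old addresses (the function name and both inputs are hypothetical); the resolver is injectable so the logic can be exercised without touching the network.

```python
import socket


def check_cutover(hostnames, old_addresses, resolve=socket.gethostbyname):
    """Split hostnames into those still on old addresses and those that fail to resolve.

    Returns (stale, unresolved): stale names resolve to an address in
    old_addresses; unresolved names raised an OSError during lookup.
    """
    stale = []
    unresolved = []
    for name in hostnames:
        try:
            address = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            unresolved.append(name)
            continue
        if address in old_addresses:
            stale.append(name)
    return stale, unresolved
```

An empty `stale` list is only a necessary condition for shutting down the old machines, since cached DNS records and hard-coded addresses would not show up here.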