
Diff for "ToDo"

Differences between revisions 8 and 9
Revision 8 as of 2012-12-14 08:14:46
Size: 6176
Editor: ClintonEbadi
Comment: a picture of what we're up against with the rack re-org to strike fear into the hearts of all
Revision 9 as of 2013-07-27 22:35:41
Size: 3420
Editor: ClintonEbadi
Comment: remove completed todo items, change timelines for others, add a few new long term tasks
Line 9: Line 9:
=== Mire ===

Mire is a crufty old ... mire of a machine. See FritzVirtualization for details on its replacement.

Mire is being split into two machines: [[ServerNavajos|navajos]] and [[ServerBog|bog]]. Navajos will run apache/cgi programs; bog will be the general shell and daemon server. The plan is to try our damndest to get everyone weaned off of mire by the end of January 2013 so that we can free up rack space.

 Status:: The new web server KVM guest has been installed, and HCoop web services are being migrated to it before it is opened to members. After a few services have been moved over and the firewall request system has been confirmed to work properly, bog will be spun up as a bare member shell server.
 ETA:: ''December 2012''
Line 28: Line 20:
Deleuze is also performing DNS services, but those can easily be moved to hopper. Things deleuze does: (incomplete, probably)

 * Generated webalizer pages
 * Portal hosting
 * domtool-server
 * Mail delivery and filtering (exim, exim filters, procmail)
 * Mail access (courier imap/pop)
 * Mailman
   * List delivery/archiving (stored locally!)
   * Web serving of list archives/management interfaces
 * Web serving hcoop.net
   * Cannot easily convert to domtool config because of `mod_userdir`
 * Squirrelmail hosting
 * AndrewFileSystem servers
   * bos, vos, maybe others.
Line 32: Line 38:
In order to remove `deleuze` on a reasonable timescale, we need a new server. Current thinking is that a Dell [[http://www.dell.com/us/enterprise/p/poweredge-r515/pd|PowerEdge R515]] configured with two 6-core High Efficiency processors, 32GB RAM, and a 500G RAID1 will be acceptable. It's reasonably priced (~$2500, possibly lower when we acquire it), is noticeably faster than fritz, and is expandable with up to eight drives, opening up a few possibilities:

 * SSD RAID1 for databases
 * Additional afs storage (rather than doubling the space on fritz, distribute volumes between nodes to reduce the impact of server maintenance events)

If mire can be turned off in early February, then SteveKillen, BtTempleton, and ClintonEbadi are willing to take a road trip to the data center to perform a few tasks:

 * Remove old hardware from the rack (we have a bunch of dead drives and [[KrunkInfoz|an entire machine]] sitting in the rack)
 * Replace the current cable mess with a new, improved cable mess
   * Idea: use a different color ethernet cable for each server (3'? 5'?)
   * One big problem is the plethora of kvm cables that need to be tamed. Investigate some kind of cable management gadget.
   * Bring lots and lots of cable bundling velcro tape
 * Re-rack current servers to eliminate wasted space (see space map on [[OnSiteVisits/20100105]])
   * Can we move the Belkin KVM so that it does not block a rack space? Maybe rack it upside down, leaving 1U of space above for the switch and ipkvm?
   * Ensure deleuze and hopper will be easy to remove, since deleuze will be going this year and hopper probably next
 * Rack new server

Then, we would drive the servers back to North Carolina, destroy the hard drives, and recycle mire/krunk.

The most recent picture of the back of the rack shows what we're fighting:

{{http://hcoop.net/~rkd/hcoop/imgs-2010-01-05/rkd-20100105-191558.jpg|mess|width=500}}
See NewServerDiscussion2013. We also completed some cleanup during [[OnSiteVisits/20130626]] and [[OnSiteVisits/20130627]].
Line 59: Line 44:
   * SrikanthSastry has volunteered to do front line support and handle OnSiteVisits.
Line 60: Line 46:
 * '''ASAP''' Work out physical access situation
   * RichardDarst officially resigned all duties and decided to leave the coop. This leaves us with no one to perform maintenance at the data center. There ''are'' technicians on-site, but they are expensive and can do no more than accept packages and perform minor tasks (rebooting a machine, swapping a failed hard drive). In order to expand, we'll need to find someone who can go on-site for major maintenance events (new drives, new servers, removing old machines, etc.).
 * December 2012: Get new web services VM online (see FritzVirtualization)
 * December 2012: Get new member shell / daemon VM online (see FritzVirtualization)
Line 67: Line 50:
 * February 2013: Turn mire off Feb 1
 * February/March 2013: Acquire and install Power``Edge R515
 * Summer 2013: Kill deleuze
 * Spring/Summer 2013: Acquire IPMI Console
 * Fall 2013: Acquire another 8-core monstrosity (KVM server, redundantly serving everything on fritz?)
 * Summer/Fall 2013: Acquire and install Power``Edge R515
 * Fall/Winter 2013: Kill deleuze
 * Winter 2013: Upgrade to AndrewFileSystem 1.6, rekey cell
 * Spring 2014: Upgrade member services to wheezy

This page is meant to be an aid to our bug-tracker, in the case where several tasks are woven together via dependencies. Within a section, tasks are listed in order of which needs to be done first.
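Ordering tasks that are woven together via dependencies is a topological sort. A minimal sketch with Python's standard `graphlib` module (3.9+) follows, using task names from the Long Term list. The only edge the page itself supports is that killing deleuze waits on the new PowerEdge R515, since a new server is the stated prerequisite for removing deleuze; the AFS-upgrade edge is a hypothetical assumption added for illustration.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must be finished before it can start.
DEPENDENCIES = {
    "Acquire and install PowerEdge R515": set(),
    # Stated on this page: deleuze can only go once the new server is in place.
    "Kill deleuze": {"Acquire and install PowerEdge R515"},
    # Hypothetical edge, for illustration only: deleuze currently runs AFS servers.
    "Upgrade to AndrewFileSystem 1.6, rekey cell": {"Kill deleuze"},
    "Upgrade member services to wheezy": set(),
}


def task_order(deps):
    """Return the tasks so that every task comes after all of its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, task in enumerate(task_order(DEPENDENCIES), start=1):
        print(f"{step}. {task}")
```

If two tasks ever end up depending on each other, `static_order()` raises `graphlib.CycleError`, which is a cheap way to catch a tangled plan before it lands in the bug-tracker.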

If you'd like to help, just join and email the administrators mailing list! No offer of free labor shall be refused.

Replacing Old Machines

Deleuze

Although still serving us well, Deleuze's Debian install is hopelessly obsolete, and its total horsepower equals about 1/8th of a modern low cost server.

The quickest way to replace it will be to acquire a new KVM server and then spin up a pair of KernelVirtualMachines to serve the few remaining services on Deleuze:

  • base node (on the metal): AndrewFileSystem servers that can be made redundant (read-only copies of common volumes, ptsserver, ...), possibly MitKerberos KDC slave

  • mccarthy: domtool-server, general admin node (for e.g. building packages, creating users)

  • unnamed: courier imap, exim

Things deleuze does: (incomplete, probably)

  • Generated webalizer pages
  • Portal hosting
  • domtool-server
  • Mail delivery and filtering (exim, exim filters, procmail)
  • Mail access (courier imap/pop)
  • Mailman
    • List delivery/archiving (stored locally!)
    • Web serving of list archives/management interfaces
  • Web serving hcoop.net
    • Cannot easily convert to domtool config because of mod_userdir
  • Squirrelmail hosting
  • AndrewFileSystem servers
    • bos, vos, maybe others.
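One way to track the migration is to check which of deleuze's duties already have a destination in the proposed KVM layout. The sketch below copies the top-level duties from the (admittedly incomplete) list above and the node assignments from the plan; mapping "Mail delivery and filtering" to the exim node is an assumption, since the plan only says "courier imap, exim", and "unnamed mail VM" is a hypothetical label for the unnamed node.

```python
# Top-level duties from "Things deleuze does" (the page marks that list as incomplete).
DELEUZE_DUTIES = [
    "Generated webalizer pages",
    "Portal hosting",
    "domtool-server",
    "Mail delivery and filtering",
    "Mail access (courier imap/pop)",
    "Mailman",
    "Web serving hcoop.net",
    "Squirrelmail hosting",
    "AndrewFileSystem servers",
]

# Destinations named in the proposed layout. The mail-delivery mapping is an
# assumption, and "unnamed mail VM" is a hypothetical label for the unnamed node.
PLANNED_HOME = {
    "AndrewFileSystem servers": "base node",
    "domtool-server": "mccarthy",
    "Mail delivery and filtering": "unnamed mail VM",
    "Mail access (courier imap/pop)": "unnamed mail VM",
}


def unplaced(duties, homes):
    """Return the duties, in list order, that have no planned destination yet."""
    return [duty for duty in duties if duty not in homes]


if __name__ == "__main__":
    for duty in unplaced(DELEUZE_DUTIES, PLANNED_HOME):
        print("no destination yet:", duty)
```

Anything this reports still needs a home (or an explicit decision to drop it) before deleuze can be switched off.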

New Server / Rack Cleanup

See NewServerDiscussion2013. We also completed some cleanup during OnSiteVisits/20130626 and OnSiteVisits/20130627.

Immediate Tasks

  • ASAP: Find new admins

    • ClintonEbadi is the only active admin (DavorOcelic still assists with emergency tasks and processing of permissions requests -- ClintonEbadi 2012-09-04 05:28:39). This is problematic because ClintonEbadi is only one person, can fake being a sysadmin most days but has holes in his knowledge, and the coop would be screwed should his bicycle and a bus meet.

    • SrikanthSastry has volunteered to do front line support and handle OnSiteVisits.

    • It's hard to become an HCoop sysadmin, but a lot of work was done in 2012 to make it easier. If you're interested, please mail the administrators list! Anyone with experience administering exim and courier imap would be extremely helpful as of September 2012.

Long Term

  • Summer/Fall 2013: Acquire and install PowerEdge R515
  • Fall/Winter 2013: Kill deleuze
  • Winter 2013: Upgrade to AndrewFileSystem 1.6, rekey cell
  • Spring 2014: Upgrade member services to wheezy

Neglected tasks

  • BackupInfo situation really, really sucks. I'd use stronger language if children weren't around.

  • Services need updating to latest versions (lots of config merging and testing):
    • Exim
    • IMAP (courier authdaemon needs [or needed] a patch to work properly, converting to Dovecot has its own set of difficulties)
    • Mailman (not clear how to have mailman stuff generated on a machine other than the mail delivery node)

Tasks which non-admins can do, too

See HcoopVolunteerTasks for a list of things non-admins can help with.


CategorySystemAdministration

ToDo (last edited 2014-04-18 13:36:00 by Sajith)