
Diff for "ToDo"

Differences between revisions 6 and 7
Revision 6 as of 2012-12-07 14:58:30
Size: 3276
Editor: ClintonEbadi
Comment: updating to reflect current realities
Revision 7 as of 2012-12-14 02:38:38
Size: 5751
Editor: ClintonEbadi
Comment: with the recent hardware failures, more aggressive plans to get redundancy in place
Line 7: Line 7:
== Replacing Mire == == Replacing Old Machines ==

=== Mire ===
Line 11: Line 13:
Mire is being split into two machines: [[ServerNavajos|navajos]] and [[ServerBog|bog]]. Navajos will run apache/cgi programs, bog will be the general shell and daemon server. Mire is being split into two machines: [[ServerNavajos|navajos]] and [[ServerBog|bog]]. Navajos will run apache/cgi programs, bog will be the general shell and daemon server. The plan is to try our damnedest to get everyone weaned off of mire by the end of January 2013 so that we can free up rack space.
Line 15: Line 17:

=== Deleuze ===

Although still serving us well, Deleuze's Debian install is hopelessly obsolete, and its total horsepower equals about 1/8th of a modern low cost server.

The quickest way to replace it will be to acquire a new kvm server, and then spin up a pair of KernelVirtualMachine``s to serve the few remaining services on Deleuze:

 * base node (on the metal): AndrewFileSystem servers that can be made redundant (read only copies of common volumes, ptsserver, ...), possibly MitKerberos KDC slave
 * [[ServerMccarthy|mccarthy]]: domtool-server, general admin node (for e.g. building packages, creating users)
 * ''unnamed'': courier imap, exim

Deleuze is also performing dns services, but those can be moved to hopper with ease.

== New Server / Rack Cleanup ==

In order to remove `deleuze` on a reasonable timescale, we need a new server. Current thinking is that a Dell [[http://www.dell.com/us/enterprise/p/poweredge-r515/pd|PowerEdge R515]] configured with two 6-core High Efficiency processors, 32GB RAM, and a 500GB RAID1 will be acceptable. It's reasonably priced (~$2500, possibly lower when we acquire it), is decently faster than fritz, and is expandable to up to eight drives, opening up a few possibilities:

 * SSD RAID1 for databases
 * Additional afs storage (rather than double space on fritz, distribute volumes between nodes to reduce the impact of server maintenance events)

If mire can be turned off in early February, then SteveKillen, BtTempleton, and ClintonEbadi are willing to take a road trip to the data center to perform a few tasks:

 * Remove old hardware from the rack (we have a bunch of dead drives and [[KrunkInfoz|an entire machine]] sitting in the rack)
 * Replace the current cable mess with a new, improved cable mess
 * Re-rack current servers to eliminate wasted space (see space map on [[OnSiteVisits/20100105]])
   * Can we move the Belkin KVM so that it does not block a rack space? Maybe rack it upside down, leaving 1U of space above for the switch and ipkvm?
   * Ensure deleuze and hopper will be easy to remove, since deleuze will be going this year and hopper probably next
 * Rack new server

Then we would drive the old servers back to North Carolina, destroy the hard drives, and recycle mire/krunk.
Line 24: Line 56:
 * January 2013: Get new member shell / daemon VM online (see FritzVirtualization)  * December 2012: Get new member shell / daemon VM online (see FritzVirtualization)
Line 28: Line 60:
 * Spring 2013: Kill mire  * February 2013: Turn mire off Feb 1
 * February/March 2013: Acquire and install Power``Edge R515
Line 30: Line 63:
 * Fall 2013: Acquire more disk space

This page is meant to be an aid to our bug-tracker, in the case where several tasks are woven together via dependencies. Within a section, tasks are listed in order of which needs to be done first.
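
The "what needs to be done first" ordering can also be checked mechanically. The following is only a sketch, not part of the bug-tracker: the task names are shortened from this page, and the dependency edges are one reading of the plan (web VM before bog, both before mire is turned off, mire off and a purchased R515 before the road trip, road trip before deleuze goes away). It assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks that must be finished first.
# These edges are an interpretation of the plan on this page, not an official list.
DEPS = {
    "web services VM online (navajos)": set(),
    "member shell / daemon VM online (bog)": {
        "web services VM online (navajos)",
    },
    "turn mire off": {
        "web services VM online (navajos)",
        "member shell / daemon VM online (bog)",
    },
    "acquire PowerEdge R515": set(),
    "road trip: clean rack, rack new server": {
        "turn mire off",
        "acquire PowerEdge R515",
    },
    "kill deleuze": {"road trip: clean rack, rack new server"},
}


def ordered_tasks(deps):
    """Return the tasks in an order that respects every dependency."""
    # static_order() raises graphlib.CycleError if the plan contradicts itself.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, task in enumerate(ordered_tasks(DEPS), start=1):
        print(f"{step}. {task}")
```

Adding a task means adding one entry to `DEPS`; a circular dependency shows up as an error instead of a silently wrong list.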

If you'd like to help, just join and email the administrators mailing list! No offer of free labor shall be refused.

Replacing Old Machines

Mire

Mire is a crufty old ... mire of a machine. See FritzVirtualization for details on its replacement.

Mire is being split into two machines: navajos and bog. Navajos will run apache/cgi programs, bog will be the general shell and daemon server. The plan is to try our damnedest to get everyone weaned off of mire by the end of January 2013 so that we can free up rack space.

Status
New web server kvm has been installed; HCoop web services are being migrated before it is opened to members. After a few services have been moved over and the firewall request system is confirmed to work properly, bog will be spun up as a bare member shell server.
ETA
December 2012

Deleuze

Although still serving us well, Deleuze's Debian install is hopelessly obsolete, and its total horsepower equals about 1/8th of a modern low cost server.

The quickest way to replace it will be to acquire a new kvm server, and then spin up a pair of KernelVirtualMachines to serve the few remaining services on Deleuze:

  • base node (on the metal): AndrewFileSystem servers that can be made redundant (read only copies of common volumes, ptsserver, ...), possibly MitKerberos KDC slave

  • mccarthy: domtool-server, general admin node (for e.g. building packages, creating users)

  • unnamed: courier imap, exim

Deleuze is also performing dns services, but those can be moved to hopper with ease.
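
As a rough illustration of the placement described above, the plan can be written down as a small table and queried. This is a sketch only: the host labels and service names are copied from the list above ("unnamed" is this page's own placeholder for the mail VM), while the helper names `host_for` and `unplaced` are made up for the example.

```python
# Planned home for each service still on deleuze, as listed on this page.
# "unnamed" is the page's placeholder for the not-yet-named mail VM.
PLACEMENT = {
    "base node (on the metal)": ["AFS servers", "Kerberos KDC slave (possibly)"],
    "mccarthy": ["domtool-server", "general admin"],
    "unnamed": ["courier imap", "exim"],
    "hopper": ["dns"],
}


def host_for(service, placement=PLACEMENT):
    """Return the planned host for a service, or None if it has no home yet."""
    for host, services in placement.items():
        if service in services:
            return host
    return None


def unplaced(services, placement=PLACEMENT):
    """Return the services from the given list that the plan does not cover."""
    return [s for s in services if host_for(s, placement) is None]
```

For example, `unplaced(["exim", "dns", "mailman"])` returns only `"mailman"`, since the list above names no target host for it; that input list is hypothetical, not an inventory of what deleuze actually runs.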

New Server / Rack Cleanup

In order to remove deleuze on a reasonable timescale, we need a new server. Current thinking is that a Dell PowerEdge R515 configured with two 6-core High Efficiency processors, 32GB RAM, and a 500GB RAID1 will be acceptable. It's reasonably priced (~$2500, possibly lower when we acquire it), is decently faster than fritz, and is expandable to up to eight drives, opening up a few possibilities:

  • SSD RAID1 for databases
  • Additional afs storage (rather than double space on fritz, distribute volumes between nodes to reduce the impact of server maintenance events)

If mire can be turned off in early February, then SteveKillen, BtTempleton, and ClintonEbadi are willing to take a road trip to the data center to perform a few tasks:

  • Remove old hardware from the rack (we have a bunch of dead drives and an entire machine sitting in the rack)

  • Replace the current cable mess with a new, improved cable mess
  • Re-rack current servers to eliminate wasted space (see space map on OnSiteVisits/20100105)

    • Can we move the Belkin KVM so that it does not block a rack space? Maybe rack it upside down, leaving 1U of space above for the switch and ipkvm?
    • Ensure deleuze and hopper will be easy to remove, since deleuze will be going this year and hopper probably next
  • Rack new server

Then we would drive the old servers back to North Carolina, destroy the hard drives, and recycle mire/krunk.

Immediate Tasks

  • ASAP Find new admins

    • ClintonEbadi is the only active admin (DavorOcelic still assists with emergency tasks and processing of permissions requests -- ClintonEbadi 2012-09-04 05:28:39). This is problematic because ClintonEbadi is only one person, can fake being a sysadmin most days but has holes in his knowledge, and the coop would be screwed should his bicycle and a bus meet.

    • It's hard to become an HCoop sysadmin, but a lot of work has been done in 2012 to make it easier... if you're interested please mail the administrators list! Anyone with experience administering exim and courier imap would be extremely helpful as of September 2012.
  • ASAP Work out physical access situation

    • RichardDarst officially resigned all duties and decided to leave the coop. This leaves us with no one to perform maintenance at the data center. There are technicians on-site, but they are expensive and can't do more than accept packages and perform minor tasks (rebooting a machine, swapping a failed hard drive). In order to expand we'll need to find someone who can go on-site for major maintenance events (new drives, new servers, removing old machines, etc.).

  • December 2012: Get new web services VM online (see FritzVirtualization)

  • December 2012: Get new member shell / daemon VM online (see FritzVirtualization)

Long Term

  • February 2013: Turn mire off Feb 1
  • February/March 2013: Acquire and install PowerEdge R515

  • Summer 2013: Kill deleuze
  • Spring/Summer 2013: Acquire IPMI Console
  • Fall 2013: Acquire another 8-core monstrosity (KVM server, redundantly serving everything on fritz?)

Neglected tasks

  • BackupInfo situation really, really sucks. I'd use stronger language if children weren't around.

  • Services need updating to latest versions (lots of config merging and testing):
    • Exim
    • IMAP (courier authdaemon needs [or needed] a patch to work properly, converting to Dovecot has its own set of difficulties)
    • Mailman (not clear how to have mailman stuff generated on a machine other than the mail delivery node)

Tasks which non-admins can do, too

See HcoopVolunteerTasks for a list of things non-admins can help with.


CategorySystemAdministration

ToDo (last edited 2014-04-18 13:36:00 by Sajith)