
Diff for "SetupNewMachines"

Differences between revisions 13 and 63 (spanning 50 versions)
Revision 13 as of 2008-03-02 20:09:42 (size 4820, editor dhcp-37-70)
Revision 63 as of 2008-03-10 01:01:45 (size 12680, editor 77)
These steps are listed in approximately the order in which they should be performed; please try to maintain that as you add to it.


List the Machine on the Wiki

Add the machine to the Hardware page.

It is a very good idea to photograph the front and back panels of the machine and put those images on the wiki page; that way remote admins and people in the data center can be sure they're talking about the same ports.

Set Up Out Of Band Access

All machines owned by hcoop should, if possible, have some out-of-band mechanism for:

  1. Keyboard access
  2. Screen access
  3. Power-cycling

Functions 1 and 2 are typically provided by kvm.hcoop.net (see KvmAccess). If you plan to use it, connect the server's keyboard and video to the KVM switch.

Each server has its own solution for function 3, usually a "service processor". Investigate and document the appropriate service processor settings. If the service processor needs its own IP address, name it foo-sp.hcoop.net, where foo.hcoop.net is the name of the server.
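The service-processor name is derived mechanically from the server name, so the rule can be written as a tiny shell function. This is only a sketch: `sp_name` is a hypothetical helper, not an existing hcoop tool, and it assumes it is given a fully qualified name such as foo.hcoop.net.

```shell
# Hypothetical helper: derive the service-processor hostname from a server's
# fully qualified name, following the foo.hcoop.net -> foo-sp.hcoop.net rule.
# Assumes the argument contains at least one dot.
sp_name() {
    fqdn=$1
    short=${fqdn%%.*}   # "foo" from "foo.hcoop.net"
    domain=${fqdn#*.}   # "hcoop.net" from "foo.hcoop.net"
    printf '%s-sp.%s\n' "$short" "$domain"
}

# Example: sp_name foo.hcoop.net   prints foo-sp.hcoop.net
```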

Add a DNS entry for the server

Straightforward.

Install Debian

We use Debian. Install it. We should put our standard /etc/apt/sources.list here.

Compile a Kernel

It is generally a good idea for hcoop to compile its own kernels. Regarding statically-compiled kernels, see StaticallyCompiledKernels for some opinions.

Install the AFS Client

First, give our preferences to debconf:

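  echo 'openafs-client openafs-client/thiscell string hcoop.net' | debconf-set-selections
  echo 'openafs-client openafs-client/dynroot boolean true' | debconf-set-selections
  # cache size is in kB; the default is way too small
  echo 'openafs-client openafs-client/cachesize string 500000' | debconf-set-selections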

You should install the module-assistant, build-essential, module-init-tools, openafs-client, openafs-krb5, openafs-modules-source, openafs-dbg, openafs-doc, libopenafs-dev, and kstart packages. Here is a block of commands to cut and paste if you are lazy:

  apt-get install krb5-user libkrb5-dev module-assistant build-essential module-init-tools kstart
  mkdir -p /tmp/openafs-packages
  cd /tmp/openafs-packages
  scp ssh.hcoop.net:/afs/hcoop.net/common/debian/openafs/1.4.6/\*.deb      ./
  dpkg -i \
    openafs-client*.deb         \
    openafs-krb5*.deb           \
    openafs-modules-source*.deb \
    openafs-dbg*.deb            \
    openafs-doc*.deb            \
    libopenafs-dev*.deb         
  cd /tmp
  rm -rf /tmp/openafs-packages
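If the scp step fetched nothing for one of the packages, `dpkg -i` fails on the unmatched glob with an unhelpful "cannot access archive" error. A small pre-flight check, run in /tmp/openafs-packages before the `dpkg -i` line, can name the missing package instead. This is a sketch: `missing_debs` is a hypothetical helper, and it assumes the standard Debian file naming `package_version_arch.deb`.

```shell
# Hypothetical pre-flight check: report every expected OpenAFS package that
# has no matching package_*.deb file in the given directory.
# Returns 0 if all are present, 1 otherwise.
missing_debs() {
    dir=$1
    status=0
    for pkg in openafs-client openafs-krb5 openafs-modules-source \
               openafs-dbg openafs-doc libopenafs-dev; do
        found=no
        for f in "$dir"/"$pkg"_*.deb; do
            # An unmatched glob stays literal, so -e is false for it.
            [ -e "$f" ] && found=yes && break
        done
        if [ "$found" = no ]; then
            echo "missing: $pkg"
            status=1
        fi
    done
    return $status
}

# Usage: missing_debs /tmp/openafs-packages && dpkg -i ...
```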

Once these packages are installed, you will want to run

  module-assistant a-i -t openafs-modules

... assuming you compiled your own kernel and the compiled kernel tree resides in /usr/src/linux. If this is not the case, you are on your own.
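module-assistant needs a usable kernel tree for this step, so a quick look at /usr/src/linux before running it can spare you a confusing failure. This is a minimal sketch that assumes a self-compiled tree is the target. `kernel_tree_ok` is a hypothetical helper, and it only checks that the tree has a Makefile and a .config.

```shell
# Hypothetical sanity check before "module-assistant a-i": confirm the kernel
# tree (default /usr/src/linux) exists and looks configured.
kernel_tree_ok() {
    tree=${1:-/usr/src/linux}
    [ -f "$tree/Makefile" ] || { echo "no Makefile in $tree"; return 1; }
    [ -f "$tree/.config" ]  || { echo "no .config in $tree (unconfigured?)"; return 1; }
    return 0
}

# Usage: kernel_tree_ok && module-assistant a-i -t openafs-modules
```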

If the command above completes, it will have created and installed a .deb containing the kernel module. You may need to run

  /etc/init.d/module-init-tools start

to refresh whatever module wonkery Linux maintains in obscure locations. Once that is sorted out (if all else fails, reboot), you should be able to

  /etc/init.d/openafs-client start

Do this and check that /afs shows up.
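"/afs shows up" can be made concrete. With dynroot enabled, /afs lists cells, and our own cell should appear as a directory. This is a sketch: `afs_up` is a hypothetical helper, and the root and cell arguments only exist so it can be pointed elsewhere for testing.

```shell
# Hypothetical check that the AFS client is serving our cell:
# succeeds if <root>/<cell> (default /afs/hcoop.net) is a directory.
afs_up() {
    root=${1:-/afs}
    cell=${2:-hcoop.net}
    if [ -d "$root/$cell" ]; then
        echo "ok: $root/$cell is visible"
        return 0
    fi
    echo "not ok: $root/$cell not found" >&2
    return 1
}
```

If the OpenAFS client tools are on your PATH, `fs wscell` should also report hcoop.net as this workstation's cell.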

Install Packages

Now that AFS is up, you can easily install packages. The block of commands below installs the set of packages which must be on every hcoop server (this list will be expanded as necessary).

  dpkg -i /afs/hcoop.net/common/debian/libnss-ptdb/*.deb
  dpkg -i /afs/hcoop.net/common/debian/libnss-afspag/*.deb
  dpkg -i /afs/hcoop.net/common/debian/libpam-afs-session/*.deb
  dpkg -i /afs/hcoop.net/common/debian/libpam-krb5/*.deb
  dpkg -i /afs/hcoop.net/common/debian/fsr/*.deb

The first four packages are explained below; the last one is the fsr command (recursive "fs").

Install Network Time Protocol Daemon

Kerberos and AFS will not work correctly unless the clocks of the client and server are synchronized to within a certain tolerance. Therefore, it is important for us to have a daemon running that keeps the clock set properly.

  apt-get install ntp
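By default, MIT Kerberos rejects requests when the two clocks differ by more than 300 seconds (the `clockskew` setting). Once ntp is running, `ntpq -p` shows each peer's offset in milliseconds. The function below is a hypothetical helper that checks such an offset against that limit. It is a sketch that takes the number as an argument rather than parsing ntpq output.

```shell
# Hypothetical helper: succeed if a clock offset given in milliseconds
# (the unit of ntpq -p's "offset" column) is within the allowed skew in
# seconds (default 300, MIT Kerberos's default clockskew).
within_skew() {
    offset_ms=$1
    limit_s=${2:-300}
    awk -v o="$offset_ms" -v l="$limit_s" \
        'BEGIN { o = o + 0; l = l + 0; if (o < 0) o = -o; exit (o / 1000 < l) ? 0 : 1 }'
}

# Usage: within_skew 12.5 && echo "clock is close enough for Kerberos"
```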

Install LDAP Support

Logins etc. will not work correctly unless libpam-ldap is installed and configured:

  apt-get install libpam-ldap

Debconf answers:

libpam-ldap     shared/ldapns/base-dn   string  dc=hcoop,dc=net
libpam-ldap     shared/ldapns/ldap-server       string  ldap://69.90.123.67/
libpam-ldap     libpam-ldap/pam_password        select  exop
libpam-ldap     libpam-ldap/rootbinddn  string  cn=admin,dc=hcoop,dc=net
libpam-ldap     libpam-ldap/dbrootlogin boolean true
libpam-ldap     libpam-ldap/override    boolean true
libpam-ldap     shared/ldapns/ldap_version      select  3
libpam-ldap     libpam-ldap/dblogin     boolean false

You will also need the LDAP admin password: read it from /etc/pam_ldap.secret on one of the existing servers and type it at the password prompt.
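The debconf answers above are already in the "package question type value" format that `debconf-set-selections` reads. Feeding them in before `apt-get install libpam-ldap` means only the password still has to be typed. This is a sketch: `ldap_preseed` is a hypothetical helper, and fields are separated by single spaces because debconf-set-selections treats extra whitespace before the value as part of the value.

```shell
# Hypothetical helper: print the libpam-ldap debconf answers listed above,
# one "package question type value" line each.
ldap_preseed() {
    cat <<'EOF'
libpam-ldap shared/ldapns/base-dn string dc=hcoop,dc=net
libpam-ldap shared/ldapns/ldap-server string ldap://69.90.123.67/
libpam-ldap libpam-ldap/pam_password select exop
libpam-ldap libpam-ldap/rootbinddn string cn=admin,dc=hcoop,dc=net
libpam-ldap libpam-ldap/dbrootlogin boolean true
libpam-ldap libpam-ldap/override boolean true
libpam-ldap shared/ldapns/ldap_version select 3
libpam-ldap libpam-ldap/dblogin boolean false
EOF
}

# Usage sketch (as root, before installing):
#   ldap_preseed | debconf-set-selections
#   apt-get install libpam-ldap
```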

Configure Kerberos

VERY IMPORTANT: put exactly the following in /etc/krb5.conf -- no more, no less

[libdefaults]
        default_realm = HCOOP.NET
        kdc_timesync = 1
        forwardable = true
        proxiable = true
[logging]
        default = FILE:/proc/self/fd/2

We distribute our Kerberos configuration via DNS, so it is very important that we do not "hardwire" the settings on any of the servers (except the KDCs themselves). If we did, we wouldn't notice at first, but strange problems would crop up as soon as the DNS settings were changed. So, it is important that we put only the bare minimum amount of information in krb5.conf.
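Extra lines in krb5.conf only cause trouble later, when DNS changes. For that reason it can help to write the file from a script and check it for extra sections. This is a sketch: `write_krb5_conf` and `krb5_conf_is_minimal` are hypothetical helpers. The check only looks at section headers, not at extra settings inside the two allowed sections.

```shell
# Write the exact krb5.conf given above (default target /etc/krb5.conf).
# The heredoc delimiter is quoted, so nothing inside it is expanded.
write_krb5_conf() {
    target=${1:-/etc/krb5.conf}
    cat > "$target" <<'EOF'
[libdefaults]
        default_realm = HCOOP.NET
        kdc_timesync = 1
        forwardable = true
        proxiable = true
[logging]
        default = FILE:/proc/self/fd/2
EOF
}

# Succeed only if the file has no sections besides [libdefaults] and
# [logging]; a [realms] or [domain_realm] section would hardwire settings
# that we distribute via DNS.
krb5_conf_is_minimal() {
    conf=${1:-/etc/krb5.conf}
    extra=$(grep '^[[:space:]]*\[' "$conf" |
            sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
            grep -vxF -e '[libdefaults]' -e '[logging]' || true)
    if [ -n "$extra" ]; then
        printf '%s\n' "$extra" | sed 's/^/unexpected section: /' >&2
        return 1
    fi
    return 0
}
```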

Configure Name Service

A "name service" is Linux's mechanism for answering these queries:

  1. the userid for a given username and vice versa
  2. the groupid for a given groupname and vice versa
  3. the home directory for a user
  4. the shell for a user
  5. what groups a user is in

The libnss-ptdb package lets Linux use the AFS user database (the ptserver, or protection server) as a name service. The libnss-afspag package makes PAGs show up as a special group. To enable these changes, edit /etc/nsswitch.conf and change the passwd and group lines to look like this:

passwd:  ptdb   files
group:   afspag files
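The edit above can be scripted with sed. This is a sketch that assumes the file already has passwd: and group: lines, as Debian's default does; if a line is missing, nothing is added for it. `set_nsswitch` is a hypothetical helper.

```shell
# Hypothetical helper: rewrite the passwd: and group: lines of an
# nsswitch.conf (default /etc/nsswitch.conf), leaving every other line alone.
# The previous contents are kept in <file>.orig.
set_nsswitch() {
    conf=${1:-/etc/nsswitch.conf}
    cp "$conf" "$conf.orig"
    sed -e 's/^passwd:.*/passwd:  ptdb   files/' \
        -e 's/^group:.*/group:   afspag files/' \
        "$conf.orig" > "$conf"
}
```

Afterwards, `getent passwd <username>` should answer for an AFS user even if that user is not in /etc/passwd.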

Install Name Service Caching Daemon

Our version of libnss-ptdb is configured to do no caching. Therefore, to get acceptable performance, we need to run nscd.

  apt-get install nscd

Configure PAM

PAM is Linux's mechanism to do the following:

  1. decide if somebody is who they say they are (authentication; in our case via kerberos)
  2. set up sessions (in the case of AFS, this means creating PAGs)
  3. change passwords (in our case, changing the password in the KDC)

Here's the usual PAM setup:

/etc/pam.d/common-account:

account sufficient        pam_unix.so
account required        pam_ldap.so
account required        pam_krb5.so debug

# temporary line for emergencies
#account required        pam_unix.so

account required pam_access.so
account    requisite    pam_listfile.so item=user sense=allow file=/etc/login.restrict onerr=succeed

/etc/pam.d/common-auth:

auth    sufficient        pam_krb5.so debug forwardable ignore_root
auth    optional          pam_afs_session.so program=/usr/bin/aklog debug
auth    required          pam_unix.so nullok_secure try_first_pass use_authtok

# temporary line for emergencies
#auth    required          pam_unix.so nullok_secure

auth    required          pam_env.so
auth       requisite    pam_listfile.so item=user sense=allow file=/etc/login.restrict onerr=succeed

/etc/pam.d/common-password:

password sufficient pam_krb5.so 
password required   pam_unix.so nullok obscure min=4 max=8 md5 shadow try_first_pass

/etc/pam.d/common-session:

session requisite pam_limits.so
session required  pam_unix_session.so      # Unix module just logs access
session optional  pam_krb5.so
session optional  pam_afs_session.so program=/usr/bin/aklog debug

/etc/pam.d/ssh

# sshd does not consult the "auth" section of PAM
# when GssapiAuthentication=yes, even if UsePAM=yes.
# Therefore, we add the check to the "account" section as well.
account    requisite    pam_listfile.so item=user sense=allow file=/etc/login.restrict onerr=succeed
auth       requisite    pam_listfile.so item=user sense=allow file=/etc/login.restrict onerr=succeed

If the machine is intended for user logins, DO NOT create /etc/login.restrict. If the machine is only intended for admin logins, then create the file /etc/login.restrict with the following contents:

adamc_admin
docelic_admin
megacz_admin
mwolson_admin
ntk_admin
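The rule above ("do not create /etc/login.restrict on user machines") follows from how pam_listfile treats a missing file. The function below is an illustrative model of the decision made by the `pam_listfile.so item=user sense=allow file=/etc/login.restrict onerr=succeed` lines. It is not a replacement for them, and `login_allowed` is hypothetical.

```shell
# Illustrative model (not PAM itself) of
#   pam_listfile.so item=user sense=allow file=/etc/login.restrict onerr=succeed
login_allowed() {
    user=$1
    list=${2:-/etc/login.restrict}
    if [ ! -r "$list" ]; then
        return 0                    # onerr=succeed: unreadable/missing file => allow
    fi
    grep -qxF -- "$user" "$list"    # sense=allow: allow only users listed one per line
}
```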

Configure SSH

Configure SSH Client

Insert these lines in /etc/ssh/ssh_config so that outbound ssh connections will always try to use Kerberos if available:

  Host *
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials no

Configure SSH Server

You will need to create a "host principal" for the new server; if you are setting up server.hcoop.net, then it must have the name

   host/server.hcoop.net@HCOOP.NET

Add this principal to the KDC like this (execute these commands on the new server, as root, while holding admin tickets):

   rm -f /etc/krb5.keytab   # important -- if it already exists the new key will merely be appended
   kadmin -r HCOOP.NET -q 'ank -randkey host/server.hcoop.net@HCOOP.NET'
   kadmin -r HCOOP.NET -q 'ktadd -k /etc/krb5.keytab host/server.hcoop.net@HCOOP.NET'
   chown root:root /etc/krb5.keytab
   chmod go-rwx /etc/krb5.keytab
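The principal name follows directly from the server's hostname. The sketch below derives it, assuming HCOOP.NET as the realm and a lowercase hostname (the usual convention for host principals). `host_principal` is hypothetical.

```shell
# Hypothetical helper: build the host principal for a fully qualified name.
host_principal() {
    fqdn=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'host/%s@HCOOP.NET\n' "$fqdn"
}

# Usage sketch (as root, holding admin tickets), mirroring the commands above:
#   p=$(host_principal "$(hostname -f)")
#   rm -f /etc/krb5.keytab
#   kadmin -r HCOOP.NET -q "ank -randkey $p"
#   kadmin -r HCOOP.NET -q "ktadd -k /etc/krb5.keytab $p"
#   klist -k /etc/krb5.keytab    # should list $p
```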

Then add these lines to /etc/ssh/sshd_config:

  GssapiKeyExchange yes
  GssapiAuthentication yes
  GSSAPICleanupCredentials no
  UsePAM yes
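Appending these lines blindly can backfire. For most keywords, sshd uses the first value it reads, so an existing "UsePAM no" earlier in the file would win. The sketch below adds each directive only if its keyword is absent and reports conflicts instead. `add_sshd_gssapi` is hypothetical, and it assumes whitespace-separated "Keyword value" lines rather than the "Keyword=value" form.

```shell
# Hypothetical helper: add the GSSAPI/PAM directives above to sshd_config
# (default /etc/ssh/sshd_config). Keywords already present with the wanted
# value are left alone; present with another value => reported, return 1.
add_sshd_gssapi() {
    conf=${1:-/etc/ssh/sshd_config}
    status=0
    for directive in 'GssapiKeyExchange yes' 'GssapiAuthentication yes' \
                     'GSSAPICleanupCredentials no' 'UsePAM yes'; do
        key=${directive%% *}
        want=${directive#* }
        # Keywords are case-insensitive; commented-out lines are ignored.
        existing=$(grep -i "^[[:space:]]*$key[[:space:]]" "$conf" 2>/dev/null | head -n 1)
        if [ -z "$existing" ]; then
            printf '%s\n' "$directive" >> "$conf"
        else
            have=$(printf '%s\n' "$existing" | awk '{ print $2 }')
            if [ "$have" != "$want" ]; then
                echo "conflict in $conf: '$existing' (wanted '$directive')" >&2
                status=1
            fi
        fi
    done
    return $status
}
```

Afterwards, `sshd -t` checks the configuration before you restart sshd.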

Populate sudoers

Don't forget to give each of the admins an entry in /etc/sudoers.
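A sketch of generating those entries and validating the result before installing it follows. The `sudoers_entries` helper, the placeholder login names, and the "ALL=(ALL) ALL" policy are all assumptions for illustration. Use whatever access the admins are actually meant to have.

```shell
# Hypothetical helper: print one sudoers line per admin login name given
# on the command line, granting the (assumed) policy ALL=(ALL) ALL.
sudoers_entries() {
    for admin in "$@"; do
        printf '%s\tALL=(ALL) ALL\n' "$admin"
    done
}

# Usage sketch (as root; alice and bob are placeholders):
#   sudoers_entries alice bob > /tmp/sudoers.admins
#   cat /etc/sudoers /tmp/sudoers.admins > /tmp/sudoers.new
#   visudo -c -f /tmp/sudoers.new && install -m 0440 /tmp/sudoers.new /etc/sudoers
```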

Optional Steps

Performance-Tune the OpenAFS Client

FIXME: AdamM needs to fill this in

runit

The runit package is a mechanism for starting, stopping, and monitoring daemons. It is an alternative to the traditional /etc/init.d and start-stop-daemon scheme. Its chief advantages are:

  1. It launches daemons with clean process state; the daemon inherits nothing from the administrator invoking the start/stop command because the daemon is not forked as a child of the administrator's shell (rather, a request is sent to runit's supervisor asking it to fork the daemon). This is very important when dealing with tokens and PAGs.
  2. Runit monitors the processes that it forks, and restarts them if they die.
  3. Runit eliminates the need for pidfiles and the associated risk of starting multiple copies of a daemon.
  4. Runit captures the daemon's stdout and either sends it to a logger (if specified) or else displays it in the process name (output of ps).

   apt-get install runit

When you move a service from /etc/init.d/ control to runit supervision, you should tell Debian that you have done so:

  # assuming /var/service/$SERVICE/run is the runit script
  dpkg-divert --rename /etc/init.d/$SERVICE
  ln -s /usr/bin/sv /etc/init.d/$SERVICE

With the diversion and the symlink in place, /etc/init.d/$SERVICE start and /etc/init.d/$SERVICE stop are handled by sv, which starts or stops the runit service of the same name.
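The comment above assumes a runit script at /var/service/$SERVICE/run. The sketch below creates one, assuming the daemon can be told to stay in the foreground, since runit supervises the process it starts directly. `make_runit_service` is a hypothetical helper.

```shell
# Hypothetical helper: create a runit service directory whose ./run script
# execs the given command. The command must keep the daemon in the
# foreground (no self-daemonizing), or runit will think it died.
make_runit_service() {
    svdir=$1   # e.g. /var/service/$SERVICE, as in the comment above
    shift      # the remaining arguments are the foreground command
    mkdir -p "$svdir"
    {
        echo '#!/bin/sh'
        echo 'exec 2>&1'   # send stderr to the same place as stdout
        printf 'exec'
        for arg in "$@"; do
            # Single-quote each argument, escaping embedded single quotes.
            printf " '%s'" "$(printf '%s' "$arg" | sed "s/'/'\\\\''/g")"
        done
        printf '\n'
    } > "$svdir/run"
    chmod 755 "$svdir/run"
}
```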

dnscache

You can install the dnscache package to make the server self-sufficient for DNS resolution (it acts as a tiny DNS server just for localhost). This improves the reliability of the overall infrastructure. There is a copy of this package in /afs/megacz.com/debian/dnscache/; the author of the software recently changed its license, so it will be a standard package in the next release of Debian (it may even be in etch-backports already; when it is, this paragraph should be updated to recommend that instead).

Starting dnscache via runit is often a good idea; this ensures that it starts early in the boot process and that it is restarted if it dies for any reason.
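Because dnscache answers only for localhost (dnscache-conf's default listening address is 127.0.0.1), the local resolver has to be pointed at it. This is a sketch: `use_local_dnscache` is a hypothetical helper. It replaces every nameserver line of a resolv.conf with 127.0.0.1 and keeps search/domain lines.

```shell
# Hypothetical helper: point a resolv.conf (default /etc/resolv.conf) at a
# local dnscache on 127.0.0.1, dropping the other nameserver lines.
use_local_dnscache() {
    conf=${1:-/etc/resolv.conf}
    tmp="$conf.new.$$"
    grep -v '^[[:space:]]*nameserver' "$conf" > "$tmp" || true
    echo 'nameserver 127.0.0.1' >> "$tmp"
    mv "$tmp" "$conf"
}
```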

ssmtp

Life is simpler when you run ssmtp. You can direct the mail stream either to deleuze (preferred) or to a copy of exim running locally (but why bother running it?).
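For the preferred setup (relaying to deleuze), ssmtp needs little more than a mailhub line in /etc/ssmtp/ssmtp.conf. This is a sketch: `write_ssmtp_conf` is a hypothetical helper, and "deleuze.hcoop.net" as deleuze's full name and port 25 are assumptions. The second argument is this server's fully qualified name.

```shell
# Hypothetical helper: write a minimal ssmtp.conf that hands every outgoing
# message to deleuze (assumed to be deleuze.hcoop.net, port 25).
write_ssmtp_conf() {
    target=${1:-/etc/ssmtp/ssmtp.conf}
    fqdn=${2:-$(hostname -f)}
    cat > "$target" <<EOF
# Relay host that accepts and delivers all mail from this machine.
mailhub=deleuze.hcoop.net:25
# This machine's fully qualified name.
hostname=$fqdn
EOF
}
```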

SetupNewMachines (last edited 2012-12-20 21:13:00 by ClintonEbadi)