Diff for "SetupNewMachines"

Differences between revisions 30 and 144 (spanning 114 versions)
Revision 30 as of 2008-03-02 20:33:09
Size: 7300
Editor: dhcp-37-70
Comment:
Revision 144 as of 2010-11-29 08:25:43
Size: 25167
Editor: DavorOcelic
Comment:
Line 1: Line 1:
These steps are listed in approximately the order in which they should be performed; please try to maintain that.

=== Set Up Out Of Band Access ===

All machines owned by hcoop should, if possible, have some out-of-band mechanism for:
#pragma section-numbers off

These steps are listed in approximately the order in which they should be performed; please try to maintain that as you add to it.

<<TableOfContents>>


= List the Machine on the Wiki =

The hostname of the machine should be decided through a Members Poll (accessible from members portal) such as [[https://members.hcoop.net/portal/poll?id=31]].

Add the machine to the [[Hardware]] page.

It is a very good idea to photograph the front and back panels of the machine and put those images on the wiki page; that way remote admins and people in the data center can be sure they're talking about the same ports.

Add the machine to the [[IpAddresses]] page.

= Set Up Out Of Band Access =

All machines owned by hcoop should, if possible, have some [[http://en.wikipedia.org/wiki/Out-of-band_infrastructure|out-of-band mechanism]] for:
Line 11: Line 26:
Functions 1+2 are typically provided by kvm.hcoop.net; assuming you plan on going with that, you should connect the server's keyboard and video to the kvm switch. Functions 1+2 are typically provided by {{{kvm.hcoop.net}}} (see KvmAccess); assuming you plan on going with that, you should connect the server's keyboard and video to the kvm switch.
Line 15: Line 30:
=== Add a DNS entry for the server ===

Straightforward.

=== Install Debian ===

We use Debian. Install it.

=== Compile a Kernel ===

It is generally a good idea for hcoop to compile its own kernels. Regarding statically-compiled kernels, see StaticallyCompiledKernels for some opinions.

=== Install the AFS Client ===

You should install the {{{module-assistant}}}, {{{build-essential}}}, {{{module-init-tools}}}, {{{openafs-client}}}, {{{openafs-krb5}}}, {{{openafs-modules-source}}}, {{{openafs-dbg}}}, {{{openafs-doc}}}, {{{libopenafs-dev}}}, packages from {{{/afs/hcoop.net/common/debian/}}}. Here is a block of commands to cut and paste if you are lazy:

{{{
  apt-get install krb5-user libkrb5-dev module-assistant build-essential module-init-tools
  mkdir -p /tmp/openafs-packages
  cd /tmp/openafs-packages
  scp ssh.hcoop.net:/afs/hcoop.net/common/debian/openafs/1.4.6/\*.deb ./
  dpkg -i \
If there's _anything_ server-specific, please add an entry under "Specific Machines" on page AdminArea and document what it is. Rebooting procedures are an ideal candidate for this.

= Add a DNS entry for the server =

This is done as follows:

 1. Have our DomTool2 CVS repository from SourceForge checked out (for this you need a SourceForge account of course, and write access to the "hcoop" project on SourceForge)
 1. Edit domtool2/lib/hcoop.dtl and add definition for "HOSTNAME_ip" (search for "deleuze_ip" and just copy the line to new name)
 1. TODO: How to recompile and install new domtool with HOSTNAME_ip defined
 1. Edit ''/afs/hcoop.net/user/h/hc/hcoop/.domtool/hcoop.net'' to add the new DNS entry, using HOSTNAME_ip (again, can use deleuze_ip as example)
 1. To apply DomTool configuration, run '''DOMTOOL_USER=hcoop domtool hcoop.net''' in the ~hcoop/.domtool/ directory

= Install Debian =

We use Debian GNU/Linux.

Here are the installation notes to help you:

 1. Find the Debian stable image (whichever release is 'stable' at the time of installation).
 1. Prepare a USB stick to boot from (manually, or with the convenient tool "unetbootin").
 1. In the system BIOS, choose 'Auto-power on on power restore' (if there is such an option), and see if you can keep the USB stick from being the first disk (when it is the first disk, it gets assigned the device name /dev/sda, which makes the installation a tiny bit harder).
 1. Check which network card is in the server. If it requires non-free firmware, the package must be manually copied from Debian's non-free repository onto the install media (for example, package "firmware-bnx2" for Broadcom NetXtremeII cards (http://packages.debian.org/sid/all/firmware-bnx2/download)). Once the package is on the media, the installer will find and install it automatically if it is needed.
 1. For the timezone, use the timezone where the server is physically located, and answer Yes to "Is the hardware clock set to GMT?"
 1. Choose manual network configuration, specifying the chosen hostname, IP and network details as listed on the IpAddresses page.
 1. Partition disks. Most often, this comes down to creating identical partitions on all disks that are part of RAID1, and creating RAID arrays as in the following example (you can probably reuse it verbatim in your scenario): {{{

Example: 2x 160 GB system disks

System disk 1:

sda1: primary, beginning, 1 GB, ext3, /boot
sda2: primary, beginning, 8 GB, use as phys. volume for RAID (swap space: 1 GB x number of proc. cores)
sda3: primary, beginning, all available space, use as phys. volume for RAID

System disk 2:

sdb1: primary, beginning, 1 GB, ext3, unmounted
sdb2: primary, beginning, 8 GB, use as phys. volume for RAID (swap, same size as above)
sdb3: primary, beginning, all available space, use as phys. volume for RAID

Then, after RAID partitions have been assigned, new option "Configure RAID"
will appear at the top of the partitioning menu. We add the two devices in
RAID 1 mode:

md0: sda2 and sdb2
md1: sda3 and sdb3

Then, they appear in the partitions list and are configured as follows:

md0: swap
md1: ext3, /
}}}
 * As seen, the /boot partition is not on RAID. This is intentional, as /boot on RAID is problematic. However, sda1 (/boot) will be synced onto sdb1 periodically via cron (by copying the files and running grub-install on it), so that sdb1 can be used for booting too in case sda fails. (TODO: Cron job/Puppet recipe that does this)
 * Users & password setup: set the root password and choose "No" at the "Create regular user account?" prompt. If the installer does not let you continue without creating a regular user, create "root2" with the same password as root. The password should not be an official password, but a strong temporary string. (TODO: Puppet recipe that manages passwords and admins' SSH keys).
 * If /dev/sda is the USB stick and not the first disk, do not install GRUB to the Master Boot Record of /dev/sda. Instead, answer No at the prompt and choose /dev/sdb as the device. Then take the USB stick out, edit /boot/grub/menu.lst to replace references to (hd1,0) with (hd0,0), and run '''update-grub''' and '''grub-install /dev/sdb'''. No other changes (to /etc/fstab or mdadm.conf) are needed: if you used the partitioning example, no direct partitions occur in fstab, and mdadm uses UUIDs instead of partition names anyway.
 * In tasksel, at the end of installation, do not select any package category, not even "Standard system"
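
The periodic /boot sync mentioned above is still a TODO, so no official script exists. As a minimal sketch, assuming sdb1 is temporarily mounted at a hypothetical mount point such as /mnt/boot2, the copy step might look like this ({{{sync_boot}}} is an illustrative name, not an existing HCoop script):

```shell
#!/bin/sh
# Hypothetical helper: copy the contents of one directory tree onto another.
# SRC would be /boot; DST a directory where /dev/sdb1 is mounted.
sync_boot() {
    src=$1
    dst=$2
    # Refuse to run unless both directories exist
    [ -d "$src" ] && [ -d "$dst" ] || return 1
    # cp -a preserves modes, ownership and timestamps;
    # "$src"/. copies the directory contents, including dotfiles
    cp -a "$src"/. "$dst"/
}

# Possible use from a cron script, as root (paths are assumptions):
#   mount /dev/sdb1 /mnt/boot2 && sync_boot /boot /mnt/boot2
#   umount /mnt/boot2
#   grub-install /dev/sdb
```

Note that this only adds and overwrites files; kernels that were removed from /boot would remain on the copy until cleaned up by hand.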

= Booting into the new machine =

When the machine boots for the first time, run: {{{

dpkg-reconfigure debconf # (choose interface: Dialog, priority: Low).

apt-get install less sudo vim etckeeper changetrack lm-sensors openssh-server debsums logcheck bzip2 denyhosts
}}}

Verify that disk performance is as expected using '''sync; sync; hdparm -tT /dev/sdX'''.

Activate etckeeper as documented on EtcKeeper.

Edit ''/etc/default/changetrack'' and set '''AUTO_TRACK_ALL_CONFFILES=yes'''.

If tripwire is installed, edit ''/etc/tripwire/twcfg.txt'' and set '''MAILNOVIOLATIONS =false''', then initialize the database with '''tripwire --init'''.

Edit ''/etc/aliases'' and set "root" alias to "logs@hcoop.net", and possibly other addresses, separated by commas. (logs@ is an aliasMulti, defined in ~hcoop/.domtool/hcoop.net and lists people who want to receive verbose system logs).

Run '''sensors-detect''' to see if the kernel has appropriate thermal modules for the server, and add any drivers detected to ''/etc/modules''.

For all ext partitions, run '''tune2fs -j -c0 -i0 /dev/sdXX''' (and /dev/mdX for RAID arrays).
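
In that command, {{{-j}}} adds an ext3 journal if one is missing, while {{{-c0}}} and {{{-i0}}} turn off the mount-count and time-based forced fsck. To avoid typing each device by hand, something like the following sketch could list the candidates; {{{list_ext_devices}}} is a hypothetical helper name and it assumes a file laid out like /proc/mounts:

```shell
#!/bin/sh
# Hypothetical helper: print the device of every ext2/3/4 entry in a
# mounts-style file (same column layout as /proc/mounts or /etc/fstab).
list_ext_devices() {
    awk '$1 !~ /^#/ && $3 ~ /^ext[234]$/ { print $1 }' "$1"
}

# Possible use, as root:
#   for dev in $(list_ext_devices /proc/mounts); do
#       tune2fs -j -c0 -i0 "$dev"
#   done
```

Review the printed device list before running tune2fs on it.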

== Tune the /etc/apt/sources.list ==

{{{
cat > /etc/apt/sources.list <<\EOF
deb http://mirror.peer1.net/debian/ lenny main
deb-src http://mirror.peer1.net/debian/ lenny main

deb http://security.debian.org/ lenny/updates main
deb-src http://security.debian.org/ lenny/updates main

deb http://volatile.debian.org/debian-volatile lenny/volatile main
deb-src http://volatile.debian.org/debian-volatile lenny/volatile main
EOF

apt-get update
apt-get dist-upgrade
}}}

== Remove lame directories ==

{{{
sudo rm /cdrom
sudo rm /media/cdrom
sudo rm /media/floppy
sudo rmdir /media/cdrom[0-9]
sudo rmdir /media/floppy[0-9]
sudo rmdir /media
sudo rmdir /opt
}}}

= Compile a Kernel =

Here's an example for kernel 2.6.31.9; adjust accordingly: {{{

apt-get install make gcc patch bin86 kernel-package libncurses5-dev fakeroot

cd /usr/local/src
wget http://grsecurity.net/stable/grsecurity-2.1.14-2.6.31.9-200912191011.patch
wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.31.9.tar.bz2

tar jxf linux-2.6.31.9.tar.bz2
patch -p0 < grsecurity-2.1.14-2.6.31.9-200912191011.patch

cd linux-2.6.31.9
cp /some/existing/config .config
make oldconfig
make menuconfig # (if any manual tuning needed)

time CONCURRENCY_LEVEL=8 fakeroot make-kpkg --initrd kernel_image >& ../build.log
}}}

= Install the AFS Client =

The AFS client gets very unhappy if the partition holding {{{/var/cache/openafs}}} fills up. To ensure that this can't happen, we'll create a 2GB file and mount it there using the loopback device. This gives the openafs client a partition-in-a-file all to itself that no other process can interfere with.

First, create the file:
{{{
dd if=/dev/zero of=/var/cache/openafs.ext3 bs=1M count=2K
chmod go-rwx /var/cache/openafs.ext3
mke2fs -F /var/cache/openafs.ext3
tune2fs -j -i0 -c0 /var/cache/openafs.ext3
}}}

Then mount it. Note: we could mount it directly on {{{/var/cache/openafs}}}, but if we did that and for some reason it failed to mount, the openafs client would just write files into that directory anyway. We want to know immediately if the mount fails, so we'll make {{{/var/cache/openafs}}} a symlink to a subdirectory of the new partition.

{{{
mkdir /var/cache/openafs.mnt
echo -e '/var/cache/openafs.ext3\t/var/cache/openafs.mnt\text3\tloop\t1\t1' >> /etc/fstab
mount /var/cache/openafs.mnt/
mkdir -p /var/cache/openafs.mnt/cache/
rm -rf /var/cache/openafs
ln -s /var/cache/openafs.mnt/cache /var/cache/openafs
}}}
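
The point of the symlink is that it dangles whenever the loopback mount is missing, because the {{{cache}}} subdirectory only exists inside the mounted filesystem. A small check along these lines could be run by hand or from monitoring; {{{afs_cache_ok}}} is a hypothetical name, offered only as a sketch:

```shell
#!/bin/sh
# Hypothetical check: succeed only if the cache path is a symlink whose
# target directory currently exists (i.e. the loopback mount is in place).
afs_cache_ok() {
    link=${1:-/var/cache/openafs}
    # -L tests the link itself; -d follows it and fails if it dangles
    [ -L "$link" ] && [ -d "$link" ]
}

# Possible use:
#   afs_cache_ok || echo "AFS cache partition is not mounted" >&2
```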

Then, give our preferences to {{{debconf}}}:

{{{
debconf-set-selections <<\EOF
openafs-client openafs-client/thiscell string hcoop.net
openafs-client openafs-client/thiscell seen true
openafs-client openafs-client/dynroot boolean true
openafs-client openafs-client/dynroot seen true
openafs-client openafs-client/cachesize string 500000
openafs-client openafs-client/cachesize seen true
openafs-client openafs-client/cell-info string
openafs-client openafs-client/cell-info seen true
openafs-client openafs-client/run-client boolean true
openafs-client openafs-client/run-client seen true
EOF
}}}
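
As a sanity check, the cache size given to debconf should stay well below the size of the 2GB partition-in-a-file. The sketch below assumes the {{{cachesize}}} value is in kilobytes; {{{cache_percent}}} is a hypothetical helper, not part of OpenAFS:

```shell
#!/bin/sh
# Hypothetical helper: what percentage of the cache partition does the
# configured cache size occupy? (integer arithmetic, rounded down)
#   $1 = cache size in kB (assumed unit of openafs-client/cachesize)
#   $2 = partition size in MB
cache_percent() {
    cache_kb=$1
    part_mb=$2
    echo $(( cache_kb * 100 / (part_mb * 1024) ))
}

# Possible use with the values from this page:
#   cache_percent 500000 2048
```

With the values used here the cache occupies roughly a quarter of the file, which leaves ample headroom.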

You should install the {{{module-assistant}}}, {{{build-essential}}}, {{{module-init-tools}}}, {{{openafs-client}}}, {{{openafs-krb5}}}, {{{openafs-modules-source}}}, {{{openafs-doc}}}, {{{libopenafs-dev}}}, and {{{kstart}}} packages. Here is a block of commands to cut and paste if you are lazy:

{{{
apt-get install krb5-user libkrb5-dev module-init-tools kstart sudo \
        module-assistant build-essential bison flex debhelper
mkdir -p /tmp/openafs-packages
cd /tmp/openafs-packages
scp ssh.hcoop.net:/afs/hcoop.net/common/debian/openafs/1.4.6/\*.deb ./
dpkg -i \
Line 40: Line 213:
    openafs-dbg*.deb \
Line 43: Line 215:
  cd /tmp
  rm -rf /tmp/openafs-packages
cd /tmp
rm -rf /tmp/openafs-packages
Line 58: Line 230:
  /etc/init.d/module-init-tools start depmod
/etc/init.d/module-init-tools start
Line 69: Line 242:
=== Install Packages === = Install Packages =
Line 74: Line 247:
  dpkg -i /afs/hcoop.net/common/debian/libnss-ptdb/*.deb
  dpkg -i /afs/hcoop.net/common/debian/libpam-afs-session/*.deb
  dpkg -i /afs/hcoop.net/common/debian/libpam-krb5/*.deb
  dpkg -i /afs/hcoop.net/common/debian/fsr/*.deb
}}}

The first three packages are explained below; the last one is the {{{fsr}}} command (recursive "{{{fs}}}").

=== Configure Kerberos ===

You should copy {{{/etc/krb5.conf}}} from deleuze to the new server. This is VERY IMPORTANT. What is NOT in this file is also almost as important as what IS in this file, so think three times before adding or removing anything.

=== Configure Name Service ===

The {{{libnss-ptdb}}} package lets linux use the AFS {{{ptserver}}} (protection server) as a name service. The {{{ptserver}}} keeps track of all the users in AFS. A "name service" is Linux's mechanism for answering these four queries:

 1. the userid for a given username
 2. the username for a userid
dpkg -i /afs/hcoop.net/user/m/me/megacz/public/libnss-afs/libnss-afs*.deb
dpkg -i /afs/hcoop.net/common/debian/libpam-afs-session/*.deb
dpkg -i /afs/hcoop.net/common/debian/libpam-krb5/*.deb
dpkg -i /afs/megacz.com/debian/fsr*.deb
dpkg -i /afs/megacz.com/debian/krb5-user/{krb5-user,libk}*.deb
}}}

The first three packages are explained below; the fourth one is the {{{fsr}}} command (recursive "{{{fs}}}"). The last line installs a fixed version of kadmin which understands DNS entries.

= Install Network Time Protocol Daemon =

Kerberos and AFS will not work correctly unless the clocks of the client and server are synchronized to within a certain tolerance. Therefore, it is important for us to have a daemon running that keeps the clock set properly. '''This step is not optional'''.

{{{
  apt-get install ntp
}}}
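
For reference, the default clock skew tolerance in MIT Kerberos is 300 seconds. The following sketch shows the comparison involved; {{{within_skew}}} is a hypothetical helper, and how you obtain the remote timestamp is left open:

```shell
#!/bin/sh
# Hypothetical check: are two Unix timestamps within MAX seconds of
# each other? MAX defaults to 300, the MIT Kerberos default clock skew.
within_skew() {
    local_ts=$1
    remote_ts=$2
    max=${3:-300}
    diff=$(( local_ts - remote_ts ))
    # Take the absolute value
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    [ "$diff" -le "$max" ]
}

# Possible use (remote_ts obtained by some other means):
#   within_skew "$(date +%s)" "$remote_ts" || echo "clock skew too large" >&2
```

Once ntp is running, {{{ntpq -p}}} shows the current offset to each configured time source.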

= Install LDAP Support =

Logins etc. will not work correctly unless libpam-ldap is installed and configured:

{{{
  apt-get install libpam-ldap
}}}

Debconf answers:

{{{
debconf-set-selections <<\EOF
libpam-ldap shared/ldapns/base-dn string dc=hcoop,dc=net
libpam-ldap shared/ldapns/ldap-server string ldap://69.90.123.67/
libpam-ldap libpam-ldap/pam_password select exop
libpam-ldap libpam-ldap/rootbinddn string cn=admin,dc=hcoop,dc=net
libpam-ldap libpam-ldap/dbrootlogin boolean true
libpam-ldap libpam-ldap/override boolean true
libpam-ldap shared/ldapns/ldap_version select 3
libpam-ldap libpam-ldap/dblogin boolean false
EOF
}}}

You will also need to know the LDAP admin password; see /etc/pam_ldap.secret on one of the existing servers
and re-type the password at the password prompt.

= Configure Kerberos =

'''''VERY IMPORTANT''''': put exactly the following in {{{/etc/krb5.conf}}} -- no more, no less

{{{
[libdefaults]
 default_realm = HCOOP.NET
 kdc_timesync = 1
 forwardable = true
 proxiable = true
 rdns = no # undocumented option to disable reverse DNS lookups
[logging]
        default = FILE:/proc/self/fd/2
}}}

We distribute our Kerberos configuration via DNS, so it is very important that we do not "hardwire" the settings on any of the servers (except the KDCs themselves). If we did, we wouldn't notice at first, but strange problems would crop up as soon as the DNS settings were changed. So, it is important that we put only the bare minimum amount of information in {{{krb5.conf}}}.
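
To see what a client will discover through DNS, you can query the relevant records directly. The names below are the standard SRV record names used by MIT Kerberos clients; whether every one of them is published for HCOOP.NET is an assumption, and {{{krb5_dns_names}}} is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print the standard Kerberos SRV record names
# for a realm, one per line.
krb5_dns_names() {
    realm=$1
    for prefix in _kerberos._udp _kerberos._tcp _kerberos-adm._tcp _kpasswd._udp; do
        echo "$prefix.$realm"
    done
}

# Possible use:
#   for name in $(krb5_dns_names HCOOP.NET); do
#       echo "== $name"; dig +short SRV "$name"
#   done
```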


= Configure Name Service =

A "name service" is Linux's mechanism for answering these queries:

 1. the userid for a given username and vice versa
 2. the groupid for a given groupname and vice versa
Line 96: Line 318:
To make {{{ptserver}}} our primary choice for name service, edit {{{/etc/nsswitch.conf}}} and change the following three lines to look like this:

{{{
passwd: ptdb files
group: afspag files
shadow: files
}}}

=== Configure PAM ===

PAM is the mechanism used by Linux to do the following:
The {{{libnss-afs}}} package lets Linux use the AFS user database (the {{{ptserver}}} or protection server) as a name service and makes PAGs show up as a special group. To enable these changes, edit {{{/etc/nsswitch.conf}}} and change the {{{passwd}}} and {{{group}}} lines to look like this:

{{{
passwd: afs files
group: afs files
shadow: files
}}}
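
If you prefer not to edit the file by hand, the same change can be scripted. This is only a sketch: {{{set_afs_nss}}} is a hypothetical helper, and it replaces the whole {{{passwd}}} and {{{group}}} lines regardless of their previous contents.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the passwd and group lines of an
# nsswitch.conf-style file so that "afs" is consulted before "files".
# All other lines (including shadow) are left untouched.
set_afs_nss() {
    conf=$1
    tmp=$(mktemp) || return 1
    # Write to a temp file first, then copy back so that the original
    # file keeps its ownership and permissions
    sed -e 's/^passwd:.*/passwd: afs files/' \
        -e 's/^group:.*/group: afs files/' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Possible use, as root:
#   set_afs_nss /etc/nsswitch.conf && getent passwd some_afs_user
```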



= Install Name Service Caching Daemon =

It is highly recommended to install {{{nscd}}} in order to get good performance out of {{{libnss-afs}}}.

{{{
  apt-get install nscd
}}}

Unfortunately there is a grievous bug in the DNS caching mechanism in etch's nscd (see [[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=467609|this]]), so we must disable it until it is fixed. To do this, turn off the hosts cache in {{{/etc/nscd.conf}}}:

{{{
sed -i 's_enable-cache.*hosts.*yes_enable-cache hosts no_' /etc/nscd.conf
}}}

We prefer to run nscd as a runit service so that it does not go down (except on deleuze, where it must be started strictly after AFS in the boot sequence).

{{{
  apt-get install runit
  mkdir /var/service/nscd
  cat <<EOF > /var/service/nscd/run
#!/bin/sh
exec nscd -d
EOF
  mkdir /var/service/nscd/log
  cat <<EOF > /var/service/nscd/log/run
#!/bin/bash
svlogd -tt /var/log/nscd/
EOF
  mkdir /var/log/nscd
  chmod +x /var/service/nscd/log/run
  chmod +x /var/service/nscd/run

  dpkg-divert --rename /etc/init.d/nscd
  ln -s /usr/bin/sv /etc/init.d/nscd
}}}

= Configure PAM =

PAM is Linux's mechanism to do the following:
Line 112: Line 372:
FIXME

Mostly this consists of copying mire's {{{/etc/pam.d/*}}}, although it would be a good idea to state precisely which parts of that need to be copied.

=== Configure SSH ===
Here's the usual PAM setup:

/etc/pam.d/common-account:

{{{
account sufficient pam_unix.so
account required pam_ldap.so
account required pam_krb5.so debug

# temporary line for emergencies
#account required pam_unix.so

account required pam_access.so
}}}

/etc/pam.d/common-auth:

{{{
auth sufficient pam_krb5.so debug forwardable ignore_root
auth optional pam_afs_session.so program=/usr/bin/aklog debug
auth required pam_unix.so nullok_secure try_first_pass

# temporary line for emergencies
#auth required pam_unix.so nullok_secure

auth required pam_env.so
}}}

/etc/pam.d/common-password:

{{{
password sufficient pam_krb5.so
password required pam_unix.so nullok obscure min=4 max=8 md5 shadow try_first_pass
}}}

/etc/pam.d/common-session:

{{{
session requisite pam_limits.so
session required pam_unix_session.so # Unix module just logs access
session optional pam_krb5.so
session optional pam_afs_session.so program=/usr/bin/aklog debug
}}}

/etc/pam.d/login (Add to beginning of file):

{{{
auth required pam_listfile.so item=user sense=allow file=/etc/login.restrict onerr=succeed
}}}

/etc/pam.d/ssh (Add just before {{{@include common-auth}}} line):

{{{
# sshd does not consult the "auth" section of pam when
# GssapiAuthentication=yes, even if UsePAM=yes. Therefore, we add the
# check to the "account" section as well.
account requisite pam_listfile.so item=user sense=allow file=/etc/login.restrict onerr=succeed
auth requisite pam_listfile.so item=user sense=allow file=/etc/login.restrict onerr=succeed
}}}

If the machine is intended for user logins, DO NOT create /etc/login.restrict. If the machine is only
intended for admin logins, then create the file /etc/login.restrict with the following contents:

{{{
adamc_admin
docelic_admin
megacz_admin
mwolson_admin
ntk_admin
}}}
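
The reason a missing file means "everyone may log in" is the {{{onerr=succeed}}} option: when {{{pam_listfile}}} cannot open the list, it reports success. The sketch below imitates that decision in shell so the behaviour is easy to reason about; {{{login_allowed}}} is a hypothetical helper and not a substitute for testing a real login.

```shell
#!/bin/sh
# Hypothetical emulation of pam_listfile with
#   item=user sense=allow onerr=succeed
login_allowed() {
    user=$1
    file=${2:-/etc/login.restrict}
    # An unreadable or missing list is an error, and onerr=succeed
    # turns that error into "allowed"
    [ -r "$file" ] || return 0
    # Otherwise the user must appear as an exact, whole line
    grep -qxF -- "$user" "$file"
}

# Possible use:
#   login_allowed docelic_admin && echo allowed || echo denied
```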

= Configure SSH =

== Configure SSH Client ==

Insert these lines in {{{/etc/ssh/ssh_config}}} so that ''outbound'' ssh connections will always try to use Kerberos if available:

{{{
  Host *
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials no
}}}

== Configure SSH Server ==
Line 124: Line 464:
Add this principal to the KDC like this (execute these commands on the new server, as root, while holding admin tokens):

{{{
   kadmin -r HCOOP.NET
     ank -randkey host/server.hcoop.net@HCOOP.NET
     ktadd -k /etc/krb5.keytab
     quit
Add this principal to the KDC like this (execute these commands on the new server, as root, while holding admin tickets):

{{{
   REALM=HCOOP.NET
   ADMIN=myself_admin # your admin username
   SERVER=server.hcoop.net
   rm -f /etc/krb5.keytab # important -- if it already exists the new key will merely be appended
   kadmin -p $ADMIN@$REALM -r $REALM -q "ank -randkey host/$SERVER@$REALM"
   kadmin -p $ADMIN@$REALM -r $REALM -q "ktadd -k /etc/krb5.keytab host/$SERVER@$REALM"
Line 135: Line 477:
Then these lines to {{{/etc/ssh/sshd_config}}}: Then add these lines to the bottom of {{{/etc/ssh/sshd_config}}}:
Line 140: Line 482:
  GSSAPICleanupCredentials no
  UsePAM yes
}}}

=== Optional Steps ===

==== runit ====
  GSSAPICleanupCredentials yes
}}}

Finally, restart the ssh server:

{{{
  /etc/init.d/ssh restart
}}}

= Populate sudoers =

Don't forget to give all of the admins lines in {{{/etc/sudoers}}}. Each line should look like:

{{{
  user_admin ALL=(ALL) NOPASSWD: ALL
}}}
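
When there are several admins, the lines can be generated from a list of usernames instead of typed individually. A minimal sketch, with {{{sudoers_lines}}} as a hypothetical helper that reads one username per line on standard input:

```shell
#!/bin/sh
# Hypothetical helper: read admin usernames on stdin (one per line,
# blank lines ignored) and print one sudoers line for each.
sudoers_lines() {
    while IFS= read -r user; do
        [ -n "$user" ] || continue
        printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$user"
    done
}

# Possible use, as root (file names are assumptions):
#   sudoers_lines < /etc/login.restrict > /tmp/sudoers.admins
#   visudo -c -f /tmp/sudoers.admins    # syntax check before merging
```

Always merge the result with {{{visudo}}} rather than editing {{{/etc/sudoers}}} directly, so that a syntax error cannot lock everyone out.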

= Set Up Some Cron Scripts =

/etc/cron.daily/hcoop-clean-tmp:

{{{
#!/bin/sh
#
# Clean /tmp periodically.
#
# Edit $TMPTIME in /etc/default/rcS to change the maximal age of /tmp entries
# before they are removed.

exec /afs/hcoop.net/common/etc/scripts/hcoop-clean-tmp
}}}

= Optional Steps =

== Install commonly-used packages ==

{{{
apt-get install \
  xbase-clients \
  dpkg-dev-el
# xbase-clients provides xauth, without which "ssh -Y" will not work
# dpkg-dev-el provides debian-changelog-mode
}}}

== Performance-Tune the OpenAFS Client ==

FIXME: AdamM needs to fill this in

== runit ==
Line 152: Line 534:
  4. Runit captures the daemon's {{{stdout}}} and either sends it to a logger (if specified) or else displays it in the process name (output of {{{ps}}})
Line 164: Line 547:
==== dnscache ====

You can install the dnscache package to make the server self-sufficient for dns resolution purposes (it acts as a tiny dns server just for localhost). This improves the reliability of the overall infrastructure.  There is a copy of this package in {{{/afs/megacz.com/debian/dnscache/}}}; the author of the software recently changed its license, so it will be a standard package in the next release of debian (it may even be in etch-backports already; when it is, this paragraph should be updated to recommend that instead).
== dnscache ==

You can install the dnscache package to make the server self-sufficient for dns resolution purposes (it acts as a tiny dns server just for localhost). This improves the reliability of the overall infrastructure.
Line 169: Line 552:

Here are the instructions for configuring it. Make sure that bind9 (if running) is only listening to {{{127.0.0.1}}} and the public IP address of the machine. We tell dnscache to listen on {{{127.0.0.2}}} so as to avoid conflicts with bind.

{{{
  apt-get install djbdns

  # If needed:
  addgroup --system Gdnscache
  adduser --system Gdnscache --ingroup Gdnscache

  # Create /etc/service/dnscache
  dnscache-conf Gdnscache Gdnscache /etc/service/dnscache 127.0.0.2

  # Change default listen address 127.0.0.1 to .2
  perl -pi -e 's/\.1/.2/' /etc/service/dnscache/env/IP

  # Let dnscache answer queries only from 127.0.0.2
  mv /var/dnscache/root/ip/127.0.0.1 /var/dnscache/root/ip/127.0.0.2

  sv restart dnscache
}}}

Then modify {{{/etc/resolv.conf}}}, replacing the {{{nameserver}}} lines with:

{{{
nameserver 127.0.0.2
}}}

== /etc/hosts ==

If not present already:

{{{
echo '127.0.0.1 localhost' > /etc/hosts
}}}

== ssmtp ==

Life is simpler when you run {{{ssmtp}}}. You can direct the mail stream either to {{{deleuze}}} (preferred) or to a copy of {{{exim}}} running locally (but why bother running it?).

Be sure to enable {{{FromLineOverride}}}, which ships defaulted to "off" in Debian.

{{{
apt-get install ssmtp
sed -i 's_FromLineOverride.*_FromLineOverride=YES_' /etc/ssmtp/ssmtp.conf
}}}

== noatime ==

By default, Linux will write to the disk in order to update the atime ("access time") every time a file is ''read from''; this substantially degrades performance. You can disable this behavior by editing {{{/etc/fstab}}}

{{{
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/hda1 / ext3 defaults,noatime,errors=remount-ro 0 1
}}}

This is especially important on filesystems which are used to store AFS volumes.
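
On machines with many filesystems, the option can be added mechanically. The following is a sketch only: {{{add_noatime}}} is a hypothetical helper that prints a modified copy of the file and leaves the original alone.

```shell
#!/bin/sh
# Hypothetical helper: print an fstab in which every ext2/3/4 entry
# that lacks "noatime" has it appended to its options field.
# Modified lines are re-joined with single spaces.
add_noatime() {
    awk '
        # keep comments and short or blank lines unchanged
        /^[ \t]*#/ || NF < 4 { print; next }
        $3 ~ /^ext[234]$/ && $4 !~ /(^|,)noatime(,|$)/ { $4 = $4 ",noatime" }
        { print }
    ' "$1"
}

# Possible use, as root:
#   add_noatime /etc/fstab > /etc/fstab.new   # review, then move into place
#   mount -o remount,noatime /                # apply without rebooting
```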

== locales ==

If you installed debian via {{{debootstrap}}}, you will be missing the {{{locales}}} package and your locale will not be set. You can fix this with:

{{{
debconf-set-selections <<\EOF
locales locales/default_environment_locale select en_US
locales locales/default_environment_locale seen true
locales locales/locales_to_be_generated multiselect en_US ISO-8859-1
locales locales/locales_to_be_generated seen true
EOF
apt-get install locales
}}}

== etckeeper ==

{{{
apt-get install etckeeper
cd /etc
etckeeper init
etckeeper commit "Initial checkin"
git gc
}}}

== nitpicks ==

 1. Debian's installer seems to want to put an entry for the machine's own hostname in /etc/hosts, resolving to 127.0.0.1. You'll probably want to remove it.

These steps are listed in approximately the order in which they should be performed; please try to maintain that as you add to it.

List the Machine on the Wiki

The hostname of the machine should be decided through a Members Poll (accessible from members portal) such as https://members.hcoop.net/portal/poll?id=31.

Add the machine to the Hardware page.

It is a very good idea to photograph the front and back panels of the machine and put those images on the wiki page; that way remote admins and people in the data center can be sure they're talking about the same ports.

Add the machine to the IpAddresses page.

Set Up Out Of Band Access

All machines owned by hcoop should, if possible, have some out-of-band mechanism for:

  1. Keyboard access
  2. Screen access
  3. Power-cycling

Functions 1+2 are typically provided by kvm.hcoop.net (see KvmAccess); assuming you plan on going with that, you should connect the server's keyboard and video to the kvm switch.

Each server has its own solution for 3, usually in the form of a "service processor". You should investigate and document the appropriate service processor settings. If the service processor requires its own IP address, you should name it foo-sp.hcoop.net where foo.hcoop.net is the name of the server.

If there's _anything_ server-specific, please add an entry under "Specific Machines" on page AdminArea and document what it is. Rebooting procedures are an ideal candidate for this.

Add a DNS entry for the server

This is done as follows:

  1. Have our DomTool2 CVS repository from SourceForge checked out (for this you need a SourceForge account of course, and write access to the "hcoop" project on SourceForge)

  2. Edit domtool2/lib/hcoop.dtl and add definition for "HOSTNAME_ip" (search for "deleuze_ip" and just copy the line to new name)
  3. TODO: How to recompile and install new domtool with HOSTNAME_ip defined
  4. Edit /afs/hcoop.net/user/h/hc/hcoop/.domtool/hcoop.net to add the new DNS entry, using HOSTNAME_ip (again, can use deleuze_ip as example)

  5. To apply DomTool configuration, run DOMTOOL_USER=hcoop domtool hcoop.net in the ~hcoop/.domtool/ directory

Install Debian

We use Debian GNU/Linux.

Here are the installation notes to help you:

  1. Find the Debian stable image (whichever release is 'stable' at the time of installation).
  2. Prepare a USB stick to boot from (manually, or with the convenient tool "unetbootin").
  3. In the system BIOS, choose 'Auto-power on on power restore' (if there is such an option), and see if you can keep the USB stick from being the first disk (when it is the first disk, it gets assigned the device name /dev/sda, which makes the installation a tiny bit harder).
  4. Check which network card is in the server. If it requires non-free firmware, the package must be manually copied from Debian's non-free repository onto the install media (for example, package "firmware-bnx2" for Broadcom NetXtremeII cards (http://packages.debian.org/sid/all/firmware-bnx2/download)). Once the package is on the media, the installer will find and install it automatically if it is needed.

  5. For the timezone, use the timezone where the server is physically located, and answer Yes to "Is the hardware clock set to GMT?"
  6. Choose manual network configuration, specifying the chosen hostname, IP and network details as listed on the IpAddresses page.

  7. Partition disks. Most often, this comes down to creating identical partitions on all disks that are part of RAID1, and creating RAID arrays as in the following example (you can probably reuse it verbatim in your scenario):

    Example: 2x 160 GB system disks
    
    System disk 1:
    
    sda1: primary, beginning, 1 GB, ext3, /boot
    sda2: primary, beginning, 8 GB, use as phys. volume for RAID (swap space: 1 GB x number of proc. cores)
    sda3: primary, beginning, all available space, use as phys. volume for RAID
    
    System disk 2:
    
    sdb1: primary, beginning, 1 GB, ext3, unmounted
    sdb2: primary, beginning, 8 GB, use as phys. volume for RAID (swap, same size as above)
    sdb3: primary, beginning, all available space, use as phys. volume for RAID
    
    Then, after RAID partitions have been assigned, new option "Configure RAID"
    will appear at the top of the partitioning menu. We add the two devices in
    RAID 1 mode:
    
    md0: sda2 and sdb2
    md1: sda3 and sdb3
    
    Then, they appear in the partitions list and are configured as follows:
    
    md0: swap
    md1: ext3, /
  8. As seen, the /boot partition is not on RAID. This is intentional, as /boot on RAID is problematic. However, sda1 (/boot) will be synced onto sdb1 periodically via cron (by copying the files and running grub-install on it), so that sdb1 can be used for booting too in case sda fails. (TODO: Cron job/Puppet recipe that does this)
  9. Users & password setup: set the root password and choose "No" at the "Create regular user account?" prompt. If the installer does not let you continue without creating a regular user, create "root2" with the same password as root. The password should not be an official password, but a strong temporary string. (TODO: Puppet recipe that manages passwords and admins' SSH keys).

  10. If /dev/sda is the USB stick and not the first disk, do not install GRUB to the Master Boot Record of /dev/sda. Instead, answer No at the prompt and choose /dev/sdb as the device. Then take the USB stick out, edit /boot/grub/menu.lst to replace references to (hd1,0) with (hd0,0), and run update-grub and grub-install /dev/sdb. No other changes (to /etc/fstab or mdadm.conf) are needed: if you used the partitioning example, no direct partitions occur in fstab, and mdadm uses UUIDs instead of partition names anyway.

  11. In tasksel, at the end of installation, do not select any package category, not even "Standard system"

Booting into the new machine

When the machine boots for the first time, run:

dpkg-reconfigure debconf    # (choose interface: Dialog, priority: Low).

apt-get install less sudo vim etckeeper changetrack lm-sensors openssh-server debsums logcheck bzip2 denyhosts

Verify that disk performance is as expected using sync; sync; hdparm -tT /dev/sdX.

Activate etckeeper as documented on EtcKeeper.

Edit /etc/default/changetrack and set AUTO_TRACK_ALL_CONFFILES=yes.

If tripwire is installed, edit /etc/tripwire/twcfg.txt and set MAILNOVIOLATIONS =false, then initialize the database with tripwire --init.

Edit /etc/aliases and set "root" alias to "logs@hcoop.net", and possibly other addresses, separated by commas. (logs@ is an aliasMulti, defined in ~hcoop/.domtool/hcoop.net and lists people who want to receive verbose system logs).

Run sensors-detect to see if the kernel has appropriate thermal modules for the server, and add any drivers detected to /etc/modules.

For all ext partitions, run tune2fs -j -c0 -i0 /dev/sdXX (and /dev/mdX for RAID arrays).

Tune /etc/apt/sources.list:

cat > /etc/apt/sources.list <<\EOF
deb http://mirror.peer1.net/debian/ lenny main
deb-src http://mirror.peer1.net/debian/ lenny main

deb http://security.debian.org/ lenny/updates main
deb-src http://security.debian.org/ lenny/updates main

deb http://volatile.debian.org/debian-volatile lenny/volatile main
deb-src http://volatile.debian.org/debian-volatile lenny/volatile main
EOF

apt-get update
apt-get dist-upgrade

Remove lame directories

sudo rm /cdrom
sudo rm /media/cdrom
sudo rm /media/floppy
sudo rmdir /media/cdrom[0-9]
sudo rmdir /media/floppy[0-9]
sudo rmdir /media 
sudo rmdir /opt 

Compile a Kernel

Here's an example for kernel 2.6.31.9, adjust accordingly:

apt-get install make gcc patch bin86 kernel-package libncurses5-dev fakeroot

cd /usr/local/src
wget http://grsecurity.net/stable/grsecurity-2.1.14-2.6.31.9-200912191011.patch
wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.31.9.tar.bz2

tar jxf linux-2.6.31.9.tar.bz2
patch -p0 < grsecurity-2.1.14-2.6.31.9-200912191011.patch

cd linux-2.6.31.9
cp /some/existing/config .config
make oldconfig
make menuconfig   # (if any manual tuning needed)

time CONCURRENCY_LEVEL=8 fakeroot make-kpkg --initrd kernel_image >& ../build.log

Install the AFS Client

The AFS client gets very unhappy if the partition holding /var/cache/openafs fills up. To ensure that this can't happen, we'll create a 2GB file and mount it there using the loopback device. This gives the openafs client a partition-in-a-file all to itself that no other process can interfere with.

First, create the file:

dd if=/dev/zero of=/var/cache/openafs.ext3 bs=1M count=2K
chmod go-rwx /var/cache/openafs.ext3
mke2fs -F /var/cache/openafs.ext3
tune2fs -j -i0 -c0 /var/cache/openafs.ext3

Then mount it. Note: we could mount it directly on /var/cache/openafs, but if we did that and for some reason it failed to mount, the openafs client would just write files into that directory anyways. We want to know immediately if the mount fails, so we'll make /var/cache/openafs a symlink to a subdirectory of the new partition.

mkdir /var/cache/openafs.mnt
echo -e '/var/cache/openafs.ext3\t/var/cache/openafs.mnt\text3\tloop\t1\t1' >> /etc/fstab
mount /var/cache/openafs.mnt/
mkdir -p /var/cache/openafs.mnt/cache/
rm -rf /var/cache/openafs
ln -s /var/cache/openafs.mnt/cache /var/cache/openafs
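
The point of the symlink is that it dangles when the loopback mount is missing. A quick way to test for that condition is sketched below; cache_link_ok is a hypothetical name, not part of OpenAFS.

```shell
# Hypothetical check: succeed only if $1 is a symlink whose target directory
# exists. If the loopback mount failed, /var/cache/openafs.mnt/cache is
# missing, the symlink dangles, and this returns non-zero.
cache_link_ok() {
    link=$1
    [ -L "$link" ] && [ -d "$link" ]
}

# Usage:
#   cache_link_ok /var/cache/openafs || echo "AFS cache is not mounted" >&2
```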

Then, give our preferences to debconf:

debconf-set-selections <<\EOF
openafs-client openafs-client/thiscell string hcoop.net
openafs-client openafs-client/thiscell seen true
openafs-client openafs-client/dynroot boolean true
openafs-client openafs-client/dynroot seen true
openafs-client openafs-client/cachesize string 500000
openafs-client openafs-client/cachesize seen true
openafs-client openafs-client/cell-info string
openafs-client openafs-client/cell-info seen true
openafs-client openafs-client/run-client boolean true
openafs-client openafs-client/run-client seen true
EOF

You should install the module-assistant, build-essential, module-init-tools, openafs-client, openafs-krb5, openafs-modules-source, openafs-doc, libopenafs-dev, and kstart packages. Here is a block of commands to cut and paste if you are lazy:

apt-get install krb5-user libkrb5-dev module-init-tools kstart sudo \
        module-assistant build-essential  bison flex debhelper
mkdir -p /tmp/openafs-packages
cd /tmp/openafs-packages
scp ssh.hcoop.net:/afs/hcoop.net/common/debian/openafs/1.4.6/\*.deb      ./
dpkg -i \
    openafs-client*.deb         \
    openafs-krb5*.deb           \
    openafs-modules-source*.deb \
    openafs-doc*.deb            \
    libopenafs-dev*.deb         
cd /tmp
rm -rf /tmp/openafs-packages

Once these packages are installed, you will want to run

  module-assistant a-i -t openafs-modules

... assuming you compiled your own kernel and the compiled kernel tree resides in /usr/src/linux. If this is not the case, you are on your own.

If the command above completes, it will have created and installed a .deb containing the kernel module. You may need to run

depmod
/etc/init.d/module-init-tools start

to refresh whatever module wonkery linux maintains in obscure locations. Once this is figured out (if all else fails, reboot) you should be able to

  /etc/init.d/openafs-client start

Do this and check that /afs shows up.
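
The check can also be done from a script. The sketch below assumes the client appears in the mount table with filesystem type "afs" mounted on /afs; afs_mounted is a hypothetical name.

```shell
# Hypothetical helper: succeed if a filesystem of type "afs" is mounted on /afs.
# The mount table defaults to /proc/mounts; pass another file for testing.
afs_mounted() {
    awk '$2 == "/afs" && $3 == "afs" { found = 1 } END { exit !found }' \
        "${1:-/proc/mounts}"
}

# Usage:
#   afs_mounted && ls /afs/hcoop.net
```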

Install Packages

Now that afs is up, you can easily install packages. The block of commands below installs the set of packages which must be on every hcoop server (this list will be expanded as necessary).

dpkg -i /afs/hcoop.net/user/m/me/megacz/public/libnss-afs/libnss-afs*.deb
dpkg -i /afs/hcoop.net/common/debian/libpam-afs-session/*.deb
dpkg -i /afs/hcoop.net/common/debian/libpam-krb5/*.deb
dpkg -i /afs/megacz.com/debian/fsr*.deb
dpkg -i /afs/megacz.com/debian/krb5-user/{krb5-user,libk}*.deb

The first three packages are explained below; the fourth one is the fsr command (recursive "fs"). The last line installs a fixed version of kadmin which understands DNS entries.

Install Network Time Protocol Daemon

Kerberos and AFS will not work correctly unless the clocks of the client and server are synchronized to within a certain tolerance. Therefore, it is important for us to have a daemon running that keeps the clock set properly. This step is not optional.

  apt-get install ntp
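
To make the tolerance concrete: MIT Kerberos defaults to a maximum clock skew of 300 seconds. The sketch below checks an offset against that limit; skew_ok is a hypothetical helper, and the limit is an assumption if the realm overrides the default.

```shell
# Hypothetical helper: succeed if an offset in seconds (may be negative or
# fractional) is within the allowed skew. 300 s is the MIT Kerberos default
# for the clockskew setting; pass a second argument to use another limit.
skew_ok() {
    awk -v off="$1" -v max="${2:-300}" \
        'BEGIN { if (off < 0) off = -off; exit !(off <= max) }'
}

# Usage: "ntpq -p" prints per-peer offsets in milliseconds, so divide by
# 1000 before passing a value in, e.g.:
#   skew_ok 0.012 || echo "clock is too far off for Kerberos" >&2
```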

Install LDAP Support

Logins etc. will not work correctly unless libpam-ldap is installed and configured:

  apt-get install libpam-ldap

Debconf answers:

debconf-set-selections <<\EOF
libpam-ldap     shared/ldapns/base-dn   string  dc=hcoop,dc=net
libpam-ldap     shared/ldapns/ldap-server       string  ldap://69.90.123.67/
libpam-ldap     libpam-ldap/pam_password        select  exop
libpam-ldap     libpam-ldap/rootbinddn  string  cn=admin,dc=hcoop,dc=net
libpam-ldap     libpam-ldap/dbrootlogin boolean true
libpam-ldap     libpam-ldap/override    boolean true
libpam-ldap     shared/ldapns/ldap_version      select  3
libpam-ldap     libpam-ldap/dblogin     boolean false
EOF

You will also need to know the LDAP admin password; see /etc/pam_ldap.secret on one of the existing servers and re-type the password at the password prompt.

Configure Kerberos

VERY IMPORTANT: put exactly the following in /etc/krb5.conf -- no more, no less

[libdefaults]
        default_realm = HCOOP.NET
        kdc_timesync = 1
        forwardable = true
        proxiable = true
        # undocumented option to disable reverse DNS lookups
        rdns = no
[logging]
        default = FILE:/proc/self/fd/2

We distribute our Kerberos configuration via DNS, so it is very important that we do not "hardwire" the settings on any of the servers (except the KDCs themselves). If we did, we wouldn't notice at first, but strange problems would crop up as soon as the DNS settings were changed. So, it is important that we put only the bare minimum amount of information in krb5.conf.
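
When a realm is configured through DNS, clients look up SRV records derived from the realm name. The sketch below prints the standard record names for a realm so they can be checked with dig; krb_dns_names is a hypothetical helper, and which of these records hcoop.net actually publishes is an assumption.

```shell
# Hypothetical helper: print the DNS SRV names a Kerberos client may consult
# for a realm (KDC, admin server, password-change service).
krb_dns_names() {
    realm=$1
    # DNS is case-insensitive; lowercasing is only cosmetic.
    domain=$(printf '%s' "$realm" | tr 'A-Z' 'a-z')
    printf '_kerberos._udp.%s SRV\n' "$domain"
    printf '_kerberos-adm._tcp.%s SRV\n' "$domain"
    printf '_kpasswd._udp.%s SRV\n' "$domain"
}

# Usage (needs the dig tool from the dnsutils package):
#   krb_dns_names HCOOP.NET | while read -r name type; do
#       dig +short "$type" "$name"
#   done
```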

Configure Name Service

A "name service" is Linux's mechanism for answering these queries:

  1. the userid for a given username and vice versa
  2. the groupid for a given groupname and vice versa
  3. the home directory for a user
  4. the shell for a user
  5. what groups a user is in

The libnss-afs package lets linux use the AFS user database (the ptserver or protection server) as a name service and makes PAGs show up as a special group. To enable these changes, edit /etc/nsswitch.conf and change the passwd and group lines to look like this:

passwd:  afs files
group:   afs files
shadow:  files
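
After editing, it is worth confirming what the file now says and that lookups work. A small sketch follows; nss_sources is a hypothetical helper, while getent is the standard tool for querying name services.

```shell
# Hypothetical helper: print the sources configured for one database
# (e.g. passwd) in an nsswitch.conf-style file.
nss_sources() {
    db=$1
    file=${2:-/etc/nsswitch.conf}
    sed -n "s/^$db:[[:space:]]*//p" "$file"
}

# Usage:
#   nss_sources passwd          # should list afs before files
#   getent passwd someuser      # someuser is a placeholder AFS username
```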

Install Name Service Caching Daemon

It is highly recommended to install nscd in order to get good performance out of libnss-afs.

  apt-get install nscd

Unfortunately there is a grievous bug in the DNS caching mechanism in etch's nscd (see this), so we must disable it until it is fixed. To do this, turn off the hosts cache in /etc/nscd.conf:

sed -i 's_enable-cache.*hosts.*yes_enable-cache hosts no_' /etc/nscd.conf

We prefer to run nscd as a runit service so that it does not go down (except on deleuze, where it must be started strictly after AFS in the boot sequence).

  apt-get install runit
  mkdir /var/service/nscd
  cat <<EOF > /var/service/nscd/run
#!/bin/sh
exec nscd -d
EOF
  mkdir /var/service/nscd/log
  cat <<EOF > /var/service/nscd/log/run
#!/bin/bash
exec svlogd -tt /var/log/nscd/
EOF
  mkdir /var/log/nscd
  chmod +x /var/service/nscd/log/run
  chmod +x /var/service/nscd/run

  dpkg-divert --rename /etc/init.d/nscd
  ln -s /usr/bin/sv /etc/init.d/nscd

Configure PAM

PAM is Linux's mechanism to do the following:

  1. decide if somebody is who they say they are (authentication; in our case via kerberos)
  2. set up sessions (in the case of AFS, this means creating PAGs)
  3. change passwords (in our case, changing the password in the KDC)

Here's the usual PAM setup:

/etc/pam.d/common-account:

account sufficient      pam_unix.so
account required        pam_ldap.so
account required        pam_krb5.so debug

# temporary line for emergencies
#account required       pam_unix.so

account required pam_access.so

/etc/pam.d/common-auth:

auth    sufficient        pam_krb5.so debug forwardable ignore_root
auth    optional          pam_afs_session.so program=/usr/bin/aklog debug
auth    required          pam_unix.so nullok_secure try_first_pass

# temporary line for emergencies
#auth   required          pam_unix.so nullok_secure

auth    required          pam_env.so

/etc/pam.d/common-password:

password sufficient pam_krb5.so 
password required   pam_unix.so nullok obscure min=4 max=8 md5 shadow try_first_pass

/etc/pam.d/common-session:

session requisite pam_limits.so
session required  pam_unix_session.so      # Unix module just logs access
session optional  pam_krb5.so
session optional  pam_afs_session.so program=/usr/bin/aklog debug

/etc/pam.d/login (Add to beginning of file):

auth       required pam_listfile.so item=user sense=allow file=/etc/login.restrict  onerr=succeed

/etc/pam.d/ssh (Add just before @include common-auth line):

# sshd does not consult the "auth" section of pam when
# GssapiAuthentication=yes, even if UsePAM=yes.  Therefore, we add the
# check to the "account" section as well.
account    requisite    pam_listfile.so item=user sense=allow file=/etc/login.restrict onerr=succeed
auth       requisite    pam_listfile.so item=user sense=allow file=/etc/login.restrict onerr=succeed

If the machine is intended for user logins, DO NOT create /etc/login.restrict. If the machine is only intended for admin logins, then create the file /etc/login.restrict with the following contents:

adamc_admin
docelic_admin
megacz_admin
mwolson_admin
ntk_admin
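
The effect of the pam_listfile lines (sense=allow with onerr=succeed) can be summarised in a few lines of shell. This is an illustration of the semantics only, not something PAM runs; login_allowed is a hypothetical name.

```shell
# Hypothetical illustration of the pam_listfile rule used above.
login_allowed() {
    user=$1
    file=${2:-/etc/login.restrict}
    # onerr=succeed: a missing or unreadable file lets everyone through.
    [ -r "$file" ] || return 0
    # sense=allow: otherwise the user must appear on a line of its own.
    grep -qxF -- "$user" "$file"
}

# Usage:
#   login_allowed megacz_admin && echo "would be admitted"
```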

Configure SSH

Configure SSH Client

Insert these lines in /etc/ssh/ssh_config so that outbound ssh connections will always try to use Kerberos if available:

  Host *
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials no

Configure SSH Server

You will need to create a "host principal" for the new server; if you are setting up server.hcoop.net, then it must have the name

   host/server.hcoop.net@HCOOP.NET

Add this principal to the KDC like this (execute these commands on the new server, as root, while holding admin tickets):

   REALM=HCOOP.NET
   ADMIN=myself_admin       # your admin username
   SERVER=server.hcoop.net
   rm -f /etc/krb5.keytab   # important -- if it already exists the new key will merely be appended
   kadmin -p $ADMIN@$REALM -r $REALM -q "ank -randkey host/$SERVER@$REALM"
   kadmin -p $ADMIN@$REALM -r $REALM -q "ktadd -k /etc/krb5.keytab host/$SERVER@$REALM"
   chown root:root /etc/krb5.keytab
   chmod go-rwx /etc/krb5.keytab

Then add these lines to the bottom of /etc/ssh/sshd_config:

  GSSAPIKeyExchange yes
  GSSAPIAuthentication yes
  GSSAPICleanupCredentials yes

Finally, restart the ssh server:

  /etc/init.d/ssh restart

Populate sudoers

Don't forget to give all of the admins lines in /etc/sudoers. Each line should look like:

  user_admin  ALL=(ALL) NOPASSWD: ALL
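
The lines can be generated from a list of usernames and syntax-checked before they go live. A minimal sketch follows; sudoers_lines is a hypothetical helper and the temporary file name is only an example.

```shell
# Hypothetical helper: print one sudoers line per admin username given.
sudoers_lines() {
    for admin in "$@"; do
        printf '%s  ALL=(ALL) NOPASSWD: ALL\n' "$admin"
    done
}

# Usage: write to a scratch file and let visudo check the syntax before
# merging the lines into /etc/sudoers (via visudo):
#   sudoers_lines adamc_admin docelic_admin > /tmp/sudoers.new
#   visudo -c -f /tmp/sudoers.new
```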

Set Up Some Cron Scripts

/etc/cron.daily/hcoop-clean-tmp:

#!/bin/sh
#
# Clean /tmp periodically.
#
# Edit $TMPTIME in /etc/default/rcS to change the maximal age of /tmp entries
# before they are removed.

exec /afs/hcoop.net/common/etc/scripts/hcoop-clean-tmp

Optional Steps

Install commonly-used packages

# xbase-clients provides xauth, without which "ssh -Y" will not work
# dpkg-dev-el provides debian-changelog-mode
apt-get install xbase-clients dpkg-dev-el

Performance-Tune the OpenAFS Client

FIXME: AdamM needs to fill this in

runit

The runit package is a mechanism for starting, stopping, and monitoring daemons. It is an alternative to the traditional /etc/init.d and start-stop-daemon scheme. Its chief advantages are:

  1. It launches daemons with clean process state; the daemon inherits nothing from the administrator invoking the start/stop command because the daemon is not forked as a child of the administrator's shell (rather, a request is sent to the runit daemon asking it to fork the daemon). This is very important when dealing with tokens and PAGs.
  2. Runit monitors the processes that it forks, and restarts them if they die.
  3. Runit eliminates the need for pidfiles and the associated risk of starting multiple copies of a daemon.
  4. Runit captures the daemon's stdout and either sends it to a logger (if specified) or else displays it in the process name (output of ps).

   apt-get install runit

When you move a process from /etc/init.d/ control to runit supervision, you should inform debian that you have done so:

  # assuming /var/service/$SERVICE/run is the runit script
  dpkg-divert --rename /etc/init.d/$SERVICE
  ln -s /usr/bin/sv /etc/init.d/$SERVICE

This will cause invocations of /etc/init.d/$SERVICE {start|stop} to do "the right thing".

dnscache

You can install the dnscache package to make the server self-sufficient for dns resolution purposes (it acts as a tiny dns server just for localhost). This improves the reliability of the overall infrastructure.

Starting dnscache via runit is often a good idea; this ensures that it starts early in the boot process and that it is restarted if it dies for any reason.

Here are the instructions for configuring it. Make sure that bind9 (if running) is only listening on 127.0.0.1 and the public IP address of the machine. We tell dnscache to listen on 127.0.0.2 so as to avoid conflicts with bind.

  apt-get install djbdns

  # If needed:
  addgroup --system Gdnscache
  adduser --system Gdnscache --ingroup Gdnscache

  # Create /etc/service/dnscache
  dnscache-conf Gdnscache Gdnscache /etc/service/dnscache 127.0.0.2

  # Change default listen address 127.0.0.1 to .2
  perl -pi -e 's/\.1/.2/' /etc/service/dnscache/env/IP

  # Let dnscache answer queries only from 127.0.0.2
  mv /var/dnscache/root/ip/127.0.0.1 /var/dnscache/root/ip/127.0.0.2

  sv restart dnscache

Then modify /etc/resolv.conf, replacing the nameserver lines with:

nameserver 127.0.0.2
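
The resolv.conf change can be scripted so that search and domain lines survive. This is a sketch under the assumption that every existing nameserver line should go; use_local_dnscache is a hypothetical name.

```shell
# Hypothetical helper: replace all nameserver lines in a resolv.conf-style
# file with a single "nameserver 127.0.0.2", keeping other lines
# (search, domain, options) as they are.
use_local_dnscache() {
    file=$1
    tmp=$(mktemp) || return 1
    sed '/^[[:space:]]*nameserver[[:space:]]/d' "$file" > "$tmp"
    echo 'nameserver 127.0.0.2' >> "$tmp"
    # Overwrite in place so ownership and permissions of the file are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (as root):
#   use_local_dnscache /etc/resolv.conf
```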

/etc/hosts

If not present already:

grep -qw localhost /etc/hosts || echo '127.0.0.1 localhost' >> /etc/hosts

ssmtp

Life is simpler when you run ssmtp. You can direct the mail stream either to deleuze (preferred) or to a copy of exim running locally (but why bother running it?).

Be sure to enable FromLineOverride, which ships defaulted to "off" in Debian.

apt-get install ssmtp
sed -i 's_^#\?FromLineOverride.*_FromLineOverride=YES_' /etc/ssmtp/ssmtp.conf

noatime

By default, Linux writes to the disk to update the atime ("access time") every time a file is read; this substantially degrades performance. You can disable this behavior by adding the noatime option to the relevant entries in /etc/fstab:

# <file system> <mount point>   <type>  <options>                          <dump>  <pass>
/dev/hda1       /               ext3    defaults,noatime,errors=remount-ro 0       1

This is especially important on filesystems which are used to store AFS volumes.
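
For machines with many filesystems, the option can be added mechanically. The sketch below prints a modified copy rather than editing /etc/fstab in place; add_noatime is a hypothetical helper and it only touches ext2/ext3 entries.

```shell
# Hypothetical helper: print an fstab with "noatime" added to the options of
# every ext2/ext3 entry that lacks it. Comments and other lines pass through.
# Note: awk rebuilds modified lines with single spaces between fields.
add_noatime() {
    awk '
        $1 ~ /^#/ || NF < 4 { print; next }
        ($3 == "ext2" || $3 == "ext3") && $4 !~ /(^|,)noatime(,|$)/ {
            $4 = $4 ",noatime"
        }
        { print }
    ' "$1"
}

# Usage: review the result before installing it.
#   add_noatime /etc/fstab > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
```

A running system picks up the change per filesystem with mount -o remount,noatime followed by the mount point, or at the next reboot.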

locales

If you installed debian via debootstrap, you will be missing the locales package and your locale will not be set. You can fix this with:

debconf-set-selections <<\EOF
locales locales/default_environment_locale      select  en_US
locales locales/default_environment_locale      seen    true
locales locales/locales_to_be_generated multiselect     en_US ISO-8859-1
locales locales/locales_to_be_generated seen    true
EOF
apt-get install locales

etckeeper

apt-get install etckeeper
cd /etc
etckeeper init
etckeeper commit "Initial checkin"
git gc

nitpicks

  1. Debian's installer seems to want to put an entry for the machine's own hostname in /etc/hosts, resolving to 127.0.0.1. You'll probably want to remove it.

SetupNewMachines (last edited 2012-12-20 21:13:00 by ClintonEbadi)