
Revision 11 as of 2020-03-08 05:11:57

ServerBusted

Server busted.hcoop.net is a virtual machine at DigitalOcean that was created to work on the Debian Stretch to Buster upgrade.

Its name is just an allusion to it being broken by design.

1. Setup Notes

1.1. Need to upgrade system first

A newer kernel and some other base packages are available right off the bat, so the system needs to be upgraded first to get working kernel headers for the AFS build and so on. Added to the general setup notes.

1.2. resolv.conf / initial puppet cert request

We can't really get around manually opening the firewall for the agent on the puppetmaster... at our scale this isn't a big deal anyway.

Like others, had to set domain hcoop.net manually in /etc/resolv.conf. It looks like the only reason we need this is for the initial puppet connection. So I tried setting the agent config at /etc/puppetlabs/puppet/puppet.conf to:

[main]
server = puppet.hcoop.net

But the cert for the master only has the FQDN of its concrete hostname and the bare alias puppet with no domain:

Error: Server hostname 'puppet.hcoop.net' did not match server certificate; expected one of gibran.hcoop.net, DNS:puppet, DNS:gibran.hcoop.net
Error: Could not run: Server hostname 'puppet.hcoop.net' did not match server certificate; expected one of gibran.hcoop.net, DNS:puppet, DNS:gibran.hcoop.net

If we could regenerate the master's cert to also include puppet.hcoop.net as an alternative name (DNS:puppet.hcoop.net), the manual edit would move from /etc/resolv.conf to puppet.conf, which is at least more closely related to the infrastructure limitation that mandates it.
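To check which names a certificate actually carries before and after regenerating it, the subject alternative names can be read with openssl. A minimal sketch, assuming openssl is installed on the puppetmaster; cert_dns_names is a hypothetical helper name, not something from our Puppet tree:

```shell
# cert_dns_names: read `openssl x509 -noout -text` output on stdin and
# print one DNS subject alternative name per line.
# (Hypothetical helper for illustration.)
cert_dns_names() {
    grep -A1 'Subject Alternative Name' \
        | tail -n 1 \
        | tr ',' '\n' \
        | sed -n 's/^ *DNS://p'
}
```

For example, `openssl x509 -noout -text -in "$(puppet config print hostcert)" | cert_dns_names` run on gibran should list puppet and gibran.hcoop.net, going by the error above. Puppet's dns_alt_names setting is the usual way to request extra names when a certificate is generated; whether that fits how the master cert was issued here is untested.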

1.3. /usr/bin/mail behaves differently

GNU mailutils now provides /usr/bin/mail instead of bsd-mailx. It treats addresses a bit differently, appending the hostname. So mail -s "foo" root goes to root@busted.hcoop.net instead of plain root, which would have been rewritten to logs@hcoop.net. The message then sits in the exim queue until it is frozen and eventually purged.

Not sure we want to switch back to bsd-mailx over this, though; for now, keep mailutils as the default provider.

2. Puppet porting notes

2.1. ntp tcp rule failure

We were setting our ntp out rule using both tcp and udp, but /etc/services now only has the udp alias (which is correct). Pushed out a fix, but for some reason runs still failed with the same error afterward. Hacked around it by adding the ntp/tcp alias to /etc/services. Needs further investigation (I think this might have been the manually added firewall rule on the puppetmaster expiring, so the cached catalog was being used).

Error: Failed to apply catalog: Parameter dport failed on Firewall[010 ntp output protocol tcp using provider iptables]: Munging failed for value "ntp" in class dport: no such service ntp/tcp (file: /etc/puppetlabs/code/environments/production/modules/firewall_multi/manifests/init.pp, line: 126)
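The munging failure comes down to a name/protocol lookup in /etc/services, which can be checked by hand before a puppet run. A minimal sketch; has_service is a hypothetical helper name, and it parses the file directly rather than going through NSS (on a live system, `getent services ntp/tcp` should answer the same question):

```shell
# has_service FILE NAME PROTO: succeed if FILE (in /etc/services format)
# defines NAME for PROTO, either as the primary name or as an alias.
# (Hypothetical helper for illustration.)
has_service() {
    awk -v name="$2" -v proto="$3" '
        { sub(/#.*/, "") }            # strip trailing comments
        NF < 2 { next }               # skip blank and malformed lines
        { split($2, pp, "/") }        # $2 looks like "123/udp"
        pp[2] != proto { next }
        {
            if ($1 == name) found = 1
            for (i = 3; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

Usage: `has_service /etc/services ntp tcp || echo "no ntp/tcp entry"`. On a stock buster install this is expected to report the missing entry, matching the error above.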

2.2. afs client not immediately available

Loading new openafs-1.8.2 DKMS files...
Building for 4.19.0-6-cloud-amd64
Module build for kernel 4.19.0-6-cloud-amd64 was skipped since the
kernel headers for this kernel does not seem to be installed.

The issue seems to be that DigitalOcean is now using the cloud-amd64 variant of the kernel, so we're pulling in the wrong headers. Need to check into this more, but the variant looks like a standard part of Debian.

Should be mitigated: added a $kernel_packages argument to the openafs client service class, and used Hiera to set the installed headers correctly depending on whether we're on Debian 9 or 10.
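For reference, the headers package DKMS needs follows directly from the running kernel release. A minimal sketch of that mapping, assuming Debian's VERSION-ABI-FLAVOUR release naming; both function names are hypothetical, and this is not the Hiera-based fix described above:

```shell
# headers_package RELEASE: versioned headers package for a kernel release,
# e.g. the output of `uname -r`. (Hypothetical helper.)
headers_package() {
    printf 'linux-headers-%s\n' "$1"
}

# headers_metapackage RELEASE: flavour metapackage that tracks the current
# ABI, derived by dropping the leading VERSION-ABI- part of the release.
# (Hypothetical helper.)
headers_metapackage() {
    printf 'linux-headers-%s\n' \
        "$(printf '%s\n' "$1" | sed 's/^[0-9.]*-[0-9]*-//')"
}
```

Usage: `apt-get install "$(headers_package "$(uname -r)")"` installs headers for exactly the running kernel, while the metapackage name is the one that would keep following kernel upgrades.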

2.3. libnss-afs installs to non-multiarch location

Minor issue, but we might want to address it. We're still installing to plain /usr/lib instead of /usr/lib/x86_64-linux-gnu/; the package needs to be updated to comply with multiarch.

2.4. HCoop Debian Package Repo

We don't trigger an apt update after we install our key and repo, so package installs fail until we manually run apt-get update.

3. Puppet WONTFIX

3.1. ssmtp is gone

The original plan was to switch to msmtp.

Switching to msmtp proved difficult: the lowuid rewriting that sends mail to the logs@ alias does not work, and can't work as far as I can tell. I ended up just backporting ssmtp, since it has not been removed from Debian altogether; it just didn't make it into buster (it's also a bit unmaintained). Going forward, it might be easier to set up exim in satellite mode instead.

4. TODO


CategorySystemAdministration