This page was meant to organize a discussion and is not the canonical reference on our organizational decisions. It may often be out of date.

1. Terminology

To save space below, we'll use the following working names for the different pieces of hardware involved:

2. The Big List of Scary Things

These are the issues that we're dealing with for the first time in our new set-up, meaning that we should pay special attention to them.

3. The Big Questions

3.1. What Debian version do we run on each server?

AdamChlipala suggests stable on Main and testing on Dynamic and Shell because:

Update: We're currently planning stable on Main and Dynamic, since testing too often has catastrophic upgrade failures in practice.

3.2. What resource limits are imposed on the different servers?

3.2.1. Decisions that we've agreed on

3.2.2. Questions to be resolved

  1. Do we impose ulimits and related stuff on Dynamic?

    AdamChlipala says:

    • We need some measures in place to prevent runaway processes from crashing everyone's dynamic web sites. The question is, do we use automated measures or do we just monitor closely and intervene manually when needed? A bad runaway process can take the server down quickly, so I think it's necessary to use ulimits and their ilk.
  2. How do we control resource usage on Shell?

    AdamChlipala says:

    • I think I'm in favor of no ulimits or similar on Shell, relying on monitoring and manual intervention to deal with runaway processes and other horrors. We've already had some folks unable to use some implementations of non-mainstream programming languages because these implementations aren't able to deal with our resource limits... and, if you know me, you can probably guess that that Just Breaks My Heart!
  3. Where we do decide to use monitoring and manual intervention, what monitoring tools can best help us do it?

    DavorOcelic says:

    • I've talked about this multiple times before, and I'm still interested in doing something real in this area. First of all, there's a log parser I've written, which is very similar to Logsurfer (or Logsurfer+, for that matter) but resolves some of their crucial limitations; we'd definitely turn the Main machine into a common loghost, so that would be a good place to deploy it. Another good tool I have in mind is Nagios, a ping/service/anything monitoring tool. The third tool I have in mind is the excellent Puppet (a kind of new-generation cfengine), which we can script to test and fix stuff on our systems.

3.3. Who can log into which servers?

3.3.1. Decisions that we've agreed on

3.3.2. Questions to be resolved

  1. Can everyone log into Dynamic, too?

    AdamChlipala says:

    • I think it is important to allow this. My mental model has Shell made deliberately unstable because we don't know how to impose automatic limits that allow all of the stuff that people want to do. I know that a lot of the people involved in this planning aren't particularly interested in using non-mainstream programming languages and other things that conventional hosting providers are never going to support, but for me and several other members this is one of the defining aspects of HCoop. That means that we need to be able to go crazy with Shell, while committing to keeping Dynamic up all the time. If Shell is down, members need to be able to use Dynamic to configure their services. That doesn't mean that they can't use the development-production split model when Shell is up, logging in only there.

3.4. How are we going to handle the basic logistics of a shared filesystem and logins?

3.4.1. Decisions that we've agreed on

3.4.2. Questions to be resolved

Everything else!

3.5. How are we going to charge (monetarily or just to have a sense of who is using what) members accurately for their disk usage?

There are a lot of issues here. We provide a number of shared services whose default models create files on behalf of members, but those files are (by default) owned by a single UNIX user. Examples include PostgreSQL and MySQL databases, virtual mailboxes, Mailman mailing lists, and domtool configuration files. Any of these can grow large enough to use up all disk space on a volume, through either malicious action or accidental runaway processes.

Right now we use this gimpy scheme of group quotas on /home, storing all of these files on that partition with group ownership telling which member is responsible for them. I think AFS provides a nicer way of doing this. With the way we do it now, we are constantly fighting the behavior of the out-of-the-box Debian packages to set permissions differently than how we need them to be. With AFS, I think we can separate permissions from locations.

4. Daemons shared by members

4.1. Off-site file back-up services

4.1.1. Questions to be resolved

4.2. DNS

4.2.1. Decisions that we've agreed on

Update: Scrap that! We're using BIND on Main and Dynamic, since it's so much better supported throughout the 'net, makes master/slave configurations easier, etc. In the future, we want to expand to include a tertiary DNS server in a different geographic location and on an entirely different network.

4.2.2. Questions to be resolved

  1. How do we arrange redundant DNS infrastructure?

JustinLeitgeb says:

4.2.3. References to how we do things now

DnsConfiguration, DomainRegistration

4.3. FTP

4.3.1. Decisions that we've agreed on

4.3.2. References to how we do things now

FtpConfiguration, FileTransfer

4.4. HTTP

4.4.1. Decisions that we've agreed on

4.4.2. Questions to be resolved

  1. Do we completely separate administrative web sites from the rest, or do we allow any member static web site to be served by Main?

    DavorOcelic says:

    • Well. I think we don't have many administrative web sites (nor are the ones we have used heavily enough) to justify complete separation. It should be OK to run static web sites from Main, I believe. We could create default web spaces for users, like ~/public_html/ served from Dynamic, and ~/static_html/ served from Main, or something like that. (Please give more input on this.)
      • I think it would be better to have a domtool directive that chooses which machine the site is served from (e.g. ServedOn static|dynamic) and then let members choose how to lay out their own directories. -- ClintonEbadi

4.4.3. References to how we do things now

UserWebsites, DynamicWebSites, VirtualHostConfiguration

4.5. IMAP


4.5.1. Decisions that we've agreed on

4.5.2. Questions to be resolved

  1. Do we keep using Courier IMAP or do we switch to something like Cyrus?

4.5.3. References to how we do things now

UsingEmail, EmailConfiguration

4.6. Jabber

4.6.1. Decisions that we've agreed on

4.6.2. Questions to be resolved

4.6.3. References to how we do things now


4.7. Mailing lists

4.7.1. Decisions that we've agreed on

4.7.2. Questions to be resolved

  1. How/where do we store mailing list data so that it is appropriately charged towards a member's storage quota?

4.7.3. References to how we do things now


4.8. Relational database servers

4.8.1. Decisions that we've agreed on

4.8.2. Questions to be resolved

  1. Are we satisfied with the latest versions from Debian stable, or do we want to do something special?
  2. Do we do remote PostgreSQL authentication (from Dynamic, etc.) via the ident method? DavorOcelic thinks it's OK.

4.8.3. References to how we do things now


4.9. SMTP

4.9.1. Decisions that we've agreed on

4.9.2. Questions to be resolved

  1. Run secondary MX on Dynamic or elsewhere?

4.9.3. References to how we do things now

UsingEmail, EmailConfiguration

4.10. Spam detection

4.10.1. Decisions that we've agreed on

4.10.2. References to how we do things now

UsingEmail, SpamAssassin, FeedingSpamAssassin, SpamAssassinAdmin

4.11. SSH

4.11.1. Decisions that we've agreed on

4.11.2. References to how we do things now


4.12. SIP Redirection

5. Services run on top of these daemons

5.1. Domtool

Everyone's favorite spiffy system for letting legions of users manage the same daemons securely.

AdamChlipala says:

JustinLeitgeb says:

5.1.1. References to how we do things now


5.2. Portal

5.2.1. Decisions that we've agreed on

5.2.2. References to how we do things now

The portal

5.3. Web e-mail client

5.3.1. Decisions that we've agreed on

5.3.2. References to how we do things now


5.4. Webmin/Usermin

5.4.1. Decisions that we've agreed on

5.4.2. References to how we do things now


5.5. Wiki

5.5.1. Decisions that we've agreed on

5.5.2. Questions to be resolved

5.5.3. References to how we do things now

This wiki

6. Security

Here are the security issues we need to worry about, sorted by resource categories of varying abstraction levels. What we mostly deal with here is avoiding negative consequences of actions by members with legitimate access to our servers.

6.1. CPU time

We haven't really encountered any trouble with this literal resource yet. However, potential problems come in when we're talking about user dynamic web site programs called by a shared Apache daemon. Apache allocates a fixed set of child processes, and each pending dynamic web site program takes up one child process for the duration of its life. Enough infinite-looping or slow CGI scripts can bring Apache down for everyone.
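To make the failure mode concrete, here is a minimal sketch of a fixed-size worker pool being starved by stuck jobs. It is only an analogy: it uses Python threads rather than Apache child processes, and POOL_SIZE and demo_pool_exhaustion are hypothetical names chosen for illustration, not anything from our configuration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2  # hypothetical; stands in for Apache's fixed set of child processes


def demo_pool_exhaustion():
    """Return (blocked_while_stuck, fast_result) for a saturated worker pool."""
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        # Each "stuck script" occupies one worker until release is set.
        stuck = [pool.submit(release.wait) for _ in range(POOL_SIZE)]
        # A quick request from another member has to queue behind them.
        fast = pool.submit(lambda: "served")
        blocked_while_stuck = not fast.done()
        # Something (a time-out, an admin) finally ends the stuck scripts.
        release.set()
        fast_result = fast.result(timeout=5)
        for job in stuck:
            job.result(timeout=5)
    return blocked_while_stuck, fast_result
```

With every worker held by a stuck job, the quick request cannot start until one of them finishes, which is what "bringing Apache down for everyone" looks like from the outside.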

6.1.1. Current remedies

As per ResourceLimits, we use patched suexec programs to limit dynamic page generation programs to 10 seconds of running time. We also have a time-out for mod_proxy accesses, which we provide to allow members to implement dynamic web sites through their own daemons that the main Apache proxies.
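As a rough illustration of the idea (not our patched suexec, which is not reproduced here), the sketch below starts a program with a soft CPU-time limit using Python's standard resource module. It assumes the 10 seconds are CPU time rather than wall-clock time, and run_with_cpu_limit is a hypothetical helper name.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds=10):
    """Run argv with a soft RLIMIT_CPU of cpu_seconds and return the result."""

    def set_limit():
        # Runs in the child just before exec. Only the soft limit is lowered,
        # so no extra privileges are needed.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        if hard == resource.RLIM_INFINITY:
            soft = cpu_seconds
        else:
            soft = min(cpu_seconds, hard)
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))

    return subprocess.run(argv, preexec_fn=set_limit, capture_output=True)
```

A program that exceeds the soft limit is sent SIGXCPU by the kernel, so an infinite loop shows up as a negative return code, while a well-behaved program is unaffected.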

6.2. Disk usage

We can't let one person use up all of the disk space, now can we?

6.2.1. Current remedies

We use group quotas so that members can be charged for files that they don't own. This is still hackish and allows some unintended behaviors. DaemonFileSecurity has more detail.
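The accounting idea behind this can be sketched as follows: charge each file to its owning group, whoever the owning user is. This is only an approximation under stated assumptions. Real quotas are enforced by the kernel and count disk blocks, whereas this hypothetical usage_by_group helper just sums file sizes in bytes.

```python
import os
import stat
from collections import defaultdict


def usage_by_group(root):
    """Map each group ID to the total size in bytes of regular files under root."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat so that symlinks are not followed out of the tree.
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):
                totals[st.st_gid] += st.st_size
    return dict(totals)
```

A file created by a shared daemon but carrying a member's group would be counted against that member here, which is the behavior the group-quota scheme relies on.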

6.3. Network bandwidth

We don't do a thing to limit this now, since our current host provides significantly more bandwidth than we need.

6.3.1. Questions to be resolved

  1. Should we start doing anything beyond monitoring?

6.4. Network connection privileges

It's good to follow least privilege in who is allowed to connect to/listen on which ports.

6.4.1. Current remedies

We have a firewall system in place now. It uses a custom tool documented partially on FirewallRules.

6.5. Number of processes

Fork bombs are no fun, and many resource limiting schemes are per-process and so require a limit on process creation to be effective.

6.5.1. Current remedies

As per ResourceLimits, we use the nproc ulimit.
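For readers who have not used it, the nproc limit corresponds to RLIMIT_NPROC, which on Linux caps the number of processes a user may have. The sketch below is an assumption-laden illustration (child_nproc_limit is a hypothetical helper, and the limit value is arbitrary): it lowers the soft limit in a child process and reports what the child sees.

```python
import resource
import subprocess
import sys


def child_nproc_limit(max_procs):
    """Start a child with a lowered soft RLIMIT_NPROC; return the limit it sees."""

    def set_limit():
        # Runs in the child before exec; the hard limit is left untouched.
        _, hard = resource.getrlimit(resource.RLIMIT_NPROC)
        if hard == resource.RLIM_INFINITY:
            soft = max_procs
        else:
            soft = min(max_procs, hard)
        resource.setrlimit(resource.RLIMIT_NPROC, (soft, hard))

    code = "import resource; print(resource.getrlimit(resource.RLIMIT_NPROC)[0])"
    out = subprocess.run([sys.executable, "-c", code], preexec_fn=set_limit,
                         capture_output=True, text=True, check=True)
    return int(out.stdout)
```

Once the limit is in place, further fork calls by that user fail when the cap is reached, so a fork bomb stalls instead of taking the machine with it. Note that processes with sufficient privileges (typically root) are not bound by this limit.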

6.6. RAM

This is probably the most surprising thing for novices to the hosting co-op planning biz. If you would classify yourself as such, then I bet you would leave RAM off your list of resources that need to be protected with explicit security measures!

Nonetheless, it may just be the most critical resource to control. In our experiences back when everything ran on Abulafia, the most common cause of system outage was some user running an out-of-control process that allocated all available memory, causing other processes to drop dead left and right as memory allocation calls failed. We're letting people run their own daemons 24/7, so this just can't be ignored.

6.6.1. Current remedies

As per ResourceLimits, we use the "as" (address space) ulimit to put a cap on how much virtual memory a process can allocate.
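A small sketch of the effect, assuming a Linux system where RLIMIT_AS is enforced: a child capped at 1 GiB of address space tries to allocate 2 GiB, and the allocation call fails inside that one process instead of starving everyone else. The helper name and the sizes are hypothetical, chosen only for the example.

```python
import resource
import subprocess
import sys

GIB = 1024 ** 3


def allocation_fails_under_limit(limit_bytes=GIB, request_bytes=2 * GIB):
    """Return True if a child capped at limit_bytes cannot allocate request_bytes."""

    def set_limit():
        # Runs in the child before exec; only the soft limit is lowered.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            soft = limit_bytes
        else:
            soft = min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    code = (
        "try:\n"
        f"    bytearray({request_bytes})\n"
        "    print('allocated')\n"
        "except MemoryError:\n"
        "    print('MemoryError')\n"
    )
    out = subprocess.run([sys.executable, "-c", code], preexec_fn=set_limit,
                         capture_output=True, text=True)
    return out.stdout.strip() == "MemoryError"
```

The failure is contained in the process that asked for too much, which is the opposite of the Abulafia scenario described above, where unrelated processes died as memory ran out.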

CategorySystemAdministration CategoryHistorical

SoftwareArchitecturePlans (last edited 2018-04-22 01:34:40 by ClintonEbadi)