
DomTool / Implementation

This page describes the implementation of the DomTool language interpreter and other tools. Most members would probably be better served by visiting DomTool/UserGuide.

1. Languages

DomTool is implemented mostly in Standard ML (SML), with teeny tiny bits of C and shell script. Standard ML is a statically typed functional programming language with much to recommend it, including a language standard (with formal semantics), one of the best open-source optimizing compilers ever for any language, and open development models and communities associated with the major implementations (out of about 10 total language implementations floating around today).

But really, why choose a programming language that "nobody's ever heard of"? The answer is simple. With SML, you can program at a high level of abstraction without having to worry about performance penalties and other historical undesirables.

In the following sections, I'll often refer to SML modules by name, instead of giving source file paths. A module named Name will be defined in either domtool2/src/name.sml or domtool2/src/plugins/name.sml, depending on whether it's part of core DomTool or of a plugin. You'll also find signature NAME defined in domtool2/src/name.sig or domtool2/src/plugins/name.sig. I readily point the reader to the source code itself, and the signature files in particular, as the best sources of detailed documentation on the implementation. Readers coming from backgrounds outside of statically typed functional programming may be pleasantly surprised at how well ML code documents itself!

Information about obtaining and building the DomTool tools can be found at DomTool/Building.

2. Build process

MLton and SML/NJ take different inputs to drive their build processes. The main Makefile is responsible for building src/domtool.cm (the input to SML/NJ) and src/domtool-*.mlb (the tool-specific inputs to MLton) from src/sources and some other compiler-specific files. When adding a new source file to the system, add it to src/sources, not to any of the generated files, and take care to insert it in dependency order relative to the sources already in the file.

A C library openssl_sml.so is built to provide a cleaner (but spartan) interface to the OpenSSL library. The Makefile uses the NLFFI tools shipped with MLton and SML/NJ to build compiler-specific SML interfaces to this library, and then compiler-agnostic code takes over and defines the visible OpenSSL structure based on the common interface supported by all NLFFI tools. Code specific to compiler $COMPILER lives in domtool2/openssl/$COMPILER.

3. Configuration

As is more and more the fashion lately, DomTool supports many tweakable configuration variables, and the particular settings of those variables are conveyed via program source code. In particular, the various pieces of the DomTool implementation look for configuration in different members of a Config module in an SML source file config.sml in the domtool2 base directory. When building the standalone tools with MLton, these configuration settings will be inlined into the places where they're used in the resulting binary, possibly triggering opportunities for further optimization. Isn't compilation technology wonderful?

Any particular installation of DomTool is unlikely to want to set custom values for all or even most of the available variables. Thus, the implementation takes modest advantage of SML's module system to allow inheritance of default settings via the open declaration, while maintaining the possibility for piecemeal setting of custom values.

DomTool involves a number of distinct plugins and sources of functionality, all of which have some configuration parameters. The implementation uses Makefile-driven concatenation of files following a certain convention to build the overall default configuration module from files associated with the separate plugins. In particular, in domtool2/configDefault, you will find a set of .cfg, .cfs, and .csg files. All the .cfs files are concatenated together to form the definition of the signature CONFIG, while .csg files are concatenated together to form supporting definitions of sub-signatures. The .cfg files are concatenated together to form the definition of a structure ConfigDefault ascribing opaquely to CONFIG. Your custom configuration structure Config also ascribes to CONFIG and may open ConfigDefault.
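
To make the inheritance pattern concrete, here is a minimal self-contained sketch of how a custom Config can open ConfigDefault and override a single value. The member names defaultNode and defaultTTL are invented for illustration and are not taken from the real CONFIG signature.

```sml
(* Hypothetical miniature of the CONFIG / ConfigDefault / Config arrangement.
   The member names are invented for this sketch. *)
signature CONFIG = sig
  val defaultNode : string   (* hypothetical setting *)
  val defaultTTL : int       (* hypothetical setting *)
end

(* Stands in for the structure assembled from the .cfg files. *)
structure ConfigDefault :> CONFIG = struct
  val defaultNode = "localhost"
  val defaultTTL = 3600
end

(* A site-specific configuration: inherit everything, override one value. *)
structure Config :> CONFIG = struct
  open ConfigDefault
  val defaultTTL = 600   (* shadows the binding brought in by open *)
end
```

Because later bindings shadow earlier ones inside a structure, Config.defaultNode still comes from ConfigDefault while Config.defaultTTL takes the custom value.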

4. The language interpreter

The process of reading, checking, and running a DomTool source file goes like this:

  1. The lexer breaks the textual input into tokens. It's embodied by the DomtoolLexFn functor, built by ml-lex from domtool.lex.

  2. The parser converts the stream of tokens into an abstract syntax tree (AST). It's embodied by the DomtoolLrValsFn functor, built by ml-yacc from domtool.grm. The Parse module ties together the lexer and parser.

  3. The Tycheck module type-checks the AST.

  4. The Reduce module applies familiar lambda calculus-style reduction rules to simplify the AST as much as possible.

  5. For input files that request configuration rather than just add definitions, the Eval module executes the resulting configuration value.

Every piece of this pipeline is independent of the distributed configuration aspect of DomTool described on DomTool/ArchitectureOverview, though every stage after the parser provides hooks that can be used to conscript the language implementation for use in that and other applications.

One important hook of this kind is the pair of Tycheck members allowExterns and disallowExterns. Call the appropriate one of these functions to set whether extern type and extern val declarations are allowed in the source file being checked.

As DomTool/LanguageReference explains, all configuration takes place through the configuration monad, which has a lot in common with the Haskell IO monad. Haskell newcomers often have trouble understanding how the IO monad enables the use of imperative code within a pure functional language. My favorite explanation for this is that values in the IO monad are runtime representations of programs in an embedded imperative language, which you hope will be run by some entity outside the scope of the Haskell language. In the DomTool implementation, this idea appears quite literally. The Reduce module handles the "pure functional" aspects of the language semantics, reducing input programs into first-order imperative programs, in the form of configuration values. Eval is the component that actually runs the resulting configuration, like the mythical top-level IO-meister in Haskell.
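
The division of labor between Reduce and Eval can be illustrated with a toy model. The sketch below is not DomTool's real AST or code: the exp datatype, the action names noteA and noteB, and the logging behavior of eval are all invented for illustration. It assumes closed input programs and never reduces under a lambda, so substitution cannot capture variables.

```sml
(* Toy language, invented for illustration; not DomTool's real AST. *)
datatype exp =
    Var of string
  | Lam of string * exp
  | App of exp * exp
  | Str of string
  | Action of string * exp        (* a primitive action with one argument *)
  | Seq of exp * exp              (* run one configuration, then another *)

(* Substitute v for x in e. Assumes v is closed, so no capture can occur. *)
fun subst (x, v) e =
    case e of
        Var y => if x = y then v else e
      | Lam (y, b) => if x = y then e else Lam (y, subst (x, v) b)
      | App (f, a) => App (subst (x, v) f, subst (x, v) a)
      | Str _ => e
      | Action (n, a) => Action (n, subst (x, v) a)
      | Seq (a, b) => Seq (subst (x, v) a, subst (x, v) b)

(* "Reduce": pure simplification; it never performs an action. *)
fun reduce e =
    case e of
        App (f, a) =>
        (case reduce f of
             Lam (x, b) => reduce (subst (x, reduce a) b)
           | f' => App (f', reduce a))
      | Action (n, a) => Action (n, reduce a)
      | Seq (a, b) => Seq (reduce a, reduce b)
      | _ => e

(* "Eval": run a first-order program, here simply by logging each action. *)
fun eval e =
    case e of
        Seq (a, b) => eval a @ eval b
      | Action (n, Str s) => [n ^ " " ^ s]
      | _ => raise Fail "not a first-order configuration"

(* A function applied to a string; reduction removes the function entirely. *)
val program =
    App (Lam ("d", Seq (Action ("noteA", Var "d"), Action ("noteB", Var "d"))),
         Str "example.org")

val log = eval (reduce program)
```

After reduce, the program contains only Seq and Action nodes with literal arguments, which is the sense in which the result is a first-order imperative program that eval can run directly.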

5. Plugin architecture

DomTool provides a number of ways to request callbacks when certain events occur or when certain add-on features are used. Plugins work by calling these hook functions, typically many times per plugin. By convention, a plugin is a module defined in domtool2/src/plugins/ that registers some callbacks as a side-effect of its definition.
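
The "registers callbacks as a side-effect of its definition" convention can be sketched in a few lines. ToyHooks and ToyPlugin below are hypothetical stand-ins, not DomTool modules; the point is only that elaborating the plugin structure is what performs the registration.

```sml
(* Hypothetical hook registry, invented for this sketch. *)
structure ToyHooks = struct
  val handlers : (string * (string -> string)) list ref = ref []
  fun register (name, f) = handlers := (name, f) :: !handlers
  fun lookup name =
      Option.map (fn (_, f) => f)
                 (List.find (fn (n, _) => n = name) (!handlers))
end

(* A "plugin": its only job is the side effect of registering callbacks. *)
structure ToyPlugin = struct
  val () = ToyHooks.register ("shout", fn s => s ^ "!")
  val () = ToyHooks.register ("twice", fn s => s ^ s)
end

(* The host looks callbacks up by name once all plugins have been loaded. *)
val result = Option.map (fn f => f "hi") (ToyHooks.lookup "shout")
```

This is one reason source ordering matters in a build: a plugin's registrations can only run after the module providing the hooks has been defined.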

The following subsections summarize the hooks that are available for DomTool plugins. There are other hooks that are only of interest when using the DomTool language implementation in a different application.

5.1. Extern functions

Declared extern val functions can be implemented in two different ways. One hardly counts as implementation: you can leave them unimplemented and just treat them as purely syntactic entities, since some of the later callbacks that we'll cover are passed general DomTool ASTs as arguments. The second option is to register an extern function handler. Env.registerFunction is the hook for this.

5.2. Actions

Actions are the connection between functional DomTool programs and "real-world" configuration. Call Env.registerAction to register the actual code that should be run when an action is encountered during Eval, giving the action's name and a function for transforming an environment variable mapping and a list of argument ASTs into a new environment variable mapping. These are DomTool, not UNIX, environment variables.

There is a family of convenience functions Env.action_none, Env.action_one, etc., for registering actions taking argument lists of fixed length with known types. Values of type Env.arg are used to encapsulate methods for extracting native SML values from DomTool ASTs of known types.
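
As a rough model of the shape described above, the following sketch treats an action as a function from an environment and an argument list to a new environment, and shows how an extractor value can wrap the argument handling for a one-argument action. Every name here (ast, intArg, registerAction, actionOne, setTTL) is a toy stand-in; the real signatures in Env may differ.

```sml
(* Toy stand-ins, invented for this sketch. *)
datatype ast = Str of string | Int of int

type env = (string * ast) list            (* DomTool-style environment variables *)
type action = env * ast list -> env

(* An "arg" pulls a native SML value out of an AST of the right shape. *)
type 'a arg = ast -> 'a option
val strArg : string arg = fn Str s => SOME s | _ => NONE
val intArg : int arg = fn Int n => SOME n | _ => NONE

val actions : (string * action) list ref = ref []
fun registerAction (name, f) = actions := (name, f) :: !actions

(* Convenience wrapper for one-argument actions of a known type. *)
fun actionOne name (argName, extract : 'a arg) (body : env * 'a -> env) =
    registerAction (name, fn (env, [a]) =>
                             (case extract a of
                                  SOME v => body (env, v)
                                | NONE => raise Fail ("bad " ^ argName ^ " for " ^ name))
                           | _ => raise Fail ("wrong arity for " ^ name))

(* Hypothetical action: record a TTL in the environment. *)
val () = actionOne "setTTL" ("ttl", intArg) (fn (env, n) => ("TTL", Int n) :: env)

(* What an evaluator would do on meeting an action by name. *)
fun run name (env, args) =
    case List.find (fn (n, _) => n = name) (!actions) of
        SOME (_, f) => f (env, args)
      | NONE => raise Fail ("unknown action " ^ name)

val env' = run "setTTL" ([], [Int 300])
```

The wrapper keeps arity and type checks in one place, so the body of each action only ever sees well-typed native values.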

5.3. Containers

Containers are actions that take actions as additional arguments, like domain and vhost. Their handlers are registered very similarly to other actions, with the addition that containers have associated callbacks that are run after all nested configuration has been processed. When a container is encountered during Eval, its action handler is run, then all of its nested configuration is evaluated, and finally the container's "afterward" callback is run. There are functions Env.container_none, Env.container_one, etc., that correspond to the convenience functions for regular actions.
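
The evaluation order for containers can be modeled directly. In this sketch the container record, evalContainer, and toyDomain are invented names, and nested configuration is represented simply as a list of thunks; it shows only the ordering of handler, nested configuration, and "afterward" callback.

```sml
(* Toy model of container evaluation order; all names are invented. *)
val log : string list ref = ref []
fun note s = log := s :: !log

type container = { enter : string -> unit, leave : unit -> unit }

fun evalContainer (c : container) arg (nested : (unit -> unit) list) =
    ( #enter c arg                          (* 1. the container's own handler *)
    ; List.app (fn run => run ()) nested    (* 2. all nested configuration *)
    ; #leave c () )                         (* 3. the "afterward" callback *)

val toyDomain : container =
    { enter = fn d => note ("begin " ^ d), leave = fn () => note "end" }

val () = evalContainer toyDomain "example.org"
                       [fn () => note "nested 1", fn () => note "nested 2"]

(* The log is built in reverse, so flip it to read in execution order. *)
val order = List.rev (!log)
```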

5.4. Extern types

Types declared with extern type are treated as refinement types. That is, each should have an associated simple type to which an additional filtering predicate is applied. Env.type_one is the hook to register a new extern type by giving its name, an Env.arg for converting its values to native SML, and a boolean predicate for deciding which values of the base type are allowed in the new type. This predicate can be arbitrary SML code. It may rely on imperativity, but it should never be visibly inconsistent in its decisions within a single type-checking run. For example, our use of the DomTool language for distributed configuration has extern type handlers that use imperativity to determine the current user, which domains that user may configure, etc., but this information is set before type-checking begins and doesn't change until it's over.
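
A refinement type in this sense is just a base-type extractor paired with a predicate. The sketch below models that idea with invented names (ast, externType, refine, and a hypothetical "port" type); it is not the signature of Env.type_one.

```sml
(* Toy AST and refinement-type model, invented for this sketch. *)
datatype ast = Str of string | Int of int

type externType = { name : string, accepts : ast -> bool }

(* Combine a base-type extractor with a filtering predicate. *)
fun refine name (extract : ast -> 'a option) (ok : 'a -> bool) : externType =
    { name = name,
      accepts = fn e => (case extract e of
                             SOME v => ok v     (* right base type: apply the filter *)
                           | NONE => false) }   (* wrong base type: always rejected *)

(* Hypothetical extern type: integers that are valid TCP port numbers. *)
val portType =
    refine "port" (fn Int n => SOME n | _ => NONE)
                  (fn n => 0 < n andalso n < 65536)

val okPort = #accepts portType (Int 80)
val tooBig = #accepts portType (Int 70000)
val notInt = #accepts portType (Str "80")
```

Here the predicate is pure, which trivially satisfies the consistency requirement; a predicate that consults mutable state would need that state to stay fixed for the duration of a check.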

5.5. Environment variable defaults

Call Defaults.registerDefault to provide a default value for an environment variable that should be set before type-checking begins. You must provide the variable's name, its type, and a (possibly impure) function for generating its initial expression value.
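
The following toy registry illustrates why the generator is a function rather than a plain value: it is called when the initial environment is built, so an impure generator can compute a fresh value at that moment. The names toyRegisterDefault, initialEnv, typ, and the Serial variable are invented; the real Defaults.registerDefault signature may differ.

```sml
(* Toy model of default registration; all names are invented. *)
datatype ast = Str of string | Int of int
datatype typ = TString | TInt

val defaults : (string * (typ * (unit -> ast))) list ref = ref []
fun toyRegisterDefault (name, (t, gen)) =
    defaults := (name, (t, gen)) :: !defaults

(* Build the initial environment by calling each generator once. *)
fun initialEnv () = map (fn (name, (_, gen)) => (name, gen ())) (!defaults)

(* An impure generator: each call yields the next serial number. *)
val counter = ref 0
val () = toyRegisterDefault
           ("Serial", (TInt, fn () => (counter := !counter + 1; Int (!counter))))

val env0 = initialEnv ()
val env1 = initialEnv ()
```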

5.6. Reset handlers

When an admin runs domtool-admin regen, we need a way to revert to a pristine configuration where everything users have added is gone, before we build it all back up again from scratch. Domain.registerResetGlobal registers a function to perform this clean-up on global (i.e., AFS) configuration, while Domain.registerResetLocal registers a similar function to be run on each node before regeneration. For example, the Webalizer plugin uses registerResetGlobal to delete all Webalizer configuration files, and the Apache plugin uses registerResetLocal to clear the contents of /var/domtool/vhosts.

5.7. Before/after domains

Call Domain.registerBefore and Domain.registerAfter to register callbacks to be called before and after a domain directive's nested configuration is run.

5.8. File change handlers

Call Slave.registerFileHandler to register a callback to be called whenever a file's status in $DOMTOOL/nodes changes. See DomTool/ArchitectureOverview for more information on when such callbacks would be triggered.

5.9. Pre/post-handlers

Call Slave.registerPreHandler and Slave.registerPostHandler to register functions to be called before and after a DomTool configuration session, which might include arbitrarily many domains and source files.

DomTool/Implementation (last edited 2010-03-18 14:05:35 by AdamChlipala)