Diff for "DomTool/Implementation"

Differences between revisions 1 and 2
Revision 1 as of 2006-12-16 23:56:27
Size: 3097
Editor: AdamChlipala
Comment:
Revision 2 as of 2006-12-17 00:20:31
Size: 6331
Editor: AdamChlipala
Comment: The language interpreter
This page describes the implementation of the DomTool language interpreter and other tools. Most members would probably be better served by visiting DomTool/UserGuide.

1. Languages

DomTool is implemented mostly in [http://en.wikipedia.org/wiki/Standard_ML Standard ML] (SML), with teeny tiny bits of C and shell script. Standard ML is a [http://en.wikipedia.org/wiki/Statically_typed statically-typed] [http://en.wikipedia.org/wiki/Functional_programming_language functional programming language] with much to recommend it, including a [http://portal.acm.org/citation.cfm?id=549659 language standard] (with formal semantics), [http://mlton.org/ one of the best open source optimizing compilers ever for any language], and open development models and communities associated with the major implementations (out of about 10 total language implementations floating around today).

But really, why choose a programming language that "nobody's ever heard of"? The answer is simple. With SML, you can program at a high level of abstraction without having to worry about performance penalties and other historical undesirables.

In the following sections, I'll often refer to SML modules by name, instead of giving source file paths. A module named `Name` will be defined in either `domtool2/name.sml` or `domtool2/plugins/name.sml`, depending on whether it's part of core DomTool or of a plugin. You'll also find signature `NAME` defined in `domtool2/name.sig` or `domtool2/plugins/name.sig`. I readily point the reader to the source code itself, and the signature files in particular, as the best sources of detailed documentation on the implementation. Readers coming from backgrounds outside of statically-typed functional programming may be pleasantly surprised at how well ML code documents itself!

Information about obtaining and building the DomTool tools can be found on DomTool/Building.

2. Configuration

As is more and more the fashion lately, DomTool supports many tweakable configuration variables, and the particular settings of those variables are conveyed via program source code. In particular, the various pieces of the DomTool implementation look for configuration in different members of a `Config` module in an SML source file `config.sml` in the `domtool2` base directory. When building the standalone tools with MLton, these configuration settings will be inlined into the places where they're used in the resulting binary, possibly triggering opportunities for further optimization. Isn't compilation technology wonderful?

Any particular installation of DomTool is unlikely to want to set custom values for all or even most of the available variables. Thus, the implementation takes modest advantage of SML's module system to allow inheritance of default settings via the `open` declaration, while maintaining the possibility for piecemeal setting of custom values.

DomTool involves a number of distinct plugins and sources of functionality, all of which have some configuration parameters. The implementation uses Makefile-driven concatenation of files following a certain convention to build the overall default configuration module from files associated with the separate plugins. In particular, in `domtool2/configDefault`, you will find a set of `.cfg`, `.cfs`, and `.csg` files. All the `.cfs` files are concatenated together to form the definition of the signature `CONFIG`, while `.csg` files are concatenated together to form supporting definitions of sub-signatures. The `.cfg` files are concatenated together to form the definition of a structure `ConfigDefault` ascribing opaquely to `CONFIG`. Your custom configuration structure `Config` also ascribes to `CONFIG` and may open `ConfigDefault`.
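To make the inheritance idiom concrete, here is a minimal sketch in SML. The member names `defaultDomain` and `tmpDir` and their values are hypothetical, invented only for illustration; the real members of `CONFIG` are whatever the files in `domtool2/configDefault` define.

```sml
(* Hypothetical stand-in for the signature assembled from the .cfs files. *)
signature CONFIG = sig
  val defaultDomain : string
  val tmpDir : string
end

(* Hypothetical stand-in for the defaults assembled from the .cfg files. *)
structure ConfigDefault :> CONFIG = struct
  val defaultDomain = "example.org"
  val tmpDir = "/tmp"
end

(* A site-specific configuration: inherit every default via open,
   then rebind only the members that need custom values. *)
structure Config :> CONFIG = struct
  open ConfigDefault
  val tmpDir = "/var/tmp/domtool"
end
```

Because a later binding in a structure shadows an earlier one, `Config.tmpDir` takes the custom value while `Config.defaultDomain` falls through to the default.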

3. The language interpreter

The process of reading, checking, and running a DomTool source file goes like this:

  1. The lexer breaks the textual input into tokens. It's embodied by the `DomtoolLexFn` functor, built by ml-lex from `domtool.lex`.

  2. The parser converts the stream of tokens into an abstract syntax tree (AST). It's embodied by the `DomtoolLrValsFn` functor, built by ml-yacc from `domtool.grm`. The `Parse` module ties together the lexer and parser.

  3. The `Tycheck` module type-checks the AST.

  4. The `Reduce` module applies familiar lambda calculus-style reduction rules to simplify the AST as much as possible.

  5. For input files that request configuration rather than just add definitions, the `Eval` module executes the resulting configuration value.
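
The control flow of the five steps above can be sketched as a small driver functor. This is only an illustration under stated assumptions: the signature `STAGES`, the functor `Driver`, the `outcome` type, and the toy stages are all hypothetical and do not reflect the actual interfaces of the Parse, Tycheck, Reduce, or Eval modules.

```sml
(* Hypothetical interface standing in for the real pipeline stages. *)
signature STAGES = sig
  type ast
  val parse : string -> ast option   (* lexing + parsing; NONE on a syntax error *)
  val tycheck : ast -> bool          (* true if the AST is well-typed *)
  val reduce : ast -> ast            (* simplify as far as possible *)
  val isConfig : ast -> bool         (* does the file request configuration? *)
  val eval : ast -> unit             (* execute a configuration value *)
end

functor Driver (S : STAGES) = struct
  datatype outcome = ParseError | TypeError | DefinitionsOnly | Configured

  (* Run the stages in order, stopping at the first failure. *)
  fun run source =
      case S.parse source of
          NONE => ParseError
        | SOME ast =>
          if not (S.tycheck ast) then TypeError
          else
            let
              val reduced = S.reduce ast
            in
              (* Only files that request configuration reach the evaluator. *)
              if S.isConfig reduced
              then (S.eval reduced; Configured)
              else DefinitionsOnly
            end
end

(* Toy stages over strings, just enough to exercise the driver. *)
val executed : string list ref = ref []

structure Toy = Driver (struct
  type ast = string
  fun parse s = if s = "" then NONE else SOME s
  fun tycheck a = a <> "ill-typed"
  fun reduce a = a
  fun isConfig a = String.isPrefix "config" a
  fun eval a = executed := a :: !executed
end)
```

With these toy stages, `Toy.run "definitions"` stops after reduction, while `Toy.run "config web"` also reaches the evaluator.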

Every piece of this pipeline is independent of the distributed configuration aspect of DomTool described on DomTool/ArchitectureOverview, though every stage after the parser provides hooks that can be used to conscript the language implementation for use in that and other applications.

One important hook of this kind in `Tycheck` is in the form of its members `allowExterns` and `disallowExterns`. Call the appropriate one of these functions to set whether or not `extern type` and `extern val` declarations should be allowed in the source file to check.
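
As a rough sketch of how such a toggle might behave, the following self-contained SML models the hook with a mutable flag. Everything here is an assumption made for illustration: the structure `TycheckSketch`, the `decl` datatype, the `checkDecl` function, the `unit -> unit` types of the two hooks, and the initial setting of the flag. Consult the `Tycheck` signature file for the real interface.

```sml
structure TycheckSketch :> sig
  datatype decl = ExternType of string | ExternVal of string | Val of string
  val allowExterns : unit -> unit
  val disallowExterns : unit -> unit
  (* Hypothetical checker: NONE if the declaration is accepted,
     SOME message if it is rejected. *)
  val checkDecl : decl -> string option
end = struct
  datatype decl = ExternType of string | ExternVal of string | Val of string

  (* Mutable flag flipped by the two hook functions (initial value assumed). *)
  val externsOk = ref false

  fun allowExterns () = externsOk := true
  fun disallowExterns () = externsOk := false

  (* Accept an extern declaration only while the flag is set. *)
  fun gate what name =
      if !externsOk then NONE
      else SOME (what ^ " " ^ name ^ " is not allowed here")

  fun checkDecl (Val _) = NONE
    | checkDecl (ExternType name) = gate "extern type" name
    | checkDecl (ExternVal name) = gate "extern val" name
end
```

A caller would invoke `TycheckSketch.allowExterns ()` before checking a trusted library file and `TycheckSketch.disallowExterns ()` before checking ordinary input.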

As DomTool/LanguageReference explains, all configuration takes place through the configuration monad, which has a lot in common with the [http://www.haskell.org/ Haskell] [http://www.nomaware.com/monads/html/iomonad.html IO monad]. Haskell newcomers often have trouble understanding how the IO monad enables the use of imperative code within a pure functional language. My favorite explanation for this is that values in the IO monad are runtime representations of programs in an embedded imperative language, which you hope will be run by some entity outside the scope of the Haskell language. In the DomTool implementation, this idea appears quite literally. The `Reduce` module handles the "pure functional" aspects of the language semantics, reducing input programs into first-order imperative programs, in the form of configuration values. `Eval` is the component that actually runs the resulting configuration, like the mythical top-level IO-meister in Haskell.
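
The split between reducing and running can be shown on a toy language. The sketch below is not DomTool's AST or its reduction rules; the `exp` datatype, the `Emit` and `Seq` actions, and the functions `subst`, `reduce`, and `eval` are all hypothetical. It assumes closed programs, so substitution never needs to rename variables.

```sml
(* Toy AST: a lambda calculus whose base values are imperative actions. *)
datatype exp =
    Var of string
  | Lam of string * exp
  | App of exp * exp
  | Str of string
  | Emit of exp          (* action: output one string *)
  | Seq of exp * exp     (* action: run one action, then another *)

(* Substitute the closed expression v for variable x in e.
   Since v is closed, no variable capture can occur. *)
fun subst (x, v) e =
    case e of
        Var y => if x = y then v else e
      | Lam (y, b) => if x = y then e else Lam (y, subst (x, v) b)
      | App (f, a) => App (subst (x, v) f, subst (x, v) a)
      | Str _ => e
      | Emit a => Emit (subst (x, v) a)
      | Seq (a, b) => Seq (subst (x, v) a, subst (x, v) b)

(* The pure part: beta-reduce until only first-order actions remain.
   Nothing is executed here. *)
fun reduce e =
    case e of
        App (f, a) =>
        (case reduce f of
             Lam (x, b) => reduce (subst (x, reduce a) b)
           | f' => App (f', reduce a))
      | Emit a => Emit (reduce a)
      | Seq (a, b) => Seq (reduce a, reduce b)
      | _ => e

(* The imperative part: walk a fully reduced action and perform it,
   sending each emitted string to the supplied output function. *)
exception Stuck

fun eval out (Emit (Str s)) = out s
  | eval out (Seq (a, b)) = (eval out a; eval out b)
  | eval _ _ = raise Stuck

(* A higher-order program: apply "do it twice" to an action. *)
val twice = Lam ("a", Seq (Var "a", Var "a"))
val program = App (twice, Emit (Str "hello"))

(* Reduction yields a first-order plan; it has not run yet. *)
val plan = reduce program
```

Here `plan` is plain data describing two emits in sequence, and only a call such as `eval print plan` actually performs them.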

4. Plugin architecture

DomTool provides a number of ways to request callbacks when certain events occur or when certain add-on features are used. Plugins work by calling these hook functions, typically many times per plugin.
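
A callback registry of this general shape can be sketched in a few lines of SML. The names below (`HookSketch`, `register`, `fire`, `ExamplePlugin`, and the event string) are hypothetical and are not DomTool's actual hook functions, which are documented in the signature files.

```sml
(* Hypothetical hook registry: plugins register callbacks, and the core
   later fires an event to run them in registration order. *)
structure HookSketch :> sig
  val register : (string -> unit) -> unit
  val fire : string -> unit
end = struct
  val handlers : (string -> unit) list ref = ref []
  fun register h = handlers := !handlers @ [h]
  fun fire event = List.app (fn h => h event) (!handlers)
end

(* Record of what the callbacks observed, newest first. *)
val seen : string list ref = ref []

(* A plugin typically registers several callbacks when it is loaded. *)
structure ExamplePlugin = struct
  val () = HookSketch.register (fn ev => seen := ("first saw " ^ ev) :: !seen)
  val () = HookSketch.register (fn ev => seen := ("second saw " ^ ev) :: !seen)
end

val () = HookSketch.fire "domain-added"
```

Keeping the handler list private behind an opaque signature means plugins can add callbacks but cannot remove or reorder those of other plugins.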

DomTool/Implementation (last edited 2010-03-18 14:05:35 by AdamChlipala)