<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>HelpOnXapian</title></articleinfo><para>Using Xapian can dramatically improve search performance in moin and unlocks additional features (see the search prefixes described on HelpOnSearching) that are not possible with the legacy search engine. </para><section><title>Setting it up</title><section><title>Requirements</title><para>You must have Xapian and its Python bindings (xapian-core and xapian-bindings) from <ulink url="http://www.xapian.org/"/> installed, in version 1.0.0 or later. </para><para>To process attachment files, moin uses <code>filter</code> plugins. The following filter plugins are included: </para><informaltable><tgroup cols="3"><colspec colname="col_0"/><colspec colname="col_1"/><colspec colname="col_2"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para> <emphasis role="strong">File type</emphasis> </para></entry><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Dependency</emphasis> </para></entry><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Notes</emphasis> </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Text files (.txt) </para></entry><entry colsep="1" rowsep="1"><para> - </para></entry><entry colsep="1" rowsep="1"><para> tries the utf-8 and iso-8859-15 encodings (or forces ASCII if neither works) </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> JPEG images (.jpg) </para></entry><entry colsep="1" rowsep="1"><para> - </para></entry><entry colsep="1" rowsep="1"><para> EXIF data is extracted </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Open Office files (.sx?) </para></entry><entry colsep="1" rowsep="1"><para> - </para></entry><entry colsep="1" rowsep="1"><para> e.g. 
from older <code>OpenOffice.org/StarOffice</code> versions </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Open Document files (.od?) </para></entry><entry colsep="1" rowsep="1"><para> - </para></entry><entry colsep="1" rowsep="1"><para> e.g. from recent <code>OpenOffice.org/StarOffice</code> versions </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Binary files </para></entry><entry colsep="1" rowsep="1"><para> - </para></entry><entry colsep="1" rowsep="1"><para> moin uses a <code>strings</code>-like filter to process these, together with a blacklist of content you don't want to search </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> MS Word files (.doc) </para></entry><entry colsep="1" rowsep="1"><para> antiword </para></entry><entry colsep="1" rowsep="1"><para> filter calls <code>antiword</code> </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> MS Excel files (.xls) </para></entry><entry colsep="1" rowsep="1"><para> catdoc </para></entry><entry colsep="1" rowsep="1"><para> filter calls <code>xls2csv</code> </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> PDF files (.pdf) </para></entry><entry colsep="1" rowsep="1"><para> xpdf-utils </para></entry><entry colsep="1" rowsep="1"><para> filter calls <code>pdftotext</code> </para></entry></row></tbody></tgroup></informaltable><para>After installing additional filters (or their dependencies), you should (re)build your index. The new filters and support packages are picked up automatically. After the rebuild, your search results may include results that link directly to your attachments.  
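Before rebuilding, it can help to confirm that the external commands named in the table above are installed. The following is only a minimal sketch and not part of moin: the helper name <code>missing_filter_commands</code> is hypothetical, and the sketch assumes Python 3 (for <code>shutil.which</code>) and that the commands have to be reachable through <code>PATH</code>. <screen><![CDATA[import shutil

# Commands the attachment filters call, as listed in the table above.
FILTER_COMMANDS = {
    "MS Word (.doc)": "antiword",
    "MS Excel (.xls)": "xls2csv",
    "PDF (.pdf)": "pdftotext",
}


def missing_filter_commands(commands=FILTER_COMMANDS, which=shutil.which):
    """Return {file type: command} for every command not found on PATH.

    Hypothetical helper for illustration; `which` can be swapped out for testing.
    """
    return {ftype: cmd for ftype, cmd in commands.items() if which(cmd) is None}


if __name__ == "__main__":
    # Print one line per missing command; print nothing if all were found.
    for ftype, cmd in sorted(missing_filter_commands().items()):
        print("%s: '%s' not found on PATH" % (ftype, cmd))
]]></screen> If the script prints nothing, all three commands were found. 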
</para></section><section><title>Configuration</title><para>In your wikiconfig, the following options control Xapian: </para><informaltable><tgroup cols="3"><colspec colname="col_0"/><colspec colname="col_1"/><colspec colname="col_2"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para> xapian_search </para></entry><entry colsep="1" rowsep="1"><para> <code>False</code> </para></entry><entry colsep="1" rowsep="1"><para> if True, enables Xapian search </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> xapian_index_dir </para></entry><entry colsep="1" rowsep="1"><para> <code>None</code> </para></entry><entry colsep="1" rowsep="1"><para> if set, uses a separate index directory for every wiki, distinguished by wikiname; useful for wikifarms to separate indices (note: requires rebuilding the index) </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> xapian_index_history </para></entry><entry colsep="1" rowsep="1"><para> <code>False</code> </para></entry><entry colsep="1" rowsep="1"><para> if True, instructs the indexer to index all revisions of a page so that users can search the page history (note: requires rebuilding the index) </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> xapian_stemming </para></entry><entry colsep="1" rowsep="1"><para> <code>False</code> </para></entry><entry colsep="1" rowsep="1"><para> if True, enables stemming of terms in Xapian (note: requires rebuilding the index) </para></entry></row></tbody></tgroup></informaltable><itemizedlist><listitem><para><emphasis role="strong">xapian_search</emphasis> (default: False) </para><itemizedlist><listitem override="none"><para>Setting this to True enables Xapian search for your <ulink url="https://wiki.hcoop.net/HelpOnXapian/MoinMoin#">MoinMoin</ulink> wiki. 
</para><para><inlinemediaobject><imageobject><imagedata depth="16" fileref="https://wiki.hcoop.net/moin_static1911/moniker_bt/img/idea.png" width="16"/></imageobject><textobject><phrase>(!)</phrase></textobject></inlinemediaobject> Moin automatically disables xapian_search (and falls back to the slow search) if it doesn't find a usable index. You can see whether it uses Xapian on <ulink url="https://wiki.hcoop.net/HelpOnXapian/SystemInfo#">SystemInfo</ulink>. </para></listitem></itemizedlist></listitem><listitem><para><emphasis role="strong">xapian_index_history</emphasis> (default: False) </para><itemizedlist><listitem override="none"><para>If this option is enabled, all revisions of all pages (except underlay pages, of which only one revision is available) are indexed. This allows users to search older revisions of pages if enabled in the search dialogue on <ulink url="https://wiki.hcoop.net/HelpOnXapian/FindPage#">FindPage</ulink>. <inlinemediaobject><imageobject><imagedata depth="16" fileref="https://wiki.hcoop.net/moin_static1911/moniker_bt/img/alert.png" width="16"/></imageobject><textobject><phrase>/!\</phrase></textobject></inlinemediaobject> You need to rebuild your index if you change this option. If you run a big wiki, also check the size of your index after the build, as this feature can use a lot of disk space. Creating the index may take considerably longer if history indexing is enabled. </para></listitem></itemizedlist></listitem><listitem><para><emphasis role="strong">xapian_index_dir</emphasis> (default: None) </para><itemizedlist><listitem override="none"><para>This option lets you specify a separate directory to save your index to. By default, the index is saved to <emphasis>data_dir</emphasis>/cache/xapian/. Furthermore, if this option is used, every wiki on a wikifarm gets its own index, identified by its wikiname, as opposed to a single index in the standard configuration. Set this option if you are running a wikifarm! 
<inlinemediaobject><imageobject><imagedata depth="16" fileref="https://wiki.hcoop.net/moin_static1911/moniker_bt/img/alert.png" width="16"/></imageobject><textobject><phrase>/!\</phrase></textobject></inlinemediaobject> Don't forget to (re)build the index (or indices) after enabling this! </para></listitem></itemizedlist></listitem><listitem><para><emphasis role="strong">xapian_stemming</emphasis> (default: False) </para><itemizedlist><listitem override="none"><para>If enabled, words are indexed in both their raw and stemmed forms, and the terms in your search query are stemmed according to your language. This means that searching for &quot;testing&quot; will also yield pages containing the words &quot;tested&quot;, &quot;tester&quot; etc. </para><para><inlinemediaobject><imageobject><imagedata depth="16" fileref="https://wiki.hcoop.net/moin_static1911/moniker_bt/img/alert.png" width="16"/></imageobject><textobject><phrase>/!\</phrase></textobject></inlinemediaobject> Enabling or disabling this option requires a complete rebuild of your index! </para></listitem></itemizedlist></listitem></itemizedlist></section><section><title>(Re-)Building an index</title><para>You can use the supplied command line tool <emphasis>moin</emphasis> to build an index for the first time, to rebuild it completely, or to update an existing index. </para><para>To build your index for the first time, execute </para><screen><![CDATA[moin --config-dir=/where/your/configdir/is --wiki-url=wiki-url/ index build --mode=add]]></screen><para>on your command line. You can check the status of Xapian and its index on <ulink url="https://wiki.hcoop.net/HelpOnXapian/SystemInfo#">SystemInfo</ulink>. </para><para>The following modes can be passed to the command above to control how the index is built: </para><itemizedlist><listitem><para><emphasis role="strong">add</emphasis> </para><itemizedlist><listitem override="none"><para>Items are added without checking whether they are already in the index. 
Only use this mode if you do not have an index yet. </para></listitem></itemizedlist></listitem><listitem><para><emphasis role="strong">rebuild</emphasis> </para><itemizedlist><listitem override="none"><para>Deletes the previous index, if one exists, and then proceeds as in <emphasis>add</emphasis> mode. </para></listitem></itemizedlist></listitem><listitem><para><emphasis role="strong">update</emphasis> </para><itemizedlist><listitem override="none"><para>Updates every page in the index based on its last modification date. </para><para><inlinemediaobject><imageobject><imagedata depth="16" fileref="https://wiki.hcoop.net/moin_static1911/moniker_bt/img/idea.png" width="16"/></imageobject><textobject><phrase>(!)</phrase></textobject></inlinemediaobject> Periodic invocations using this mode are <emphasis role="strong">not</emphasis> necessary, because pages in the index are updated whenever they change. Use this mode only for debugging, when pages in the index are not up to date. </para></listitem></itemizedlist></listitem></itemizedlist><para><inlinemediaobject><imageobject><imagedata depth="16" fileref="https://wiki.hcoop.net/moin_static1911/moniker_bt/img/alert.png" width="16"/></imageobject><textobject><phrase>/!\</phrase></textobject></inlinemediaobject> Please note that you <emphasis role="strong">must</emphasis> <emphasis>rebuild</emphasis> your index if you change any of the xapian_index_history, xapian_index_dir or xapian_stemming configuration options! </para></section><section><title>Testing</title><para>You can test whether Xapian is enabled and whether an index is available by checking <ulink url="https://wiki.hcoop.net/HelpOnXapian/SystemInfo#">SystemInfo</ulink>. To check whether searches are performed using Xapian, enable show_timings in your wikiconfig, perform a search and look for _xapianSearch at the bottom of the page. 
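As a minimal sketch, the options described on this page could be set in a wikiconfig fragment like the one below. The import line and class layout are assumptions based on a typical <code>wikiconfig.py</code> and may differ in your installation, the commented-out index path is a hypothetical placeholder, and the values are illustrative rather than recommended. <screen><![CDATA[# wikiconfig.py (fragment)
# Assumption: adjust the import to match your existing wikiconfig.
from MoinMoin.config.multiconfig import DefaultConfig


class Config(DefaultConfig):
    xapian_search = True          # enable Xapian search
    xapian_stemming = False       # changing this requires an index rebuild
    xapian_index_history = False  # changing this requires an index rebuild
    # Hypothetical path; set xapian_index_dir when running a wikifarm.
    # xapian_index_dir = '/path/to/xapian-index'
    show_timings = True           # look for _xapianSearch at the bottom of a result page
]]></screen> 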
</para></section></section><section><title>Usage</title><para>Xapian search is used in basically the same way as any other search engine. Xapian's advanced features made it possible to introduce some new search term prefixes that are not available in the legacy search engine (commonly referred to as moin search). See <ulink url="https://wiki.hcoop.net/HelpOnXapian/HelpOnSearching#">HelpOnSearching</ulink> for more information, or use the new advanced search dialogue on <ulink url="https://wiki.hcoop.net/HelpOnXapian/FindPage#">FindPage</ulink> to see what is available. </para></section></article>