Java Development with Ant: Lucene

At the heart of this project is Lucene, an elegant, powerful, stable, and scalable search engine.  Because the content in this project is static, the index is created during the build and accessed at run time as a read-only index.  The content could change without the application having to be redeployed or restarted; it would only require reindexing the content.

This application is not doing anything particularly sophisticated with Lucene, yet the results are quite powerful.  The run-time access to the Lucene API is trivial (see the SearchUtil code, which keeps the Lucene API nicely wrapped); the real fun happens during the build, in the custom <index> task.  Here is its use in the build file:

    <target name="build-site-index" depends="package-anttask">
        <taskdef resource="taskdef.properties"
                 classpath="${lucene.jar}:${dist.dir}/antbook-anttask.jar:${jtidy.jar}"/>

        <index index="${index.dir}"
               overwrite="false">
            <fileset dir="${site.dir}"/>
        </index>
    </target>

By default, the IndexTask uses FileExtensionDocumentHandler, which has handlers for processing .txt and .html files.  The task is pluggable, however: to use a custom DocumentHandler implementation, set its fully qualified class name in the optional documenthandler attribute of the <index> task, and be sure that class is on the <taskdef> classpath.  Peek into the source code to understand how this works in more detail.
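
To make the dispatch-by-extension idea concrete, here is a minimal, self-contained Java sketch.  It deliberately does not use the real Lucene or IndexTask classes: the Handler interface, the ExtensionDispatchSketch class, and the regex-based HTML handling are all stand-ins invented for illustration, and the real handler's API may look quite different.  The sketch only shows the general shape: pick a handler from the file extension, and let each handler decide which fields a document gets.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative stand-in only; not the real FileExtensionDocumentHandler. */
final class ExtensionDispatchSketch {

    /** Hypothetical handler contract: turn raw file text into named fields. */
    interface Handler {
        Map<String, String> toFields(String rawText);
    }

    private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private final Map<String, Handler> handlersByExtension = new HashMap<>();

    /** Registers a handler for an extension such as ".txt" (case-insensitive). */
    void register(String extension, Handler handler) {
        handlersByExtension.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    /** Returns the fields for the file, or null when no handler claims it. */
    Map<String, String> handle(String fileName, String rawText) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        String extension = fileName.substring(dot).toLowerCase(Locale.ROOT);
        Handler handler = handlersByExtension.get(extension);
        return handler == null ? null : handler.toFields(rawText);
    }

    /** Builds a dispatcher with assumed .txt and .html handlers. */
    static ExtensionDispatchSketch withDefaults() {
        ExtensionDispatchSketch dispatcher = new ExtensionDispatchSketch();
        dispatcher.register(".txt", text -> {
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("contents", text);
            return fields;
        });
        dispatcher.register(".html", html -> {
            Map<String, String> fields = new LinkedHashMap<>();
            Matcher m = TITLE.matcher(html);
            fields.put("title", m.find() ? m.group(1).trim() : "");
            // Crude tag stripping; a real handler would use an HTML parser.
            fields.put("contents", html.replaceAll("<[^>]*>", " ").trim());
            return fields;
        });
        return dispatcher;
    }

    public static void main(String[] args) {
        ExtensionDispatchSketch dispatcher = withDefaults();
        String html = "<html><head><title>Proxy setup</title></head>"
                    + "<body>Use a proxy</body></html>";
        System.out.println(dispatcher.handle("faq.html", html));
        System.out.println(dispatcher.handle("notes.txt", "plain text body"));
        System.out.println(dispatcher.handle("logo.png", "")); // no handler registered
    }
}
```

In this sketch an unrecognized extension yields null, so the caller can skip the file.  Whether the real task skips, logs, or fails on such files is something to confirm in the source.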

The IndexTask source code has been contributed to the Lucene Sandbox, and enhancements will be available there.

Query syntax

The documents indexed by the default FileExtensionDocumentHandler all have a "contents" field, which is the default field queried.  HTML documents have an additional field, "title".  It's easy to use the Lucene query parser syntax to query documents with queries like "title:proxy", for example, or with more sophisticated combinations of terms and operators.
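
The role of the default field can be illustrated with a toy Java sketch.  This is not Lucene's query parser, which also handles operators, phrases, grouping, and more; FieldTermSketch and its resolve method are hypothetical names used only to show how a single clause maps to a field: a bare term goes to "contents", while a "field:term" clause names its field explicitly.

```java
/** Toy illustration of default-field resolution; not the Lucene query parser. */
final class FieldTermSketch {

    /** Field searched when a clause does not name one. */
    static final String DEFAULT_FIELD = "contents";

    /** Splits one simple clause into {field, term}. */
    static String[] resolve(String clause) {
        int colon = clause.indexOf(':');
        if (colon <= 0) {
            // No field prefix: fall back to the default field.
            return new String[] { DEFAULT_FIELD, clause };
        }
        return new String[] { clause.substring(0, colon), clause.substring(colon + 1) };
    }

    public static void main(String[] args) {
        for (String clause : new String[] { "proxy", "title:proxy" }) {
            String[] fieldAndTerm = resolve(clause);
            System.out.println(clause + " -> field=" + fieldAndTerm[0]
                    + ", term=" + fieldAndTerm[1]);
        }
    }
}
```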