Java Development with Ant: Lucene
At the heart of this project is Lucene, an elegant, powerful, stable,
and scalable search engine. Because the content in this project is static,
the index is created during the build and accessed at run time as a
read-only index. The content could change without the application having
to be redeployed or restarted; it would only need to be reindexed.
This application is not doing anything particularly sophisticated with Lucene,
yet the results are quite powerful. The run-time access of the Lucene
API is trivial (see the SearchUtil code, which keeps the Lucene API nicely
wrapped); the real fun happens during the build, in the custom <index> task.
Here is its use in the build file:
<target name="build-site-index" depends="package-anttask">
  <taskdef resource="taskdef.properties"
           classpath="${lucene.jar}:${dist.dir}/antbook-anttask.jar:${jtidy.jar}"/>
  <index index="${index.dir}"
         overwrite="false">
    <fileset dir="${site.dir}"/>
  </index>
</target>
The IndexTask uses a default FileExtensionDocumentHandler, which has
handlers for processing .txt and .html files. The task is pluggable, however:
a custom class implementing DocumentHandler can be used instead. Set its
fully qualified class name in the optional documenthandler attribute of the
<index> task, and be sure that class is on the <taskdef> classpath. Peek
into the source code to understand how this works in more detail.
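To make the idea of dispatching on file extension concrete, here is a minimal, self-contained sketch. It does not use the real IndexTask or DocumentHandler classes: the ExtensionDispatchSketch class, its Handler interface, and the choice to return null for unregistered extensions are all hypothetical stand-ins chosen for illustration. Only the .txt and .html types and the "contents" and "title" field names come from this page.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative only: hypothetical stand-ins, not the Lucene Sandbox classes.
final class ExtensionDispatchSketch {

    /** Hypothetical handler contract: report which fields a file would be indexed into. */
    interface Handler {
        String describe(String fileName);
    }

    private final Map<String, Handler> handlers = new HashMap<>();

    ExtensionDispatchSketch() {
        // The two file types mentioned in the text: .txt and .html.
        handlers.put("txt", name -> "contents");
        handlers.put("html", name -> "title,contents");
    }

    /** Returns the lower-cased extension of a plain file name, or "" when there is none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns the field list for the file, or null when no handler is registered (an assumption). */
    String fieldsFor(String fileName) {
        Handler handler = handlers.get(extensionOf(fileName));
        return handler == null ? null : handler.describe(fileName);
    }

    public static void main(String[] args) {
        ExtensionDispatchSketch sketch = new ExtensionDispatchSketch();
        for (String name : new String[] {"index.html", "README.txt", "logo.png"}) {
            System.out.println(name + " -> " + sketch.fieldsFor(name));
        }
    }
}
```

A custom handler plugged in through the documenthandler attribute would replace this whole lookup with its own logic; the project source code remains the authority on the actual interface.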
The IndexTask source code has been contributed to the Lucene Sandbox,
and enhancements will be available there.
Query syntax
The documents indexed by the default FileExtensionDocumentHandler
all have a "contents" field, which is the default field queried. Indexed HTML
documents have an additional field, "title". It's easy to use the Lucene
query parser syntax to write queries such as "title:proxy", or more
sophisticated combinations of terms and operators.
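The role of the default field can be shown with a toy sketch. This is not Lucene's query parser: the FieldTermSketch class and its resolve method are hypothetical, and the code handles only whitespace-separated terms with an optional "field:" prefix, ignoring operators, phrases, and everything else the real syntax supports. It shows only that an unqualified term falls back to "contents".

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy splitter, NOT Lucene's query parser.
final class FieldTermSketch {

    /** The default field named in the text. */
    static final String DEFAULT_FIELD = "contents";

    /** Resolves each whitespace-separated clause to a "field=term" string. */
    static List<String> resolve(String query) {
        List<String> resolved = new ArrayList<>();
        for (String clause : query.trim().split("\\s+")) {
            if (clause.isEmpty()) {
                continue; // an empty query yields no clauses
            }
            int colon = clause.indexOf(':');
            // A leading "field:" prefix selects the field; otherwise use the default.
            String field = colon > 0 ? clause.substring(0, colon) : DEFAULT_FIELD;
            String term = colon > 0 ? clause.substring(colon + 1) : clause;
            resolved.add(field + "=" + term);
        }
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(resolve("title:proxy ant"));
    }
}
```

With the input "title:proxy ant", the first clause targets the "title" field and the second falls back to "contents".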