NAME

WE_Frontend::Indexer::Htdig - interface to the htdig search engine

SYNOPSIS

    use WE_Frontend::Indexer::Htdig;
    my $results = WE_Frontend::Indexer::Htdig::search(-words => "word");

DESCRIPTION

This is an interface to the htdig search engine. The search function returns a Perl hash reference containing the results.

FUNCTIONS

search(%args)

Arguments are:

-words

A string with the words to search for. Multiple words are separated by spaces. This argument is required.

-conf

Specify a different htdig configuration file. If omitted, the default htdig.conf is used.

-lang

(Optional) Specify a language. The configuration file name given by -conf may contain %{lang} placeholders, which are replaced by the value of this argument.

-debug

Output some diagnostics to stderr.

-httpshack

Set to a true value if operating on an https server. htdig does not handle SSL, so a parallel http server should be set up for the indexing. With the https hack, the URLs in the search result list are translated at template display time.
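This document does not show how the module performs the -httpshack translation. The following is only a sketch of the idea, assuming the parallel http server uses the same host name and paths as the https server. The helper name to_https is hypothetical and not part of WE_Frontend::Indexer::Htdig.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): rewrite a URL that was
# indexed over plain http so that it points to the https server.
# Assumption: both servers share the same host name and paths.
sub to_https {
    my ($url) = @_;
    (my $translated = $url) =~ s{^http://}{https://}i;
    return $translated;
}

print to_https('http://www.example.com/page.html'), "\n";
```

URLs that already use https are returned unchanged.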

The result is a hash reference with the following keys:

logical_words
matches_per_page
max_stars
page
pages
list

Holds an array with the search results. See below.

nomatch

Set to a true value if the search produces no results. This case can also be detected by an empty result list.

pageurllist

A list of URLs for the 1 .. 10 result pages.

pagenumberlist

The page numbers corresponding to the entries in pageurllist. Please note that Perl and Template arrays start with index 0 (which would be page 1).

prevpageurl
nextpageurl

Hold the URLs of the previous and next result pages, respectively.

prevpagenumber
nextpagenumber

Usually not needed: the numbers of the previous and next result pages, respectively. In practice you would label the links "Prev"/"Next" or "<"/">".

...

There are more keys. For a complete list refer to the htdig documentation at http://www.htdig.org, htsearch, Templates. Note that the original template variable names are converted to lowercase.
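As a rough illustration of how the paging keys fit together, the following sketch builds a plain-text page navigation from a mock result hash. The data is invented, the helper name pager_lines is hypothetical, and it is an assumption that pageurllist and pagenumberlist are array references.

```perl
use strict;
use warnings;

# Mock data shaped like the documented result hash. All values are invented,
# and array references for the two lists are an assumption.
my $results = {
    page           => 2,
    pageurllist    => ['/search?page=1', '/search?page=2', '/search?page=3'],
    pagenumberlist => [1, 2, 3],
    prevpageurl    => '/search?page=1',
    nextpageurl    => '/search?page=3',
};

# Hypothetical helper: return one text line per navigation entry.
sub pager_lines {
    my ($r) = @_;
    my @urls    = @{ $r->{pageurllist}    || [] };
    my @numbers = @{ $r->{pagenumberlist} || [] };
    my @lines;
    push @lines, "Prev: $r->{prevpageurl}" if $r->{prevpageurl};
    for my $i (0 .. $#urls) {
        # Array index 0 holds page 1, so the label comes from pagenumberlist.
        my $marker = $numbers[$i] == $r->{page} ? '*' : ' ';
        push @lines, "$marker $numbers[$i]: $urls[$i]";
    }
    push @lines, "Next: $r->{nextpageurl}" if $r->{nextpageurl};
    return @lines;
}

print "$_\n" for pager_lines($results);
```

The current page is marked with an asterisk. The Prev and Next lines are only emitted when the corresponding URL is set.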

The value of list is an array reference with the matches. Each match is a hash reference with the following keys:

url

The URL of the page. See also the -httpshack option above.

title

The title of the page, as specified by the <title> html tag.

anchor
excerpt

The first lines of text in the document.

score
percent
modified

The date and time the document was last modified. See also the documentation of the iso_8601 config variable in htdig.conf.

...

The complete list is also in the htdig documentation at http://www.htdig.org, htsearch, Templates.
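The following sketch shows one way to walk the list value and print each match. It uses a mock result hash with invented values instead of a real search() call, and the helper name format_matches is hypothetical.

```perl
use strict;
use warnings;

# Mock data shaped like the documented result hash. All values are invented.
my $results = {
    nomatch => 0,
    list    => [
        {
            url     => 'http://www.example.com/a.html',
            title   => 'Page A',
            excerpt => 'First lines of page A',
            percent => 87,
        },
        {
            url     => 'http://www.example.com/b.html',
            title   => 'Page B',
            excerpt => 'First lines of page B',
            percent => 42,
        },
    ],
};

# Hypothetical helper: format all matches as plain text.
sub format_matches {
    my ($r) = @_;
    my @list = @{ $r->{list} || [] };
    # Both documented ways of detecting an empty result are checked here.
    return "No matches found.\n" if $r->{nomatch} || !@list;
    my $out = '';
    for my $match (@list) {
        $out .= sprintf "%s (%s%%)\n  %s\n  %s\n",
            $match->{title}, $match->{percent}, $match->{url}, $match->{excerpt};
    }
    return $out;
}

print format_matches($results);
```

In a real application, $results would come from WE_Frontend::Indexer::Htdig::search, and the output would normally be produced by a template rather than by sprintf.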

CONFIGURATION FILES

It is best to use the original conf/htdig.tpl.conf file found in the webeditor distribution. The indexing program in webeditor takes this template file and fills it with the configuration found in WEsiteinfo. Please also see htdig.txt in the webeditor/doc directory for first-time installation and configuration.

WEsiteinfo configuration:

To override the searchindexer path (default is "rundig" without a path):

    $searchengine->searchindexer("/usr/local/bin/rundig");

To set the htdig configuration template and the target htdig configuration file (these settings are highly recommended):

    $searchengine->htdigconftemplate($paths->uprootdir . "/conf/htdig.tpl.conf");
    $searchengine->htdigconf($paths->uprootdir . "/conf/htdig.%{lang}.conf");

where $paths is the WEsiteinfo::Paths object documented in WE_Frontend::Info. If the configuration file should not be language dependent, then use

    $searchengine->htdigconf($paths->uprootdir . "/conf/htdig.conf");

instead.
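The effect of the %{lang} placeholder can be sketched as a plain string substitution. This is an illustration only: the helper name expand_lang is hypothetical, and the module's own substitution code is not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every %{lang} placeholder in a
# configuration path with the given language code.
sub expand_lang {
    my ($template, $lang) = @_;
    (my $path = $template) =~ s/%\{lang\}/$lang/g;
    return $path;
}

print expand_lang('/conf/htdig.%{lang}.conf', 'en'), "\n";
# prints "/conf/htdig.en.conf"
```

A path without a placeholder, such as /conf/htdig.conf, is returned unchanged.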

Own htdig.conf

If you decide to make your own htdig.conf, put at least the following lines into the configuration file:

    template_map: Long long ${common_dir}/long.html \
                  Short short ${common_dir}/short.html \
                  Perl perl ${common_dir}/perl/match.pl
    template_name: perl
    search_results_header: ${common_dir}/perl/header.pl
    search_results_footer: ${common_dir}/perl/footer.pl
    nothing_found_file:    ${common_dir}/perl/nomatch.pl

${common_dir}/perl should be a link to the directory .../lib/WE_Frontend/Indexer/htdig_common.

INSTALLING HTDIG

htdig is available e.g. from this location: http://www.htdig.org/files/snapshots/htdig-3.2.0b5-20040404.tar.gz.

To compile and install htdig from scratch, the following configure line could be used to create a path layout similar to the RedHat one:

    sh configure --prefix=/usr --with-search-dir=/usr/share/htdig --with-image-dir=/usr/share/htdig --with-cgi-bin-dir=/usr/bin --with-config-dir=/etc --with-database-dir=/usr/share/htdig

CAVEATS

Many. Mind the permissions. In particular, rundig may use the default database directory (/usr/local/share/htdig/database or similar) as the temporary directory for sorting, which will fail if the apache user (usually nobody or www) has no permission to write to this directory. In this case, change the TMPDIR definition in rundig or set appropriate write permissions.
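A quick way to check the permission problem described above is to test the directory from a small script. This is a minimal sketch: the helper name dir_is_writable is hypothetical, the directory path is only the example mentioned above, and the check is only meaningful when run as the same user as the web server.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the path is an existing directory that the
# current user may write to.
sub dir_is_writable {
    my ($dir) = @_;
    return (-d $dir && -w $dir) ? 1 : 0;
}

# Example path taken from the text above; adjust to the local installation.
my $dir = '/usr/local/share/htdig/database';
printf "%s is %s\n", $dir,
    dir_is_writable($dir) ? 'writable' : 'not writable or missing';
```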

AUTHOR

Slaven Rezic - slaven@rezic.de

SEE ALSO

htdig(1).