NAME

Sphinx::Search - Sphinx search engine API Perl client

VERSION

Please note that you *MUST* install a version which is compatible with your version of Sphinx.

This version is 0.12

Use version 0.12 for Sphinx 0.9.8 and later

Use version 0.11 for Sphinx 0.9.8-rc1 and later

Use version 0.10 for Sphinx 0.9.8-svn-r1112

Use version 0.09 for Sphinx 0.9.8-svn-r985

Use version 0.08 for Sphinx 0.9.8-svn-r871

Use version 0.06 for Sphinx 0.9.8-svn-r820

Use version 0.05 for Sphinx 0.9.8-cvs-20070907

Use version 0.02 for Sphinx 0.9.8-cvs-20070818

SYNOPSIS

    use Sphinx::Search;

    $sphinx = Sphinx::Search->new();

    $results = $sphinx->SetMatchMode(SPH_MATCH_ALL)
                      ->SetSortMode(SPH_SORT_RELEVANCE)
                      ->Query("search terms");

DESCRIPTION

This is the Perl API client for the Sphinx open-source SQL full-text indexing search engine, http://www.sphinxsearch.com.

CONSTRUCTOR

new

    $sph = Sphinx::Search->new;
    $sph = Sphinx::Search->new(\%options);

Create a new Sphinx::Search instance.

OPTIONS

log

Specify an optional logger instance. This can be any class that provides error, warn, info, and debug methods (e.g. see Log::Log4perl). Logging is disabled if no logger instance is provided.

debug

Debug flag. If set (and a logger instance is specified), debugging messages will be generated.
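
As an illustration of the log option, the sketch below defines a tiny logger class that provides the four required methods and passes it to the constructor. The class name My::StderrLogger and its output format are hypothetical; any object with error, warn, info and debug methods should serve equally well.

```perl
package My::StderrLogger;    # hypothetical logger class, not part of Sphinx::Search
use strict;
use warnings;

sub new { return bless {}, shift }

# Generate one method per log level; each prints its arguments to STDERR
# prefixed with the level name.
for my $level (qw(error warn info debug)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$level" } = sub {
        my ($self, @msg) = @_;
        print STDERR "[$level] @msg\n";
    };
}

package main;
use Sphinx::Search;

# Pass the logger and enable debugging messages.
my $sph = Sphinx::Search->new({ log => My::StderrLogger->new, debug => 1 });
```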

METHODS

GetLastError

    $error = $sph->GetLastError;

Returns the last error message (string).

GetLastWarning

    $warning = $sph->GetLastWarning;

Returns the last warning message (string).

SetServer

    $sph->SetServer($host, $port);

Set the host (string) and port (integer) details for the searchd server. Returns $sph.

SetConnectTimeout

    $sph->SetConnectTimeout($timeout);

Set server connection timeout (in seconds).

Returns $sph.

SetLimits

    $sph->SetLimits($offset, $limit);
    $sph->SetLimits($offset, $limit, $max);

Set match offset/limits, and optionally the max number of matches to return.

Returns $sph.
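
Because these setters return $sph, connection and paging settings can be chained. The following sketch derives the offset from a page number; the host, port, page number and page size are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Sphinx::Search;

# Hypothetical paging parameters, e.g. taken from a web request.
my $page     = 3;     # 1-based page number
my $per_page = 20;    # results per page

# Page 1 starts at offset 0, page 2 at $per_page, and so on.
my $offset = ($page - 1) * $per_page;

my $sph = Sphinx::Search->new;
$sph->SetServer('localhost', 3312)    # assumed host and port of searchd
    ->SetConnectTimeout(5)            # give up connecting after 5 seconds
    ->SetLimits($offset, $per_page);  # return matches 41..60
```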

SetMaxQueryTime

    $sph->SetMaxQueryTime($millisec);

Set maximum query time, in milliseconds, per index.

The value may not be negative; 0 means "do not limit".

Returns $sph.

SetMatchMode

    $sph->SetMatchMode($mode);

Set match mode, which may be one of:

  • SPH_MATCH_ALL

    Match all words

  • SPH_MATCH_ANY

    Match any words

  • SPH_MATCH_PHRASE

    Exact phrase match

  • SPH_MATCH_BOOLEAN

    Boolean match, using AND (&), OR (|), NOT (!,-) and parenthetic grouping.

  • SPH_MATCH_EXTENDED

    Extended match, which includes the Boolean syntax plus field, phrase and proximity operators.

Returns $sph.
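
A short sketch of extended match mode follows. The field names title and body are hypothetical and must exist in your own index; the query runs against the default server settings.

```perl
use strict;
use warnings;
use Sphinx::Search;

my $sph = Sphinx::Search->new;
$sph->SetMatchMode(SPH_MATCH_EXTENDED);

# Field operator: "hello" must occur in the title field and "world" in the
# body field. Single quotes stop Perl from interpolating the @ signs.
my $query = '@title hello @body world';

# Other extended-mode forms:
#   '"hello world"'      phrase
#   '"hello world"~10'   proximity
#   'hello | world'      Boolean OR
#   'hello -world'       Boolean NOT

my $results = $sph->Query($query);
if (defined $results) {
    print "Found $results->{total_found} matching documents\n";
}
else {
    print STDERR "Query failed: ", $sph->GetLastError, "\n";
}
```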

SetRankingMode

    $sph->SetRankingMode(SPH_RANK_BM25);

Set ranking mode, which may be one of:

  • SPH_RANK_PROXIMITY_BM25

    Default mode, phrase proximity major factor and BM25 minor one

  • SPH_RANK_BM25

    Statistical mode, BM25 ranking only (faster but worse quality)

  • SPH_RANK_NONE

    No ranking, all matches get a weight of 1

  • SPH_RANK_WORDCOUNT

    Simple word-count weighting, rank is a weighted sum of per-field keyword occurrence counts

Returns $sph.

SetSortMode

    $sph->SetSortMode(SPH_SORT_RELEVANCE);
    $sph->SetSortMode($mode, $sortby);

Set sort mode, which may be any of:

  • SPH_SORT_RELEVANCE

    Sort by relevance.

  • SPH_SORT_ATTR_DESC, SPH_SORT_ATTR_ASC

    Sort by attribute descending/ascending. $sortby specifies the sorting attribute.

  • SPH_SORT_TIME_SEGMENTS

    Sort by time segments (last hour/day/week/month) in descending order, and then by relevance in descending order. $sortby specifies the time attribute.

  • SPH_SORT_EXTENDED

    Sort by SQL-like syntax. $sortby is the sorting specification.

  • SPH_SORT_EXPR

    Sort by an arithmetic expression. $sortby is the expression.

Returns $sph.
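
For illustration, the sketch below sets an attribute sort and then an extended sort. The attribute name published is hypothetical; each call to SetSortMode replaces the previous setting.

```perl
use strict;
use warnings;
use Sphinx::Search;

my $sph = Sphinx::Search->new;

# Sort by a single attribute, highest value first.
# "published" is a hypothetical timestamp attribute.
$sph->SetSortMode(SPH_SORT_ATTR_DESC, 'published');

# Replace that with an SQL-like specification combining two keys:
# best matches first, ties broken by the most recent document.
$sph->SetSortMode(SPH_SORT_EXTENDED, '@weight DESC, published DESC');
```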

SetWeights

    $sph->SetWeights([ 1, 2, 3, 4]);

This method is deprecated. Use SetFieldWeights instead.

Set per-field (integer) weights. The ordering of the weights corresponds to the ordering of fields as indexed.

Returns $sph.

SetFieldWeights

    $sph->SetFieldWeights(\%weights);

Set per-field (integer) weights by field name. The weights hash provides field name to weight mappings.

Takes precedence over SetWeights.

Unknown names will be silently ignored. Missing fields will be given a weight of 1.

Returns $sph.
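
A minimal sketch of weighting by field name, assuming an index with hypothetical fields title, summary and body:

```perl
use strict;
use warnings;
use Sphinx::Search;

my $sph = Sphinx::Search->new;

# Hypothetical field names. "body" is not listed, so according to the
# description above it is given a weight of 1.
my %weights = (
    title   => 10,
    summary => 3,
);

$sph->SetFieldWeights(\%weights);
```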

SetIndexWeights

    $sph->SetIndexWeights(\%weights);

Set per-index (integer) weights. The weights hash is a mapping of index name to integer weight.

Returns $sph.

SetIDRange

    $sph->SetIDRange($min, $max);

Set the document ID range. Only records whose document ID is between $min and $max (including $min and $max) will match.

Returns $sph.

SetFilter

    $sph->SetFilter($attr, \@values);
    $sph->SetFilter($attr, \@values, $exclude);

Sets the results to be filtered on the given attribute. Only results which have attributes matching the given (numeric) values will be returned.

This may be called multiple times with different attributes to select on multiple attributes.

If 'exclude' is set, excludes results that match the filter.

Returns $sph.

SetFilterRange

    $sph->SetFilterRange($attr, $min, $max);
    $sph->SetFilterRange($attr, $min, $max, $exclude);

Sets the results to be filtered on a range of values for the given attribute. Only those records where $attr column value is between $min and $max (including $min and $max) will be returned.

$min and $max must be integers. Use SetFilterFloatRange for floating point values.

If 'exclude' is set, excludes results that fall within the given range.

Returns $sph.

SetFilterFloatRange

    $sph->SetFilterFloatRange($attr, $min, $max, $exclude);

Same as SetFilterRange, but allows floating point values.

Returns $sph.
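
The three filter methods can be combined, as in the sketch below. All attribute names (category_id, status, year, rating) are hypothetical and would need to be attributes of your own index.

```perl
use strict;
use warnings;
use Sphinx::Search;

my $sph = Sphinx::Search->new;

$sph->SetFilter('category_id', [ 3, 7, 12 ])          # category_id is 3, 7 or 12
    ->SetFilter('status', [ 0 ], 1)                   # exclude records with status 0
    ->SetFilterRange('year', 2000, 2007)              # 2000 <= year <= 2007
    ->SetFilterFloatRange('rating', 3.5, 5.0, 0);     # 3.5 <= rating <= 5.0

# Filters stay in effect for subsequent queries until they are cleared.
$sph->ResetFilters;
```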

SetGeoAnchor

    $sph->SetGeoAnchor($attrlat, $attrlong, $lat, $long);

Set up the anchor point for geosphere distance calculations in filters and sorting. Distances are computed with respect to this point.

$attrlat is the name of latitude attribute
$attrlong is the name of longitude attribute
$lat is anchor point latitude, in radians
$long is anchor point longitude, in radians

Returns $sph.
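
Since the anchor point must be given in radians, coordinates held in degrees need converting first. The sketch below uses a hypothetical deg2rad helper, hypothetical attribute names and an arbitrary anchor point.

```perl
use strict;
use warnings;
use Sphinx::Search;

use constant PI => 4 * atan2(1, 1);

# Hypothetical helper: convert degrees to radians.
sub deg2rad {
    my ($degrees) = @_;
    return $degrees * PI / 180;
}

# Arbitrary anchor point, in degrees.
my ($lat_deg, $long_deg) = (-34.9285, 138.6007);

my $sph = Sphinx::Search->new;

# "latitude" and "longitude" are hypothetical attribute names.
$sph->SetGeoAnchor('latitude', 'longitude',
                   deg2rad($lat_deg), deg2rad($long_deg));
```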

SetGroupBy

    $sph->SetGroupBy($attr, $func);
    $sph->SetGroupBy($attr, $func, $groupsort);

Sets attribute and function of results grouping.

In grouping mode, all matches are assigned to different groups based on grouping function value. Each group keeps track of the total match count, and the best match (in this group) according to current sorting function. The final result set contains one best match per group, with grouping function value and matches count attached.

$attr is any valid attribute. Use ResetGroupBy to disable grouping.

$func is one of:

  • SPH_GROUPBY_DAY

    Group by day (assumes a timestamp type attribute; the group value has the form YYYYMMDD)

  • SPH_GROUPBY_WEEK

    Group by week (assumes a timestamp type attribute; the group value has the form YYYYNNN)

  • SPH_GROUPBY_MONTH

    Group by month (assumes a timestamp type attribute; the group value has the form YYYYMM)

  • SPH_GROUPBY_YEAR

    Group by year (assumes a timestamp type attribute; the group value has the form YYYY)

  • SPH_GROUPBY_ATTR

    Group by attribute value

  • SPH_GROUPBY_ATTRPAIR

    Group by two attributes, being the given attribute and the attribute that immediately follows it in the sequence of indexed attributes. The specified attribute may therefore not be the last of the indexed attributes.

Groups in the set of results can be sorted by any SQL-like sorting clause, including both document attributes and the following special internal Sphinx attributes:

@id - document ID;
@weight, @rank, @relevance - match weight;
@group - group by function value;
@count - number of matches in group.

The default mode is to sort by groupby value in descending order, i.e. by "@group desc".

In the results set, "total_found" contains the total number of matching groups over the whole index.

WARNING: grouping is done in fixed memory and thus its results are only approximate; so there might be more groups reported in total_found than actually present. @count might also be underestimated.

For example, if sorting by relevance and grouping by a "published" attribute with the SPH_GROUPBY_DAY function, the result set will contain only the most relevant match for each day on which any matches were published, with the day number and per-day match count attached, sorted by day number in descending order (i.e. recent days first).
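
The example just described might be written as follows. This is a sketch only: the "published" attribute must exist in your index, and the optional third argument shown here replaces the default "@group desc" ordering with the busiest days first.

```perl
use strict;
use warnings;
use Sphinx::Search;

my $sph = Sphinx::Search->new;

# Best match per day, with the groups ordered by per-day match count.
# Omit the third argument to keep the default "@group desc" ordering.
$sph->SetSortMode(SPH_SORT_RELEVANCE)
    ->SetGroupBy('published', SPH_GROUPBY_DAY, '@count desc');

my $results = $sph->Query('search terms');
if (defined $results) {
    # total_found counts groups here and is only approximate.
    print "About $results->{total_found} days had matches\n";
}
else {
    print STDERR "Query failed: ", $sph->GetLastError, "\n";
}
```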

SetGroupDistinct

    $sph->SetGroupDistinct($attr);

Set count-distinct attribute for group-by queries

SetRetries

    $sph->SetRetries($count, $delay);

Set distributed retries count and delay

ResetFilters

    $sph->ResetFilters;

Clear all filters.

ResetGroupBy

    $sph->ResetGroupBy;

Clear all group-by settings (for multi-queries)

Query

    $results = $sph->Query($query, $index);

Connect to searchd server and run given search query.

$query is the query string.

$index is the name of the index to query. The default is "*", which means to query all indexes. Use a space or comma separated list to search multiple indexes.

Returns undef on failure.

On success, returns a hash with the following keys:

matches

Array containing hashes with found documents ( "doc", "weight", "group", "stamp" )

total

Total number of matches retrieved (up to SPH_MAX_MATCHES, see sphinx.h)

total_found

Total amount of matching documents in index

time

Search time

words

Hash which maps query terms (stemmed!) to ( "docs", "hits" ) hash
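
The sketch below walks through a result using the keys listed above. It assumes the result and its nested values are references (a hash reference holding an array reference of matches and a hash reference of words); the index name articles is hypothetical.

```perl
use strict;
use warnings;
use Sphinx::Search;

my $sph = Sphinx::Search->new;

# "articles" is a hypothetical index name.
my $results = $sph->SetMatchMode(SPH_MATCH_ALL)
                  ->Query('search terms', 'articles');

if (!defined $results) {
    print STDERR "Query failed: ", $sph->GetLastError, "\n";
}
else {
    print "total=$results->{total} total_found=$results->{total_found} ",
          "time=$results->{time}\n";

    # One hash per matching document.
    for my $match (@{ $results->{matches} }) {
        print "doc=$match->{doc} weight=$match->{weight}\n";
    }

    # Per-term statistics, keyed by the (stemmed) query term.
    for my $word (sort keys %{ $results->{words} }) {
        my $stats = $results->{words}{$word};
        print "$word: $stats->{docs} docs, $stats->{hits} hits\n";
    }
}
```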

AddQuery

    $sph->AddQuery($query, $index);

Add a query to a batch request.

Batch queries enable searchd to perform internal optimizations, if possible; and reduce network connection overheads in all cases.

For instance, running exactly the same query with different groupby settings will enable searchd to perform the expensive full-text search and ranking operation only once, but compute multiple groupby results from its output.

Parameters are exactly the same as in Query() call.

Returns the index of this query's result set within the array returned by the RunQueries() call.

RunQueries

    $sph->RunQueries;

Run batch of queries, as added by AddQuery.

Returns undef on network IO failure.

Returns an array of result sets on success.

Each result set in the returned array is a hash which contains the same keys as the hash returned by Query, plus:

  • error

    Errors, if any, for this query.

  • warnings

    Any warnings associated with the query.
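
As a sketch of the batch workflow, the code below runs the same query under two group-by settings and then checks each result set for errors. The index and attribute names are hypothetical, and the return value of RunQueries is assumed to be an array reference.

```perl
use strict;
use warnings;
use Sphinx::Search;

my $sph   = Sphinx::Search->new;
my $query = 'search terms';

# First query: group by a (hypothetical) category attribute.
$sph->SetGroupBy('category_id', SPH_GROUPBY_ATTR);
my $by_category = $sph->AddQuery($query, 'articles');

# Second query: same search, grouped by month instead.
$sph->ResetGroupBy;
$sph->SetGroupBy('published', SPH_GROUPBY_MONTH);
my $by_month = $sph->AddQuery($query, 'articles');

my $results = $sph->RunQueries;
if (!defined $results) {
    print STDERR "Batch failed: ", $sph->GetLastError, "\n";
}
else {
    # AddQuery returned the position of each result set in the array.
    for my $i ($by_category, $by_month) {
        my $set = $results->[$i];
        if ($set->{error}) {
            print STDERR "Query $i failed: $set->{error}\n";
            next;
        }
        print "Query $i: $set->{total_found} groups\n";
    }
}
```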

BuildExcerpts

    $excerpts = $sph->BuildExcerpts($docs, $index, $words, $opts);

Generate document excerpts for the specified documents.

docs

An array reference of strings which represent the document contents

index

A string specifying the index whose settings will be used for stemming, lexing and case folding

words

A string which contains the words to highlight

opts

A hash which contains additional optional highlighting parameters:

before_match - a string to insert before a set of matching words, default is "<b>"
after_match - a string to insert after a set of matching words, default is "</b>"
chunk_separator - a string to insert between excerpt chunks, default is " ... "
limit - max excerpt size in symbols (codepoints), default is 256
around - how many words to include around each match, default is 5
exact_phrase - whether to highlight exact phrase matches only, default is false
single_passage - whether to extract single best passage only, default is false
use_boundaries - whether to break passages at phrase boundary characters (as configured for the index), default is false
weight_order - whether to order the extracted passages by relevance rather than by position in the document, default is false

Returns undef on failure.

Returns an array of string excerpts on success.
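
A minimal sketch of excerpt generation follows. The document texts and the index name articles are made up for illustration, and the returned array is assumed to be an array reference.

```perl
use strict;
use warnings;
use Sphinx::Search;

my $sph = Sphinx::Search->new;

# Document contents, typically fetched from your own database.
my @docs = (
    'Sphinx is a full-text search engine that indexes SQL data.',
    'This client sends search requests to the searchd server.',
);

# Override a few of the highlighting options listed above.
my %opts = (
    before_match => '<em>',
    after_match  => '</em>',
    limit        => 120,
    around       => 3,
);

# "articles" is a hypothetical index whose settings control stemming etc.
my $excerpts = $sph->BuildExcerpts(\@docs, 'articles', 'search engine', \%opts);

if (defined $excerpts) {
    print "$_\n" for @$excerpts;
}
else {
    print STDERR "BuildExcerpts failed: ", $sph->GetLastError, "\n";
}
```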

BuildKeywords

    $results = $sph->BuildKeywords($query, $index, $hits);

Generate a keyword list for a given query. Returns undef on failure. On success, returns an array of hashes, where each hash describes a word in the query with the following keys:

  • tokenized

    Tokenised term from query

  • normalized

    Normalised term from query

  • docs

    Number of docs in which word was found (if $hits is true)

  • hits

    Number of occurrences of word (if $hits is true)

EscapeString

    $escaped = $sph->EscapeString('abcde!@#$%');

Inserts backslash before all non-word characters in the given string.
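
One possible use is neutralising operator characters in user input before running a Boolean or extended query. In this sketch the input string is invented, and escaping each word separately (so that the separating spaces stay unescaped) is a design choice of the example, not something the module requires.

```perl
use strict;
use warnings;
use Sphinx::Search;

my $sph = Sphinx::Search->new;

# Hypothetical raw input containing characters that are query operators.
my $user_input = 'C++ (programming) @home';

# Escape word by word and rejoin with plain spaces.
my $escaped = join ' ',
              map { $sph->EscapeString($_) }
              split ' ', $user_input;

$sph->SetMatchMode(SPH_MATCH_EXTENDED);
my $results = $sph->Query($escaped);

print STDERR "Query failed: ", $sph->GetLastError, "\n"
    unless defined $results;
```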

UpdateAttributes

    $sph->UpdateAttributes($index, \@attrs, \%values);

Update specified attributes on specified documents

index

Name of the index to be updated

attrs

Array of attribute name strings

values

A hash with key as document id, value as an array of new attribute values

Returns number of actually updated documents (0 or more) on success

Returns undef on failure

Usage example:

    $sph->UpdateAttributes("test1", [ qw/group_id/ ], { 1 => [ 456 ] });

SEE ALSO

http://www.sphinxsearch.com

NOTES

There is a bundled Sphinx.pm in the contrib area of the Sphinx source distribution, which was used as the starting point of Sphinx::Search. Maintenance of that version appears to have lapsed at sphinx-0.9.7, so many of the newer API calls are not available there. Sphinx::Search is mostly compatible with the old Sphinx.pm except:

  • On failure, Sphinx::Search returns undef rather than 0 or -1.

  • Sphinx::Search 'Set' functions are cascadable, e.g. you can do

        Sphinx::Search->new
            ->SetMatchMode(SPH_MATCH_ALL)
            ->SetSortMode(SPH_SORT_RELEVANCE)
            ->Query("search terms")

Sphinx::Search also provides documentation and unit tests, which were the main motivations for branching from the earlier work.

AUTHOR

Jon Schutz

BUGS

Please report any bugs or feature requests to bug-sphinx-search at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Sphinx-Search. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Sphinx::Search

ACKNOWLEDGEMENTS

This module is based on Sphinx.pm (not deployed to CPAN) for Sphinx version 0.9.7-rc1, by Len Kranendonk, which was in turn based on the Sphinx PHP API.

COPYRIGHT & LICENSE

Copyright 2007 Jon Schutz, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License.