NAME

RePrec::Searchresult - Parse search result for evaluation purposes

SYNOPSIS

  require RePrec::Searchresult;

DESCRIPTION

To evaluate the effectiveness of information retrieval methods, one needs to parse the results of a query run. From a ranking of documents, one needs to extract the document IDs (DOCIDs) and their respective ranks or retrieval status values (RSVs). Since rank and RSV provide equivalent information, only one of them is needed. The RePrec::Searchresult class provides the means to do so and should suit most formats of search results. If it does not suit a given format, one can subclass it.
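Because rank and RSV carry equivalent information, one can be derived from the other. The following Perl sketch converts RSVs into ranks. The helper rsv_to_ranks is hypothetical (it is not part of RePrec::Searchresult), and it assumes that documents with equal RSV share a rank and that ranks are numbered consecutively from 1 at the top.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RePrec::Searchresult): derive ranks from RSVs.
# Input: array ref of [RSV, DOCID] pairs in any order.
# Assumption: documents with equal RSV share a rank; ranks are 1, 2, 3, ...
sub rsv_to_ranks {
    my ($pairs) = @_;
    # Sort by decreasing RSV; DOCID only breaks ties so the order is stable.
    my @sorted = sort { $b->[0] <=> $a->[0] or $a->[1] cmp $b->[1] } @$pairs;
    my %rank_of;
    my ($rank, $prev_rsv) = (0, undef);
    for my $pair (@sorted) {
        my ($rsv, $docid) = @$pair;
        # Start a new rank whenever the RSV drops below the previous one.
        $rank++ if !defined $prev_rsv || $rsv != $prev_rsv;
        $rank_of{$docid} = $rank;
        $prev_rsv = $rsv;
    }
    return \%rank_of;
}

my $ranks = rsv_to_ranks([[0.7, 'd2'], [0.9, 'd1'], [0.7, 'd3'], [0.1, 'd4']]);
print "$_ => $ranks->{$_}\n" for sort keys %$ranks;
# d1 => 1, d2 => 2, d3 => 2, d4 => 3
```

Going the other way (ranks to RSVs) loses nothing either, since only the order matters for evaluation; that is why the class needs just one of the two.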

METHODS

$result = RePrec::Searchresult->new($query, $result)

where $query is the ID of the query under consideration and $result is a reference to an array of array references, each holding one (RSV, DOCID) pair. The pairs must be sorted by decreasing RSV.
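A minimal Perl sketch of the in-memory input this constructor expects, assuming made-up DOCIDs and scores held in a hypothetical hash %scores. It builds the array of [RSV, DOCID] pairs in decreasing RSV order and checks that ordering; the constructor call itself is left as a comment because it requires the module to be installed.

```perl
use strict;
use warnings;

# Hypothetical run output: DOCID => RSV (made-up values for illustration).
my %scores = (doc17 => 0.42, doc03 => 0.87, doc29 => 0.42, doc08 => 0.15);

# Build a reference to an array of [RSV, DOCID] pairs, sorted by decreasing RSV.
my $ranking = [
    sort { $b->[0] <=> $a->[0] }
    map  { [ $scores{$_}, $_ ] } keys %scores
];

# Verify the ordering requirement before handing the data on.
for my $i (1 .. $#$ranking) {
    die "not sorted by decreasing RSV\n"
        if $ranking->[$i][0] > $ranking->[$i - 1][0];
}

# With RePrec::Searchresult installed, the structure would then be passed as
#   my $result = RePrec::Searchresult->new($query, $ranking);
print join(' ', map { "$_->[1]:$_->[0]" } @$ranking), "\n";
```

Note that the relative order of tied documents (here doc17 and doc29) follows Perl's hash order and is therefore not fixed; only decreasing RSV is guaranteed.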

$result = RePrec::Searchresult->new($query, $file, %parms)

where $file is the name of the file containing the results. The file is then parsed to extract the DOCIDs and ranks or RSVs. To do the parsing, the constructor calls the private method _init with $file and %parms as arguments; %parms is described in the documentation of that method.

$result->_init($file, %parms)

The file-parsing method; it should be the only method that subclasses of RePrec::Searchresult need to replace. This base class assumes that the data in $file is a table in which each row holds the RSV or rank of a single document. The argument %parms holds the following parameters (defaults in parentheses):

separator ('\s+')

Perl regular expression that separates columns

docid (0)

column which holds the DOCIDs (index of first column is 0)

rsv (1)

column which holds the RSVs (index of first column is 0)

rank (undef)

column which holds the rank (index of first column is 0)

ignore (undef)

Perl regular expression; rows matching it are ignored

sorted (undef)

If true, the results are assumed to be already sorted by RSV or rank (highest-rated document at the top). Otherwise the results are sorted, which can take some time for huge rankings.

If both rank and rsv are given, the rank parameter is ignored.
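The parameters above can be illustrated with a Perl sketch of a table parser. This is a hypothetical re-implementation of the documented rules, not the module's actual _init code. It reads from a filehandle rather than a file name, skips blank lines, and assumes that rank is used only when rsv is not passed explicitly; all three are choices made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical parser following the documented %parms rules (not the real _init).
# Returns an array ref of [score, DOCID] pairs, best document first.
sub parse_ranking {
    my ($fh, %parms) = @_;
    my $sep       = defined $parms{separator} ? $parms{separator} : '\s+';
    my $docid_col = defined $parms{docid}     ? $parms{docid}     : 0;
    # Assumption: rank is used only when rsv is not given explicitly.
    my $use_rank  = defined $parms{rank} && !defined $parms{rsv};
    my $col       = $use_rank            ? $parms{rank}
                  : defined $parms{rsv}  ? $parms{rsv}
                  :                        1;

    my @pairs;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;                                    # blank lines (sketch's choice)
        next if defined $parms{ignore} && $line =~ /$parms{ignore}/; # 'ignore' pattern
        my @cols = split /$sep/, $line;
        push @pairs, [ $cols[$col], $cols[$docid_col] ];
    }
    unless ($parms{sorted}) {
        # Ranks ascend (1 = best); RSVs descend (highest = best).
        @pairs = $use_rank
            ? sort { $a->[0] <=> $b->[0] } @pairs
            : sort { $b->[0] <=> $a->[0] } @pairs;
    }
    return \@pairs;
}

# Usage with an in-memory "file": DOCID in column 0, RSV in column 1.
my $data = <<'END';
# query 1
d7 0.31
d2 0.95
d5 0.50
END
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $pairs = parse_ranking($fh, ignore => '^#');
close $fh;
print join(' ', map { $_->[1] } @$pairs), "\n";   # d2 d5 d7
```

Passing sorted => 1 skips the final sort, which is where the time savings for huge rankings come from.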

$distribution = $result->distribution($judgements)

Get the relevance distribution. $judgements must contain the relevance assessments as described in RePrec::Collection(3). The result is a reference to an array holding one two-element array reference per rank (topmost rank first). The first element of each pair is the number of relevant documents at that rank; the second is the number of non-relevant documents.
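A Perl sketch of how such a distribution can be computed. The helper relevance_distribution is hypothetical, and so is its judgement format: it assumes a plain hash mapping DOCID to 1 (relevant) or 0 (non-relevant) rather than the structure from RePrec::Collection(3). It also assumes that unjudged documents count as non-relevant and that documents with equal RSV form a single rank.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's distribution method.
# $ranking: [RSV, DOCID] pairs sorted by decreasing RSV.
# $judgements: assumed hash ref DOCID => 1 (relevant) or 0 (non-relevant).
sub relevance_distribution {
    my ($ranking, $judgements) = @_;
    my @dist;
    my $prev_rsv;
    for my $pair (@$ranking) {
        my ($rsv, $docid) = @$pair;
        # Open a new [relevant, non-relevant] counter whenever the RSV changes.
        push @dist, [0, 0] if !defined $prev_rsv || $rsv != $prev_rsv;
        # Unjudged documents are treated as non-relevant (assumption).
        my $is_rel = $judgements->{$docid} ? 1 : 0;
        $dist[-1][ $is_rel ? 0 : 1 ]++;
        $prev_rsv = $rsv;
    }
    return \@dist;
}

my $ranking = [[0.9, 'd1'], [0.7, 'd2'], [0.7, 'd3'], [0.2, 'd4']];
my %judged  = (d1 => 1, d3 => 1, d4 => 0);
my $dist    = relevance_distribution($ranking, \%judged);
print join(' ', map { "[$_->[0],$_->[1]]" } @$dist), "\n";   # [1,0] [1,1] [0,1]
```

Here d2 and d3 share RSV 0.7, so they fall into the same rank, which counts one relevant and one non-relevant document.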

$rels = $result->rels

Returns the number of relevant documents found, or undef if the distribution method has not been called yet.

$nrels = $result->nrels

Returns the number of non-relevant documents found, or undef if the distribution method has not been called yet.
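Assuming these totals are simply the column sums of the distribution described above (an assumption; the module's internals are not documented here), a hypothetical Perl helper could compute both counts at once and mirror the documented undef behaviour when no distribution exists yet:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of RePrec::Searchresult.
# $dist: array ref of [relevant, non-relevant] counts per rank, or undef.
sub totals {
    my ($dist) = @_;
    # Mirror the documented behaviour: undef when no distribution is available.
    return (undef, undef) unless defined $dist;
    my ($rels, $nrels) = (0, 0);
    for my $rank (@$dist) {
        $rels  += $rank->[0];
        $nrels += $rank->[1];
    }
    return ($rels, $nrels);
}

my ($rels, $nrels) = totals([[1, 0], [1, 1], [0, 1]]);
print "rels=$rels nrels=$nrels\n";   # rels=2 nrels=2
```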

BUGS

Yes. Please let me know!

SEE ALSO

perl(1).

AUTHOR

Norbert Gövert <goevert@ls6.cs.uni-dortmund.de>