NAME

Net::Z3950::IndexMARC - Comprehensive but inefficient index for MARC records

SYNOPSIS

 use MARC::File::USMARC;
 use Net::Z3950::IndexMARC;
 $file = MARC::File::USMARC->in($filename);
 $index = Net::Z3950::IndexMARC->new();
 while ($marc = $file->next()) {
     $index->add($marc);
 }
 $index->dump(\*STDOUT);
 $hashref = $index->find('@attr 1=4 dinosaur');
 foreach $i (keys %$hashref) {
     $rec = $index->fetch($i);
     print $rec->as_formatted();
 }

DESCRIPTION

This module provides a comprehensive inverted index across a set of MARC records, allowing simple keyword retrieval down to the level of individual fields and subfields. However, it does this by building a big Perl data structure (a hash of hashes of arrays) in memory, and makes no effort whatsoever at optimisation. It is therefore only appropriate for small collections of records.
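To make the "hash of hashes of arrays" idea concrete, here is a minimal pure-Perl sketch of an inverted index. It is not the module's actual implementation: the %index layout, the simplified record hashes and the tokenisation rule are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical, simplified records: tag => { subfield => text }.
# Real MARC::Record objects are richer; this only illustrates the idea.
my @records = (
    { '245' => { a => 'The dinosaur heresies' },
      '100' => { a => 'Bakker, Robert' } },
    { '245' => { a => 'Dinosaur systematics',
                 b => 'approaches and perspectives' } },
);

# Assumed layout:
#   word => { record number => [ [tag, subfield, word position], ... ] }
my %index;
for my $recno (0 .. $#records) {
    my $rec = $records[$recno];
    for my $tag (sort keys %$rec) {
        my $pos = 0;    # word number within the field, starting from 1
        for my $sub (sort keys %{ $rec->{$tag} }) {
            for my $word (split ' ', lc $rec->{$tag}{$sub}) {
                $word =~ s/[^a-z0-9]//g;    # strip punctuation
                next unless length $word;
                $pos++;
                push @{ $index{$word}{$recno} }, [ $tag, $sub, $pos ];
            }
        }
    }
}

# Look up one keyword and report every occurrence.
for my $recno (sort { $a <=> $b } keys %{ $index{dinosaur} || {} }) {
    for my $hit (@{ $index{dinosaur}{$recno} }) {
        printf "record %d: field %s, subfield %s, word %d\n", $recno, @$hit;
    }
}
```

Every word of every subfield gets its own entry, which is why a structure of this kind grows quickly and suits only small record sets.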

METHODS

new()

 $index = Net::Z3950::IndexMARC->new();

Creates a new IndexMARC object. Takes no parameters, and returns the new object.

add()

 $record = MARC::Record->new();
 $record->append_fields(...);
 $index->add($record);

Adds a single MARC record to the specified index. A reference to the record itself is also added, so the record object will not be garbage collected until (at least) the index goes out of scope. The record passed in must be of the type MARC::Record.

An opaque token representing the new record is returned. This may subsequently be passed to fetch() to retrieve the record.

dump()

 $index->dump(\*STDOUT);

Dumps the contents of the specified index to the specified stream in human-readable form. Takes the output stream as its only argument. Should only be used for debugging.

find()

 $hithash = $index->find('@attr 1=4 dinosaur');

Finds records satisfying the specified PQF query, and returns a reference to a hash consisting of one element for each matching record.

Each key in the returned hash is an opaque token representing a record, which may be passed to fetch() to retrieve the record itself. The corresponding value contains details of the hits in that record. The hit details consist of an array of arbitrary length, with one element per occurrence of the searched-for term. Each element of this array is itself an array of three elements: the tag of the field in which the term occurs [0], the tag of the subfield [1], and the word-number within the field, starting from word 1 [2].
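The following sketch walks a hit hash of the shape just described. The data is hand-written rather than produced by find(), and the integer tokens are an assumption: the real tokens are opaque and should only be passed back to fetch().

```perl
use strict;
use warnings;

# Hypothetical return value of find(): record token => list of hits.
# Each hit is [ field tag, subfield tag, word number ].
my $hithash = {
    0 => [ [ '245', 'a', 2 ] ],
    3 => [ [ '245', 'a', 1 ], [ '650', 'a', 1 ] ],
};

my @lines;
for my $token (sort keys %$hithash) {
    for my $hit (@{ $hithash->{$token} }) {
        my ($tag, $subfield, $wordno) = @$hit;
        push @lines, sprintf 'record %s: %s$%s, word %d',
            $token, $tag, $subfield, $wordno;
    }
}
print "$_\n" for @lines;
```

Unpacking each hit into named variables avoids scattering numeric indexes through the calling code.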

PQF is Prefix Query Format, as described in the "Tools" section of the YAZ manual. However, this module does not perform field-specific searching, since doing so would require a mapping between Type-1 query access points and MARC fields, and we want to avoid assuming anything about such a mapping. Accordingly, use attributes are ignored. Further, boolean operations are at present refused: only single-term queries are supported.
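The restrictions above can be pictured with a small sketch that reduces a PQF string to its single term. The single_term() helper is hypothetical and is not part of this module, whose own parsing may differ; it also ignores quoted multi-word terms and attribute-set names, both of which full PQF allows.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Net::Z3950::IndexMARC: reduce a PQF
# query to its single search term, discarding @attr specifications and
# rejecting boolean operators and multi-term queries.
sub single_term {
    my ($pqf) = @_;
    my @tokens = split ' ', $pqf;
    my $term;
    while (@tokens) {
        my $tok = shift @tokens;
        if ($tok eq '@attr') {
            die "malformed \@attr in '$pqf'\n" unless @tokens;
            shift @tokens;    # drop the attribute itself, e.g. 1=4
        } elsif ($tok =~ /^\@(?:and|or|not)$/) {
            die "boolean operator $tok is not supported\n";
        } elsif (defined $term) {
            die "multiple terms are not supported\n";
        } else {
            $term = $tok;
        }
    }
    die "no term in '$pqf'\n" unless defined $term;
    return lc $term;
}

print single_term('@attr 1=4 dinosaur'), "\n";

# A boolean query is refused rather than silently reinterpreted.
my $ok = eval { single_term('@and fruit fish'); 1 };
print "refused: $@" unless $ok;
```

Note that PQF strings are written in single quotes: inside double quotes Perl would try to interpolate @attr or @and as an array.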

fetch()

 $marc = $index->fetch($token);

Returns the MARC::Record object corresponding to the specified record token, as returned from add() or find().

PROVENANCE

This module is part of the Net::Z3950::RadioMARC distribution. The copyright, authorship and licence are all as for the distribution.