
NAME

Finnigan - Thermo/Finnigan mass spec data decoder

SYNOPSIS

  use Finnigan;

  seek STREAM, $object_address, 0;
  my $o = Finnigan::Object->decode(\*STREAM, $arg);
  $o->dump;

where 'Object' is a symbol for any of the specific decoder objects (Finnigan::*) and STREAM is an open filehandle positioned at the start of the structure to be decoded. Some decoders may require an additional argument (file format version).

DESCRIPTION

Finnigan is a non-functional package whose only purpose is to pull all the other packages of the distribution into its namespace. It does no work itself; all the work is done in the submodules. Each submodule has its own documentation; please see the "SUBMODULES" section below or visit the project's home page for a more detailed description of the file format, data structures, decoders and tools.

Each decoder submodule has a simple command-line interface. See the "TOOLS" section for a list of command-line tools that can be used to examine the Finnigan file structures and dump their contents with absolute or relative addresses. One of the tools, uf-mzxml, can be used to convert the entire data stream in a Finnigan file to the mzXML format.

METHODS

list_modules

The only method defined in the top-level Finnigan package is list_modules, which can be used to ascertain that all packages have been successfully loaded:

  perl -MFinnigan -e 'Finnigan::list_modules'

SUBMODULES

To simplify the decoder and allow it to accommodate a variety of file versions, it has been subdivided into a set of submodules, each representing a structural unit of the Finnigan file format. The partitioning of the format into units is somewhat arbitrary; it was done based on the comparative analysis of the structure of several different formats. The structures common to all formats are viewed as "basic" and merit a dedicated decoder; the same goes for the highly repetitive structures, such as Finnigan::ScanIndexEntry. Some structures remain roughly similar, but keep acquiring new elements with every new file version; the decoders for these structures are parameterised with the version number (for example, Finnigan::ScanEventPreamble).
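As a rough illustration of version-parameterised decoding, the sketch below builds a field list from a common core plus the extensions introduced up to a given version. The field names, pack templates and version numbers are invented for the example and do not describe any real Finnigan structure.

```perl
use strict;
use warnings;

# Hypothetical field layouts; the real structures and version numbers differ.
my @common = (flags => 'C', mode => 'C');
my %extra_since = (
    2 => [ gain  => 'v' ],   # fields first seen in "version 2"
    3 => [ delay => 'V' ],   # fields first seen in "version 3"
);

# Build the field list for a given version: the common core plus every
# extension introduced at or before that version.
sub template_for {
    my ($version) = @_;
    my @fields = @common;
    for my $since (sort { $a <=> $b } keys %extra_since) {
        push @fields, @{ $extra_since{$since} } if $since <= $version;
    }
    return @fields;
}

for my $version (1 .. 3) {
    my %fields = template_for($version);
    my @names  = grep { exists $fields{$_} } qw(flags mode gain delay);
    print "v$version: @names\n";
}
```

A single routine of this kind can serve every file version, which is the reason for passing the version number to a decoder instead of writing one decoder per version.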

The notion of a preamble (a term I made up, not knowing better) represents what seems to be a persistent idiom in Thermo structure coding: collect the binary data in a fixed-size block followed by variable-length objects (mostly text strings). The earlier Finnigan formats contained little or no text and virtually no variable-length data, so what I call a preamble today used to be the whole deal in the past, and it makes sense to have a separate decoder for each such rudimentary container. Keeping these decoders separate makes it possible to go back and decode the historical data simply by recombining the existing decoders.

Common submodule methods

decode($stream, $arg)

Each Finnigan::* object has a constructor method named decode(), whose first argument is a filehandle positioned at the start of the object to be decoded. Some decoders require additional arguments, such as the file version number. A single argument is passed as is, while multiple arguments are passed as an array reference.

The constructor advances the handle to the start of the next object, so seeking to the start of the object of interest is only necessary when doing partial reads; in principle, the entire file can be read by calling the object constructors in sequence. In reality, it is often more efficient to seek ahead to fetch an index structure stored near the end of the file, then go back to the data stream using the pointers in the index.
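The two access patterns can be tried out on a toy file. The sketch below uses only core Perl and an in-memory filehandle; the layout (three length-prefixed records followed by a single 32-bit pointer) and the read_record helper are made up for the illustration and are not part of the Finnigan format or API.

```perl
use strict;
use warnings;

# Toy layout (hypothetical, not the Finnigan format):
# three length-prefixed records, then a 32-bit offset pointing at record 3.
my @payloads = ('alpha', 'beta', 'gamma');
my $image = '';
my @offsets;
for my $p (@payloads) {
    push @offsets, length $image;
    $image .= pack('V/a*', $p);    # 32-bit little-endian length, then bytes
}
$image .= pack('V', $offsets[2]);  # trailing "index" holding one pointer

open my $fh, '<:raw', \$image or die "cannot open in-memory file: $!";

# Read one record; the handle is left at the start of the next one.
sub read_record {
    my ($in) = @_;
    read($in, my $buf, 4) == 4 or die "short read on length";
    my $len = unpack 'V', $buf;
    read($in, my $data, $len) == $len or die "short read on payload";
    return $data;
}

# Sequential pass: no seeking between records.
my @sequential = map { read_record($fh) } 1 .. 3;

# Index-driven pass: fetch the pointer stored at the end, then seek back.
seek $fh, -4, 2 or die "seek failed: $!";
read($fh, my $ptr, 4) == 4 or die "short read on index";
seek $fh, unpack('V', $ptr), 0 or die "seek failed: $!";
my $third = read_record($fh);

print "@sequential / $third\n";
```

The sequential pass never calls seek because each read leaves the handle where the next record begins; the index-driven pass reaches the third record without touching the first two.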

The decoded data can be obtained by calling accessor methods on the object or by de-referencing the object reference (since all Finnigan objects are blessed hash references):

  $x = $object->element

or

  $x = $object->{element}

The accessor option is nicer, as it leads to less clutter in the code and leaves the possibility for additional processing of the data by the accessor routine, but it incurs a substantial performance penalty. For this reason, hash dereference is preferred in performance-critical code (inside loops).
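The difference can be measured with the core Benchmark module. The following sketch uses a hypothetical stand-in class (Toy::Entry) in place of a real decoder object; the size of the gap it reports depends on the Perl version and the machine, so no figures are quoted here.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical stand-in for a decoded object: a blessed hash with one accessor.
package Toy::Entry {
    sub new    { my ($class, %f) = @_; return bless {%f}, $class }
    sub offset { return $_[0]->{offset} }
}

my $entry = Toy::Entry->new(offset => 1024);

# Both forms return the same value...
my $via_accessor = $entry->offset;
my $via_hash     = $entry->{offset};

# ...but the method call adds dispatch overhead on every iteration.
# A negative count asks Benchmark to run each case for at least 1 CPU second.
my $sum = 0;
timethese(-1, {
    accessor => sub { $sum += $entry->offset },
    hash     => sub { $sum += $entry->{offset} },
});
```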

This is an "instance" method; it must be defined in each non-trivial decoder object.

dump(%args)

All Finnigan objects are descendants of Finnigan::Decoder. One of the methods they inherit is dump, which provides an easy way to explore the contents of decoded objects. The dump method prints out the structure of the object it is called on in one of a few styles, with relative or absolute addresses.

For example, many object dumps used in this wiki were created thus:

  $object->dump(style => 'wiki', relative => 1);

The style argument can have the values of wiki, html or no value at all (meaning plain text). The relative argument is a boolean indicating whether to use the absolute or relative file addresses in the output. In this case, "relative" means "an offset within the object", while "absolute" is the seek address within the data file.
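The distinction between the two kinds of address can be shown with a toy structure. The sketch below is not the dump method itself: the %object layout and the address_lines helper are assumptions made for the example, and the output format is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical decoded object starting at file offset 0x1A0, with two fields.
my %object = (
    addr   => 0x1A0,
    fields => [
        { name => 'length', addr => 0x1A0, size => 4 },
        { name => 'flags',  addr => 0x1A4, size => 2 },
    ],
);

# Format one line per field; relative => 1 subtracts the object's own address.
sub address_lines {
    my ($obj, %args) = @_;
    my $base = $args{relative} ? $obj->{addr} : 0;
    return map {
        sprintf '%d %d %s', $_->{addr} - $base, $_->{size}, $_->{name}
    } @{ $obj->{fields} };
}

print "$_\n" for address_lines(\%object, relative => 1);
print "$_\n" for address_lines(\%object);
```

With relative addressing the first field is always reported at offset 0, which makes dumps of the same structure taken from different files directly comparable.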

read($stream, $template_list, $arg)

This is the Finnigan::Decoder constructor method. Some derived decoders use it internally, but it can also be used to decode trivial objects at a given location in a file without having to write a dedicated decoder.

For example, to read a 32-bit stream length, use:

  my $object = Finnigan::Decoder->read(\*INPUT, ['length' => ['V', 'UInt32']]);

The $template_list argument names all fields to decode (in this case, just one: length), the template to use for each field (in this example, V), and provides a human-readable symbol for the template, which can be used in a number of ways; for example, when inspecting the structures with the dump method.

This may seem like a kludgy way of reading four bytes, but the upshot is that the resulting $object will have the size, type and location information tucked into it, so it can be analysed and dumped in a way consistent with other decoded objects. The advantage becomes even more apparent when the structure is more complex than a single scalar object.

The inherited read method provides the core functionality in all Finnigan decoders.
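To make the idea of a template list more concrete, here is a minimal sketch of a reader that walks such a list and records where each field was found. It is not the Finnigan::Decoder implementation: the read_fields function and the addr, size and type keys are assumptions chosen for the example, and it handles only fixed-size pack templates.

```perl
use strict;
use warnings;

# Hypothetical mini-reader: decodes fixed-size fields listed as
# name => [pack template, type label], recording where each one was found.
sub read_fields {
    my ($fh, $template_list) = @_;
    my %object = (addr => tell($fh), data => {});
    my @list = @$template_list;
    while (my ($name, $spec) = splice @list, 0, 2) {
        my ($template, $type) = @$spec;
        my $size = length(pack($template, 0));   # byte width of the template
        my $addr = tell($fh);
        my $got  = read($fh, my $buf, $size);
        die "read error at $addr: $!" unless defined $got;
        die "short read at $addr"     unless $got == $size;
        $object{data}{$name} = {
            addr  => $addr,
            size  => $size,
            type  => $type,
            value => unpack($template, $buf),
        };
    }
    $object{size} = tell($fh) - $object{addr};
    return \%object;
}

# Two little-endian fields in an in-memory file: a UInt32 and a UInt16.
my $bytes = pack 'V v', 4096, 7;
open my $in, '<:raw', \$bytes or die "cannot open in-memory file: $!";
my $obj = read_fields($in, ['length' => ['V', 'UInt32'], 'flags' => ['v', 'UInt16']]);
printf "%s at %d: %d\n", $_, @{ $obj->{data}{$_} }{qw(addr value)}
    for qw(length flags);
```

Because every field carries its own address, size and type label, a generic routine can later print or analyse the object without knowing anything about the structure it came from.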

If only the value of the object is sought, then this even more kludgy code can be used:

  my $stream_length = Finnigan::Decoder->read(\*INPUT, ['length' => ['V', 'UInt32']])->{data}->{length}->{value};

Doing it this way is nonetheless easier than writing several lines of code to read the data into a buffer, check for the I/O errors and unpack the value.
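For comparison, the hand-written equivalent might look like the sketch below. It uses only core Perl; the in-memory filehandle stands in for an open data file, and the value 123456 is arbitrary.

```perl
use strict;
use warnings;

# In-memory stand-in for an open data file positioned at a UInt32 field.
my $bytes = pack 'V', 123_456;
open my $in, '<:raw', \$bytes or die "cannot open in-memory file: $!";

# Read four bytes, check for I/O errors and truncation, then unpack.
my $got = read($in, my $buf, 4);
die "read failed: $!"            unless defined $got;
die "expected 4 bytes, got $got" unless $got == 4;
my $stream_length = unpack 'V', $buf;

print "$stream_length\n";
```

This version yields only the bare value; the address and type of the field are lost unless they are tracked separately.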

stringify

A convenience method defined in some of the Finnigan objects. It allows a concise representation of an object to be injected anywhere Perl expects a string. For example,

  $scan_event = Finnigan::ScanEvent->decode( \*INPUT, $header->version);
  say "$scan_event";
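One common way to obtain this behaviour in Perl is to bind a stringify method to the string-conversion operator with the core overload pragma. The sketch below shows the mechanism on a hypothetical class (Toy::ScanEvent); whether the Finnigan objects are wired up in exactly this way is an assumption, and the fields are invented.

```perl
use strict;
use warnings;

# Hypothetical class, not a Finnigan decoder: shows how a stringify method
# can be hooked into Perl's string conversion with the core overload pragma.
package Toy::ScanEvent {
    use overload '""' => \&stringify;

    sub new { my ($class, %f) = @_; return bless {%f}, $class }

    sub stringify {
        my ($self) = @_;
        return "$self->{polarity} $self->{type}";
    }
}

my $event = Toy::ScanEvent->new(polarity => '+', type => 'Full ms');
my $text  = "scan event: $event";   # interpolation calls stringify
print "$text\n";
```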

Submodule index

Finnigan::AuditTag (sample audit tag)
Finnigan::CASInfo (autosampler info)
Finnigan::CASInfoPreamble (numerical autosampler parameters)
Finnigan::Decoder (the base class for all Finnigan decoders)
Finnigan::Error (error log entry)
Finnigan::FileHeader
Finnigan::FractionCollector (M/z range decoder)
Finnigan::GenericDataDescriptor (a self-decoding structure element)
Finnigan::GenericDataHeader (self-decoding structure header)
Finnigan::GenericRecord (self-decoding structure)
Finnigan::InjectionData (sample injection parameters)
Finnigan::InstID (instrument identifiers)
Finnigan::InstrumentLogRecord (instrument log entry)
Finnigan::MethodFile (an OLE2 container for instrument method files)
Finnigan::OLE2DIF (Double-Indirect FAT decoder)
Finnigan::OLE2DirectoryEntry
Finnigan::OLE2FAT (FAT sector decoder)
Finnigan::OLE2File (Microsoft OLE2/CDF file decoder)
Finnigan::OLE2Header (OLE2 header decoder)
Finnigan::OLE2Property (OLE2 index node decoder)
Finnigan::PacketHeader (scan data header)
Finnigan::Peak (an element of the peak centroid list)
Finnigan::Peaks (the peak centroid list)
Finnigan::Profile (scan profile)
Finnigan::ProfileChunk (a single chunk of a filtered profile)
Finnigan::RawFileInfo (primary index structure)
Finnigan::RawFileInfoPreamble (the binary data part of RawFileInfo)
Finnigan::Reaction (precursor ion data)
Finnigan::RunHeader (secondary index structure)
Finnigan::SampleInfo (secondary index structure)
Finnigan::Scan (a lightweight ScanDataPacket decoder)
Finnigan::ScanEvent (scan type descriptor)
Finnigan::ScanEventPreamble (the byte array component of ScanEvent)
Finnigan::ScanEventTemplate (the prototype scan descriptor)
Finnigan::ScanIndexEntry (scan data pointer)
Finnigan::ScanParameters (scan meta-data)
Finnigan::SeqRow (sequencer table row)

TOOLS

The Unfinnigan tools extract data from the Finnigan files of several known versions. They are listed roughly in the order in which the structures they decode occur in the data file.

Query tools

uf-header

read the FileHeader structure

uf-seqrow

read the SeqRow structure (Sequence Table Row)

uf-casinfo

read the CASInfo structure (autosampler info)

uf-rfi

read RawFileInfo, the primary index structure

uf-meth

unravel the embedded MethodFile container

uf-scan

examine the scan profile and peak data in a single MS scan (ScanDataPacket)

uf-runheader

read RunHeader, the secondary index structure

uf-instrument

read the instrument IDs (the InstID structure)

uf-log

list or dump the instrument log stream (InstrumentLogRecord structures)

uf-error

list the error log (a stream of Error structures)

uf-segments

dump the ScanEventTemplate structures in the order of segment hierarchy

uf-params

print or dump the ScanParameters stream

uf-tune

print or dump the TuneFile structure

uf-index

read the stream of ScanIndexEntry records (scan data pointers)

uf-trailer

read the stream of ScanEvent records

Conversion tools

The following are the conversion tools, transcoding the entire raw files into alternative representations.

uf-mzxml

convert a raw file to mzXML

mzxml-unpack

unpack the base64-encoded scan data in an mzXML file

All tools contain their own POD sections. To read the documentation for a tool, use either of the following:

  man <tool>
  perldoc <tool>

AUTHOR

Gene Selkov, <selkovjr@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2010 by Gene Selkov

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available.