Finnigan - Thermo/Finnigan mass spec data decoder
use Finnigan;
seek STREAM, $object_address, 0;
my $o = Finnigan::Object->decode(\*STREAM, $arg);
$o->dump;
where 'Object' is a symbol for any of the specific decoder objects (Finnigan::*) and STREAM is an open filehandle positioned at the start of the structure to be decoded. Some decoders may require an additional argument (file format version).
Finnigan is a non-functional package whose only purpose is to pull in all other packages in the module into its namespace. It does no work; all work is done in the submodules. Each submodule has its own documentation; please see the "SUBMODULES" section below or visit the project's home page for a more detailed description of the file format, data structures, decoders and tools.
Each decoder submodule has a simple command-line interface. See the "TOOLS" section for a list of command-line tools that can be used to examine the Finnigan file structures and dump their contents with absolute or relative addresses. One of the tools, uf-mzxml, can be used to convert the entire data stream in a Finnigan file to the mzXML format.
The only method defined in the top-level Finnigan package is list_modules, which can be used to ascertain that all packages have been successfully loaded:
perl -MFinnigan -e 'Finnigan::list_modules'
To simplify the decoder and allow it to accommodate a variety of file versions, it has been subdivided into a set of submodules, each representing a structural unit of the Finnigan file format. The partitioning of the format into units is somewhat arbitrary; it was done based on the comparative analysis of the structure of several different formats. The structures common to all formats are viewed as "basic" and merit a dedicated decoder; the same goes for the highly repetitive structures, such as Finnigan::ScanIndexEntry. Some structures remain roughly similar, but keep acquiring new elements with every new file version; the decoders for these structures are parameterised with the version number (for example, Finnigan::ScanEventPreamble).
The notion of a preamble (a term I made up, not knowing better) represents what seems to be a persistent idiom in Thermo structure coding: collect the binary data in a fixed-size block followed by variable-length objects (mostly text strings). The earlier Finnigan formats contained little or no text and virtually no variable-length data, so what I call a preamble today used to be the whole deal in the past, and it makes sense to have a separate decoder for each such rudimentary container. Keeping these decoders separate makes it possible to go back and decode the historical data simply by recombining the existing decoders.
Each Finnigan::* object has a constructor method named decode(), whose first argument is a filehandle positioned at the start of the object to be decoded. Some decoders require additional arguments, such as the file version number. A single argument is passed as is, while multiple arguments are passed as an array reference.
The constructor advances the handle to the start of the next object, so seeking to the start of the object of interest is only necessary when doing partial reads; in principle, the entire file can be read by calling object constructors in sequence. In practice, it is often more efficient to seek ahead to fetch an index structure stored near the end of the file, then go back to the data stream using the pointers in the index.
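The index-first access pattern can be sketched with core Perl alone. The layout below is a made-up toy (two 32-bit records followed by a trailer that points at the second one), not the Finnigan format, and an in-memory filehandle stands in for the raw file:

```perl
use strict;
use warnings;

# Hypothetical toy layout (not the Finnigan format): two 32-bit records
# followed by a trailer holding the absolute address of the second record.
my $image = pack('V V', 111, 222);    # records at offsets 0 and 4
$image   .= pack('V', 4);             # trailer: pointer to the second record

open(my $fh, '<:raw', \$image) or die "cannot open in-memory file: $!";

# Seek to the trailer (4 bytes before the end) and read the pointer.
seek($fh, -4, 2) or die "seek failed: $!";
read($fh, my $buf, 4) == 4 or die "short read";
my $address = unpack('V', $buf);

# Go back to the record the pointer refers to and decode it.
seek($fh, $address, 0) or die "seek failed: $!";
read($fh, $buf, 4) == 4 or die "short read";
my $record = unpack('V', $buf);

print "record at $address: $record\n";
```

With the real decoders, the second seek would be followed by a decode() call on the appropriate Finnigan::* class rather than by a bare unpack.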
The decoded data can be obtained by calling accessor methods on the object or by de-referencing the object reference (since all Finnigan objects are blessed hash references):
$x = $object->element
or
$x = $object->{element}
The accessor option is nicer, as it leads to less clutter in the code and leaves the possibility for additional processing of the data by the accessor routine, but it incurs a substantial performance penalty. For this reason, hash dereference is preferred in performance-critical code (inside loops).
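The size of the penalty depends on the Perl version and the machine, so it may be worth measuring. The following is a minimal sketch using the core Benchmark module; Toy::Object is a hypothetical stand-in for a decoded object, not part of Finnigan:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

{
    # Hypothetical stand-in for a decoded object: a blessed hash with an accessor.
    package Toy::Object;
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub element { return $_[0]->{element} }
}

my $object = Toy::Object->new(element => 42);

# Both access styles return the same value.
die "mismatch" unless $object->element == $object->{element};

# Compare the two styles; each runs for at least one CPU second.
cmpthese(-1, {
    accessor => sub { my $x = $object->element   },
    hash     => sub { my $x = $object->{element} },
});
```

The comparison table printed by cmpthese shows the relative rates of the two styles on the machine at hand.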
The decode method is an "instance" method; it must be defined in each non-trivial decoder object.
All Finnigan objects are descendants of Finnigan::Decoder. One of the methods they inherit is dump, which provides an easy way to explore the contents of decoded objects. The dump method prints out the structure of the object it is called on in one of a few styles, with relative or absolute addresses.
For example, many object dumps used in this wiki were created thus:
$object->dump(style => 'wiki', relative => 1);
The style argument can have the values of wiki, html or no value at all (meaning plain text). The relative argument is a boolean indicating whether to use the absolute or relative file addresses in the output. In this case, "relative" means "an offset within the object", while "absolute" is the seek address within the data file.
The read method is the Finnigan::Decoder constructor. Some derived decoders use it internally, but it can also be used to decode trivial objects at a given location in a file without having to write a dedicated decoder.
For example, to read a 32-bit stream length, use:
my $object = Finnigan::Decoder->read(\*INPUT, ['length' => ['V', 'UInt32']]);
The $template_list argument names all fields to decode (in this case, just one: length), the template to use for each field (in this example, V), and provides a human-readable symbol for the template, which can be used in a number of ways; for example, when inspecting the structures with the dump method.
This may seem like a kludgy way of reading four bytes, but the upshot is that the resulting $object will have the size, type and location information tucked into it, so it can be analysed and dumped in a way consistent with other decoded objects. The advantage becomes even more apparent when the structure is more complex than a single scalar object.
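To illustrate the idea of keeping size, type and location next to each decoded value, here is a minimal sketch of a template-driven reader. The read_fields helper and the addr, size and type keys are hypothetical and are not the Finnigan::Decoder implementation; only the data/field/value nesting mirrors the access path shown in this document. It handles fixed-size numeric templates only:

```perl
use strict;
use warnings;

# Hypothetical helper, not Finnigan::Decoder: decode the fields named in a
# flat template list and keep address, size and type with each value.
sub read_fields {
    my ($fh, $template_list) = @_;
    my %data;
    my @list = @$template_list;
    while (my ($name, $spec) = splice(@list, 0, 2)) {
        my ($template, $type) = @$spec;
        my $addr = tell($fh);
        my $size = length(pack($template, 0));    # fixed-size numeric templates only
        my $n    = read($fh, my $buf, $size);
        die "short read at $addr" unless defined $n && $n == $size;
        $data{$name} = {
            addr  => $addr,
            size  => $size,
            type  => $type,
            value => scalar unpack($template, $buf),
        };
    }
    return { data => \%data };
}

# In-memory stand-in for a file: a UInt32 followed by a UInt16.
my $image = pack('V v', 1000, 7);
open(my $fh, '<:raw', \$image) or die "cannot open in-memory file: $!";

my $object = read_fields($fh, ['length' => ['V', 'UInt32'], 'flags' => ['v', 'UInt16']]);

# List the fields in file order with their addresses and types.
my $data = $object->{data};
for my $name (sort { $data->{$a}{addr} <=> $data->{$b}{addr} } keys %$data) {
    my $f = $data->{$name};
    printf "%4d %-6s %-6s %s\n", $f->{addr}, $f->{type}, $name, $f->{value};
}
```

Because every field carries its own address and size, a generic dump routine can list any decoded structure without knowing its layout in advance.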
The inherited read method provides the core functionality in all Finnigan decoders.
If only the value of the object is sought, then this even more kludgy code can be used:
my $stream_length = Finnigan::Decoder->read(\*INPUT, ['length' => ['V', 'UInt32']])->{data}->{length}->{value};
Doing it this way is nonetheless easier than writing several lines of code to read the data into a buffer, check for the I/O errors and unpack the value.
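For comparison, the hand-written alternative might look like the sketch below, which uses only core Perl. An in-memory filehandle stands in for the raw file, and the value 123456 is arbitrary:

```perl
use strict;
use warnings;

# In-memory stand-in for an open raw file holding one little-endian UInt32.
my $image = pack('V', 123456);
open(my $fh, '<:raw', \$image) or die "cannot open in-memory file: $!";

# Read four bytes into a buffer, checking for I/O errors and short reads.
my $nbytes = read($fh, my $buf, 4);
die "read error: $!"                     unless defined $nbytes;
die "short read: got $nbytes of 4 bytes" unless $nbytes == 4;

# Unpack the buffer as an unsigned 32-bit little-endian integer.
my $stream_length = unpack('V', $buf);

print "$stream_length\n";
```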
A convenience method defined in some of the Finnigan objects. It allows a concise representation of an object to be injected anywhere Perl expects a string. For example,
$scan_event = Finnigan::ScanEvent->decode(\*INPUT, $header->version);
say "$scan_event";
RawFileInfo
ScanDataPacket
ScanEvent
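One common way to get this behaviour in Perl is to overload the string conversion operator. The sketch below assumes that mechanism; Toy::ScanEvent and its fields are hypothetical and do not reproduce Finnigan::ScanEvent:

```perl
use strict;
use warnings;

{
    package Toy::ScanEvent;    # hypothetical class, not Finnigan::ScanEvent

    # Route string conversion through the stringify method.
    use overload '""' => sub { $_[0]->stringify };

    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    # Build a concise one-line summary of the object.
    sub stringify {
        my ($self) = @_;
        return "scan $self->{scan} at $self->{time} min";
    }
}

my $event = Toy::ScanEvent->new(scan => 17, time => 0.25);
my $text  = "$event";    # interpolation triggers the overloaded conversion
print "$text\n";
```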
The Unfinnigan tools extract data from the Finnigan files of several known versions. They are listed roughly in the order in which the structures they decode occur in the data file.
read the FileHeader structure
read the SeqRow structure (Sequence Table Row)
read the CASInfo structure (autosampler info)
read RawFileInfo, the primary index structure
unravel the embedded MethodFile container
examine the scan profile and peak data in a single MS scan (ScanDataPacket)
read RunHeader, the secondary index structure
read the instrument IDs (the InstID structure)
list or dump the instrument log stream (InstrumentLogRecord structures)
list the error log (a stream of Error structures)
dump the ScanEventTemplate structures in the order of segment hierarchy
print or dump the ScanParameters stream
print or dump the TuneFile structure
read the stream of ScanIndexEntry records (scan data pointers)
read the stream of ScanEvent records
The following are the conversion tools, transcoding the entire raw files into alternative representations.
convert a raw file to mzXML
unpack the base64-encoded scan data in an mzXML file
All tools contain their own POD sections. To read the documentation for a tool, use
man <tool>
perldoc <tool>
Gene Selkov, <selkovjr@gmail.com>
Copyright (C) 2010 by Gene Selkov
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available.