Bio::GFF3::LowLevel::Parser - a fast, low-level GFF3 parser
    my $p = Bio::GFF3::LowLevel::Parser->open( $file_or_fh );

    while( my $i = $p->next_item ) {

        if( ref $i eq 'ARRAY' ) {
            ## $i is an arrayref of feature lines that have the same ID,
            ## in the same format as returned by
            ## Bio::GFF3::LowLevel::gff3_parse_feature
            for my $f (@$i) {
                # for each location of this feature
                # do something with it
            }
        }
        elsif( $i->{directive} ) {
            if( $i->{directive} eq 'FASTA' ) {
                my $fasta_filehandle = $i->{filehandle};
                ## parse the FASTA in the filehandle with BioPerl or
                ## however you want. or ignore it.
            }
            elsif( $i->{directive} eq 'gff-version' ) {
                print "it says it is GFF version $i->{value}\n";
            }
            elsif( $i->{directive} eq 'sequence-region' ) {
                print( "found a sequence-region, sequence $i->{seq_id},",
                       " from $i->{start} to $i->{end}\n" );
            }
        }
        elsif( $i->{comment} ) {
            ## this is a comment in your GFF3 file, in case you want to do
            ## something with it.
            print "that comment said: '$i->{comment}'\n";
        }
        else {
            die 'this should never happen!';
        }

    }
This is a fast, low-level parser for Generic Feature Format, version 3 (GFF3). Because it is low-level, it returns only plain hashrefs and arrayrefs rather than objects. It does, however, reconstruct feature hierarchies using the features' ID, Parent, and Derives_from attributes, and it groups together lines that share the same ID (i.e. features that have multiple locations).
Features are returned as arrayrefs containing one or more (never zero) feature lines parsed in the same format as "gff3_parse_feature" in Bio::GFF3::LowLevel. Each has some additional keys for related features: child_features and derived_features, each of which is a (possibly empty) arrayref of features (i.e. arrayrefs) that refer to this one as a Parent or claim that they Derives_from it.
Note that, to make code that uses this parser easier to write, all features have child_features and derived_features arrayrefs. This means you don't have to check for the existence of these before seeing if they have anything in them.
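The nesting described above can be walked with a short recursive routine. The following is only a sketch: the features are built by hand so that it runs without a GFF3 file, the helper names feature_span and describe_tree are hypothetical, and the type, start, end, and attributes keys are assumed to follow the gff3_parse_feature format (with attribute values held in arrayrefs). It also assumes that reading child_features from the first location line of a feature is enough to reach all of its children.

```perl
use strict;
use warnings;

# Hypothetical helper: overall extent of a feature, which is an arrayref
# of one or more location hashrefs.
sub feature_span {
    my ($feature) = @_;
    my ( $min, $max );
    for my $loc (@$feature) {
        $min = $loc->{start} if !defined $min || $loc->{start} < $min;
        $max = $loc->{end}   if !defined $max || $loc->{end} > $max;
    }
    return ( $min, $max );
}

# Hypothetical helper: one indented line per feature, depth-first.
sub describe_tree {
    my ( $feature, $depth ) = @_;
    $depth //= 0;
    my $first = $feature->[0];    # assumption: first line carries the children
    my ( $start, $end ) = feature_span($feature);
    my $id = $first->{attributes}{ID} ? $first->{attributes}{ID}[0] : '(no ID)';
    my @lines = ( ( '  ' x $depth ) . "$first->{type} $id $start..$end" );

    # child_features is always present, so no existence check is needed
    push @lines, describe_tree( $_, $depth + 1 ) for @{ $first->{child_features} };
    return @lines;
}

# Hand-built stand-ins for parser output (not real parsed data).
my $cds = [
    { type => 'CDS', start => 1201, end => 1500,
      attributes => { ID => ['cds1'], Parent => ['mrna1'] },
      child_features => [], derived_features => [] },
    { type => 'CDS', start => 3000, end => 3902,
      attributes => { ID => ['cds1'], Parent => ['mrna1'] },
      child_features => [], derived_features => [] },
];
my $mrna = [
    { type => 'mRNA', start => 1050, end => 9000,
      attributes => { ID => ['mrna1'], Parent => ['gene1'] },
      child_features => [$cds], derived_features => [] },
];
my $gene = [
    { type => 'gene', start => 1000, end => 9000,
      attributes => { ID => ['gene1'] },
      child_features => [$mrna], derived_features => [] },
];

my @tree_lines = describe_tree($gene);
print "$_\n" for @tree_lines;
```

The same traversal should work for derived_features by swapping the key name, since both are documented as always-present arrayrefs.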
Directives are returned as hashrefs, in the same format as "gff3_parse_directive" in Bio::GFF3::LowLevel.
Comments are parsed into a hashref of the form:
    { comment => 'text of the comment, not including the hash mark(s) and ending newline' }
Make a new parser object that will parse the GFF3 from all of the files or filehandles that you give it, as if they were all a single stream.
Set a maximum number of features the parser will keep buffered in case features later in the file refer to them. By default there is no limit; the parser instead relies on the presence of '###' marks in the GFF3 file to know when buffered features can be released.
Returns a wrapped copy of this parser whose output is backward-compatible with what version 1.0 of this parser returned. Do not use in new code.
Iterate through all of the items (features, directives, and comments) in the file(s) given to the parser. Features are arrayrefs of hashrefs, and directives and comments are hashrefs.
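Since the three kinds of item differ only in their reference type and keys, the dispatch can be folded into a small helper. This is a sketch under stated assumptions: item_kind is a hypothetical name, the items are hand-built stand-ins rather than real parser output, and the comment check tests definedness (so that a comment whose text is empty or '0' is still recognized), which is a choice made here rather than something the documentation specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: name the kind of item returned by next_item.
sub item_kind {
    my ($item) = @_;
    return 'feature'   if ref $item eq 'ARRAY';
    return 'directive' if ref $item eq 'HASH' && $item->{directive};
    return 'comment'   if ref $item eq 'HASH' && defined $item->{comment};
    return 'unknown';
}

# Hand-built stand-ins for what the parser might return.
my @items = (
    { directive => 'gff-version', value => '3' },
    { comment   => 'an example comment' },
    [
        { type => 'gene', start => 1000, end => 9000,
          child_features => [], derived_features => [] },
    ],
);

# Tally the items by kind.
my %count;
$count{ item_kind($_) }++ for @items;
print "$_: $count{$_}\n" for sort keys %count;
```

With a real parser, the loop body would call item_kind on each value returned by next_item instead of iterating over a fixed list.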
Robert Buels <rmb32@cornell.edu>
This software is copyright (c) 2012 by Robert Buels.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.