NAME

Bio::Translator::Utils - Utilities that require a translation table

SYNOPSIS

    use Bio::Translator::Utils;

    # Same constructors as Bio::Translator
    my $utils = Bio::Translator::Utils->new();
    my $utils = Bio::Translator::Utils->custom( \$custom_table );

    my $codons  = $utils->codons( $residue );
    my $regex   = $utils->regex( $residue );
    my $indices = $utils->find( $seq_ref, $residue );

    my $orf = $utils->getORF( $seq_ref );
    my $cds = $utils->getCDS( $seq_ref );

    my $frames = $utils->nonstop( $seq_ref );

DESCRIPTION

See Bio::Translator for more info. Utils contains utilities that require knowledge of the translation table.

METHODS

codons

    my $codon_array = $translator->codons( $residue );
    my $codon_array = $translator->codons( $residue, \%params );

Returns a list of codons for a particular residue or start codon. In addition to the one-letter codes for amino acids, the following are valid inputs for the residue:

    start:  Start codons (you may also use "+" which is what the translator
            uses as the 1-letter code for start codons)
    stop:   Stop codons (you may also use "*" which is the 1-letter code)
    lower:  Start or stop codons, depending on strand
    upper:  Start or stop codons, depending on strand

"lower" and "upper" match the respective ends of a CDS for a given strand (i.e. on the positive strand, lower matches the start, and upper matches the stop). Valid options for the params hash are:

    strand:     1 or -1; default = 1
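
The "lower" and "upper" aliases can be pictured as a small lookup that depends on the strand. The following is only a sketch, not the module's code: resolve_end is a hypothetical helper, and it assumes that on the negative strand the roles are simply swapped (lower matches the stop, upper matches the start).

```perl
use strict;
use warnings;

# Hypothetical helper: map "lower"/"upper" to "start"/"stop" for a strand.
# Any other residue name is passed through unchanged.
# Assumption: on the negative strand the two roles are swapped.
sub resolve_end {
    my ( $residue, $strand ) = @_;
    $strand = 1 unless defined $strand;    # documented default

    return $residue
      unless $residue eq 'lower' || $residue eq 'upper';

    my $plus = $strand == 1;
    return $plus ? 'start' : 'stop' if $residue eq 'lower';
    return $plus ? 'stop' : 'start';
}

print resolve_end( 'lower', 1 ),  "\n";    # prints "start"
print resolve_end( 'lower', -1 ), "\n";    # prints "stop"
```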

regex

    my $regex = $translator->regex( $residue );
    my $regex = $translator->regex( $residue, \%params );

Returns a regular expression matching codons for a particular amino acid residue. In addition to the one-letter codes for amino acids, the following are valid inputs for the residue:

    start:  Start codons (you may also use "+" which is what the translator
            uses as the 1-letter code for start codons)
    stop:   Stop codons (you may also use "*" which is the 1-letter code)
    lower:  Start or stop codons, depending on strand
    upper:  Start or stop codons, depending on strand

"lower" and "upper" match the respective ends of a CDS for a given strand (i.e. on the positive strand, lower matches the start, and upper matches the stop). Valid options for the params hash are:

    strand: 1 or -1; default = 1
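
To illustrate what a codon-matching regular expression can look like, here is a minimal sketch that builds one by hand. It does not use the module: the stop codons are hard-coded from the standard genetic code, and the case-insensitive flag is an assumption rather than documented behavior.

```perl
use strict;
use warnings;

# Stop codons of the standard genetic code, hard-coded for illustration.
# The module would take these from its translation table instead.
my @stops = qw(TAA TAG TGA);

# Join the codons into a single alternation.
my $alternation = join '|', map { quotemeta } @stops;
my $stop_re     = qr/(?:$alternation)/i;

print "TGA is a stop\n"     if 'TGA' =~ /^$stop_re$/;
print "TGG is not a stop\n" if 'TGG' !~ /^$stop_re$/;
```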

find

    my $locations = $translator->find( $seq_ref, $residue );
    my $locations = $translator->find( $seq_ref, $residue, \%params );

Find the indexes of a given residue in a sequence. In addition to the one-letter codes for amino acids, the following are valid inputs for the residue:

    start:  Start codons (you may also use "+" which is what the translator
            uses as the 1-letter code for start codons)
    stop:   Stop codons (you may also use "*" which is the 1-letter code)
    lower:  Start or stop codons, depending on strand
    upper:  Start or stop codons, depending on strand

"lower" and "upper" match the respective ends of a CDS for a given strand (i.e. on the positive strand, lower matches the start, and upper matches the stop). Valid options for the params hash are:

    strand:     1 or -1; default = 1
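
Locating codons in a sequence can be sketched with a lookahead match, which also reports overlapping hits. This is an illustration under stated assumptions, not the module's implementation: find_codon_positions is a hypothetical name, it searches the forward strand only, and it reports every position regardless of frame.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based indices at which any of the
# given codons begins, on the forward strand and in any frame.
sub find_codon_positions {
    my ( $seq_ref, $codons ) = @_;
    my $seq = uc $$seq_ref;    # work on a copy
    my $alt = join '|', map { quotemeta } @$codons;

    my @positions;
    # The lookahead consumes nothing, so overlapping codons are found too.
    while ( $seq =~ /(?=(?:$alt))/g ) {
        push @positions, pos($seq);
    }
    return \@positions;
}

my $seq  = 'ATGAAATAAG';
my $hits = find_codon_positions( \$seq, [qw(TAA TAG TGA)] );
print "@$hits\n";    # prints "1 6"
```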

getORF

    my $orf_arrayref = $translator->getORF( $seq_ref );
    my $orf_arrayref = $translator->getORF( $seq_ref, \%params );

Finds the longest region between stop codons and returns an array reference holding its lower bound, upper bound, and strand. Valid options for the params hash are:

    strand:     0, 1 or -1; default = 0 (meaning search both strands)
    lower:      integer between 0 and length; default = 0
    upper:      integer between 0 and length; default = length

Lower and upper are used to specify bounds between which you are searching. Suppose the following was the longest ORF:

 0 1 2 3 4 5 6 7 8 9 10
  T A A A T C T A A G
  *****       *****
        <--------->

This will return:

    [ 3, 9, 1 ]

You can also restrict the search to a single strand with the strand option.

For ORFs that start at the very beginning of the sequence or trail off the end, but are not in phase with the start or end, this method will cut at the last complete codon. For example, if the following was the longest ORF:

    0 1 2 3 4 5 6 7 8 9 10
     A C G T A G T T T A
                   *****
       <--------------->

getORF will return:

    [ 1, 10, 1 ]

The distance between lower and upper will always be a multiple of 3. This is to make it clear which frame the ORF is in. The result may be passed to the translate method.

Example:

    my $orf_ref = $translator->getORF( \'TAGAAATAG' );
    my $orf_ref = $translator->getORF( \$seq, { strand => -1 } );
    my $orf_ref = $translator->getORF(
        \$seq,
        {
            lower => $lower,
            upper => $upper
        }
    );
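
The idea of "the longest region between stops" can be sketched in plain Perl. The code below is an assumption-laden illustration rather than the module's algorithm: longest_orf_forward is a hypothetical name, it uses the standard-code stop codons, it includes the closing stop in the region, and it scans the forward strand only, so its result may differ from what getORF returns with the default strand => 0.

```perl
use strict;
use warnings;

# Stop codons of the standard genetic code (assumption).
my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);

# Hypothetical helper: longest stop-delimited region on the forward
# strand, returned as [ lower, upper, strand ].
sub longest_orf_forward {
    my ($seq_ref) = @_;
    my $len  = length $$seq_ref;
    my @best = ( 0, 0, 1 );

    for my $frame ( 0 .. 2 ) {
        my $lower = $frame;    # start of the current run of codons
        for ( my $i = $frame ; $i + 3 <= $len ; $i += 3 ) {
            my $upper = $i + 3;
            @best = ( $lower, $upper, 1 )
              if $upper - $lower > $best[1] - $best[0];

            # A stop closes the current run; the next run starts after it.
            $lower = $upper
              if $is_stop{ uc( substr( $$seq_ref, $i, 3 ) ) };
        }
    }
    return \@best;
}

my $orf = longest_orf_forward( \'TAGAAATAG' );
print "[ @$orf ]\n";    # prints "[ 3 9 1 ]"
```

Because only whole codons are counted, the distance between the two bounds is always a multiple of 3.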

getCDS

    my $cds_ref = $translator->getCDS( $seq_ref );
    my $cds_ref = $translator->getCDS( $seq_ref, \%params );

Returns the strand and boundaries of the longest CDS, similar to getORF. For example, given:

 0 1 2 3 4 5 6 7 8 9 10
  A T G A A A T A A G
  >>>>>       *****
  <--------------->

Will return:

    [ 0, 9, 1 ]

Valid options for the params hash are:

    strand:     0, 1 or -1; default = 0 (meaning search both strands)
    lower:      integer between 0 and length; default = 0
    upper:      integer between 0 and length; default = length
    strict:     0, 1 or 2;  default = 1

Strict controls how strictly getCDS functions. There are 3 levels of strictness, enumerated 0, 1 and 2. Level 2 is the most strict: a region is only considered a CDS if both the start and the stop are found. In strict level 1, if a start is found but no stop is present before the end of the sequence, the CDS runs until the end of the sequence. Strict level 0 assumes that a start codon is present in each frame just before the start of the molecule. Level 1 is a pretty safe bet, so that is the default.

Example:

    my $cds_ref = $translator->getCDS(\'ATGAAATAG');
    my $cds_ref = $translator->getCDS(\$seq, { strand => -1 } );
    my $cds_ref = $translator->getCDS(\$seq, { strict => 2 } );
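
The three strictness levels can be made concrete with a single-frame sketch. This is not the module's code: cds_in_frame is a hypothetical helper that looks at one forward frame only, treats ATG as the only start codon, and uses the standard-code stop codons.

```perl
use strict;
use warnings;

# Assumptions: standard-code stops and ATG as the only start codon.
my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);

# Hypothetical helper: longest CDS in one forward frame, returned as
# [ lower, upper ], or undef if no CDS qualifies at this strict level.
sub cds_in_frame {
    my ( $seq_ref, $frame, $strict ) = @_;
    $strict = 1 unless defined $strict;    # documented default

    my $len = length $$seq_ref;
    my $end = $frame;    # end of the last complete codon seen
    my @best;

    # Level 0: pretend a start codon sits just before the molecule.
    my $start = $strict == 0 ? $frame : undef;

    # Keep a candidate if it is longer than the best one so far.
    my $keep = sub {
        my ( $lower, $upper ) = @_;
        @best = ( $lower, $upper )
          if !@best || $upper - $lower > $best[1] - $best[0];
    };

    for ( my $i = $frame ; $i + 3 <= $len ; $i += 3 ) {
        my $codon = uc substr( $$seq_ref, $i, 3 );
        $end = $i + 3;
        if ( $is_stop{$codon} ) {
            $keep->( $start, $end ) if defined $start;
            undef $start;
        }
        elsif ( !defined $start && $codon eq 'ATG' ) {
            $start = $i;
        }
    }

    # Levels 0 and 1: an unterminated CDS runs to the last complete codon.
    $keep->( $start, $end )
      if defined $start && $strict < 2 && $end > $start;

    return @best ? [@best] : undef;
}

my $cds = cds_in_frame( \'ATGAAATAAG', 0 );
print "[ @$cds ]\n";    # prints "[ 0 9 ]"
```

With the sequence ATGAAA, which has a start but no stop, this sketch returns [ 0, 6 ] at levels 0 and 1 and undef at level 2.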

nonstop

    my $frames = $translator->nonstop( $seq_ref );
    my $frames = $translator->nonstop( $seq_ref, \%params );

Returns the frames that contain no stop codons for the sequence. Frames are numbered -3, -2, -1, 1, 2 and 3.

     3   ---->
     2  ----->
     1 ------>
       -------
    -1 <------
    -2 <-----
    -3 <----

Valid options for the params hash are:

    strand:     0, 1 or -1; default = 0 (meaning search both strands)

Example:

    my $frames = $translator->nonstop(\'TACGTTGGTTAAGTT'); # [ 2, 3, -1, -3 ]
    my $frames = $translator->nonstop(\$seq, { strand => 1 }  ); # [ 2, 3 ]
    my $frames = $translator->nonstop(\$seq, { strand => -1 } ); # [ -1, -3 ]
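
The frame numbering can be checked by hand with a short sketch. This is an illustration, not the module's implementation: nonstop_frames is a hypothetical name, the stop codons are hard-coded from the standard genetic code, and negative frames are read from the reverse complement, with -1 starting at its first base.

```perl
use strict;
use warnings;

# Stop codons of the standard genetic code (assumption).
my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);

# Hypothetical helper: list the frames (1, 2, 3, -1, -2, -3) that
# contain no stop codon.
sub nonstop_frames {
    my ($seq_ref) = @_;
    my $forward = uc $$seq_ref;

    # Reverse complement for the negative frames.
    my $revcomp = reverse $forward;
    $revcomp =~ tr/ACGT/TGCA/;

    my @frames;
    for my $strand ( [ 1, $forward ], [ -1, $revcomp ] ) {
        my ( $sign, $seq ) = @$strand;
        for my $offset ( 0 .. 2 ) {
            last if $offset > length $seq;

            # Split the frame into complete codons.
            my @codons = substr( $seq, $offset ) =~ /(...)/g;
            push @frames, $sign * ( $offset + 1 )
              unless grep { $is_stop{$_} } @codons;
        }
    }
    return \@frames;
}

my $frames = nonstop_frames( \'TACGTTGGTTAAGTT' );
print join( ', ', @$frames ), "\n";    # prints "2, 3, -1, -3"
```

Frame 1 is excluded because it reads TAC GTT GGT TAA GTT, and frame -2 is excluded because the reverse complement read from its second base contains TAA.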

AUTHOR

Kevin Galinsky, <kgalinsky plus cpan at gmail dot com>