NAME

Bio::Util::DNA - Basic DNA utilities

SYNOPSIS

    use Bio::Util::DNA qw(:all);

    my $seq_ref   = randomDNA(100);
    my $clean_ref = cleanDNA($seq_ref);
    my $rev_ref   = reverse_complement($seq_ref);

DESCRIPTION

Provides a set of functions and predefined variables which are handy when working with DNA.

VARIABLES

BASIC VARIABLES

Basic nucleotide variables. Each variable name combines one of the prefixes below with one of the suffixes below; for example, the DNA prefix gives $DNAs, @DNAs, $DNA_match, and $DNA_fail.

Prefixes

DNA [ACGT]
RNA [ACGU]
degenerate
all_nucleotide

Suffixes

${prefix}s

String of the different nucleotides

@{prefix}s

Array of the different nucleotides

${prefix}_match

Precompiled regular expression which matches nucleotide characters

${prefix}_fail

Precompiled regular expression which matches non-nucleotide characters
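
The following is a minimal sketch of how the DNA-prefixed variables might be used. It defines local stand-ins rather than importing the module, and it assumes that each precompiled expression matches a single character; the module's actual definitions may differ.

```perl
use strict;
use warnings;

# Local stand-ins for the exported DNA variables (assumed definitions).
my $DNAs      = 'ACGT';
my @DNAs      = split //, $DNAs;
my $DNA_match = qr/[$DNAs]/;     # one nucleotide character
my $DNA_fail  = qr/[^$DNAs]/;    # one non-nucleotide character

my $seq = 'ACGTNACGT';

# Count the characters that are plain DNA nucleotides.
my $dna_count = () = $seq =~ /$DNA_match/g;

# Locate the first character that is not a plain DNA nucleotide.
my ( $bad_char, $bad_offset );
if ( $seq =~ /($DNA_fail)/ ) {
    ( $bad_char, $bad_offset ) = ( $1, $-[0] );
}

print "$dna_count DNA characters; first other character: ",
    defined $bad_char ? "'$bad_char' at offset $bad_offset" : 'none', "\n";
```

For the string above, this reports 8 DNA characters and an N at offset 4.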

%degenerate2nucleotides

Hash of degenerate nucleotide definitions. Each entry contains a reference to an array of DNA nucleotides that each degenerate nucleotide stands for.

%nucleotides2degenerate

Reverse of %degenerate2nucleotides. Keys are alphabetically-sorted DNA nucleotides and values are the degenerate nucleotide that can represent those nucleotides.

%degenerate_hierarchy

Contains the hierarchy of degenerate nucleotides: N contains all the other degenerates, and each of the four degenerates that stand for three different bases contains three of the two-base degenerates.
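
To make the relationship between these three hashes concrete, here is a sketch that rebuilds comparable tables from the standard IUPAC codes. The hash names are reused for readability only. The key format of the reverse lookup (sorted bases joined into one string) and the exact shape of the hierarchy are assumptions, not the module's confirmed layout.

```perl
use strict;
use warnings;

# IUPAC degenerate codes and the DNA bases each one stands for.
my %degenerate2nucleotides = (
    R => [qw(A G)],   Y => [qw(C T)],   S => [qw(C G)],
    W => [qw(A T)],   K => [qw(G T)],   M => [qw(A C)],
    B => [qw(C G T)], D => [qw(A G T)], H => [qw(A C T)],
    V => [qw(A C G)], N => [qw(A C G T)],
);

# Reverse lookup. Keying on the sorted bases joined into one string
# (e.g. 'AG') is an assumption about the key format.
my %nucleotides2degenerate =
    map { ( join( '', sort @{ $degenerate2nucleotides{$_} } ) => $_ ) }
    keys %degenerate2nucleotides;

# Hierarchy: each degenerate maps to the degenerates whose bases are a
# proper subset of its own.
my %degenerate_hierarchy;
for my $parent ( keys %degenerate2nucleotides ) {
    my %in_parent   = map { $_ => 1 } @{ $degenerate2nucleotides{$parent} };
    my $parent_size = keys %in_parent;
    $degenerate_hierarchy{$parent} = [
        grep {
            my @child_bases = @{ $degenerate2nucleotides{$_} };
            @child_bases < $parent_size
                && !grep { !$in_parent{$_} } @child_bases;
        } sort keys %degenerate2nucleotides
    ];
}

print "AG -> $nucleotides2degenerate{AG}\n";
print "D  -> @{ $degenerate_hierarchy{D} }\n";
```

In this sketch D (A, G, or T) contains K, R, and W, while N contains the other ten degenerates.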

FUNCTIONS

cleanDNA

    my $clean_ref = cleanDNA($seq_ref);

Cleans the sequence for use. Strips out comments (lines starting with '>') and whitespace, converts uracil to thymine, and capitalizes all characters.

Examples:

    my $clean_ref = cleanDNA($seq_ref);

    my $dna_ref   = cleanDNA(\'actg');
    my $codon_ref = cleanDNA(\'act tag cta');
    my $mrna_ref  = cleanDNA(\'>some mRNA
                              acugauauagau
                              uauagacgaucc');
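
The cleaning steps described above can be sketched in plain Perl as follows. The function name clean_dna_sketch is hypothetical, and returning a reference to a new string (rather than modifying the input in place) is an assumption; this is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for cleanDNA: takes and returns a scalar reference.
sub clean_dna_sketch {
    my ($seq_ref) = @_;
    my $clean = ${$seq_ref};    # copy, so the caller's string is untouched
    $clean =~ s/^>.*$//mg;      # drop comment lines starting with '>'
    $clean =~ s/\s+//g;         # strip all whitespace, including newlines
    $clean =~ tr/uU/tT/;        # convert uracil to thymine
    $clean = uc $clean;         # capitalize
    return \$clean;
}

my $mrna_ref = clean_dna_sketch(
    \">some mRNA\n    acugauauagau\n    uauagacgaucc"
);
print ${$mrna_ref}, "\n";
```

The comment line has to be removed before the whitespace is stripped; otherwise the header text would be merged into the sequence.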

randomDNA

    my $seq_ref = randomDNA($length);

Generates a random DNA sequence for testing this module or your own scripts. The default length is 100 nucleotides.

Examples:

    my $seq_ref  = randomDNA();
    my $long_ref = randomDNA(600);
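
A comparable generator can be sketched in a few lines. The name random_dna_sketch is hypothetical, and drawing each base uniformly and independently is an assumption about the intended distribution.

```perl
use strict;
use warnings;

# Hypothetical stand-in for randomDNA: returns a reference to a random
# string of A, C, G, and T, 100 characters long unless told otherwise.
sub random_dna_sketch {
    my ($length) = @_;
    $length = 100 unless defined $length;
    my @bases = qw(A C G T);
    my $seq = join '', map { $bases[ int rand @bases ] } 1 .. $length;
    return \$seq;
}

my $default_ref = random_dna_sketch();
my $long_ref    = random_dna_sketch(600);
print length( ${$default_ref} ), ' and ', length( ${$long_ref} ), " bases\n";
```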

reverse_complement

rev_comp

    my $reverse_ref = reverse_complement($seq_ref);

Finds the reverse complement of the sequence and handles degenerate nucleotides.

Example:

    my $reverse_ref = reverse_complement(\'act');
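
One way to reverse-complement a sequence that may contain degenerate nucleotides is a single transliteration over the IUPAC codes. The sketch below uses the hypothetical name reverse_complement_sketch and assumes that letter case is preserved; the module may behave differently on both counts.

```perl
use strict;
use warnings;

# Hypothetical stand-in for reverse_complement.
# Complement pairs: A/T, C/G, R/Y, K/M, B/V, D/H. The codes S, W, and N
# are their own complements, so they are left untouched.
sub reverse_complement_sketch {
    my ($seq_ref) = @_;
    my $rev = reverse ${$seq_ref};
    $rev =~ tr/ACGTRYKMBDHVacgtrykmbdhv/TGCAYRMKVHDBtgcayrmkvhdb/;
    return \$rev;
}

print ${ reverse_complement_sketch(\'act') },    "\n";
print ${ reverse_complement_sketch(\'ACSTAD') }, "\n";
```

Here 'act' becomes 'agt', and 'ACSTAD' becomes 'HTASGT' because D (A, G, or T) complements to H (A, C, or T).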

unrollDNA

    my $seq_arrayref = unrollDNA( $seq_ref );

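Unrolls a DNA string containing degenerate nucleotides into every sequence it can stand for. Each degenerate position is expanded into itself, the degenerates it contains, and its individual bases. The first entry of the returned arrayref is the original sequence.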

Example:

    my $seq_arrayref = unrollDNA( \'ACSTAD' );

    # $seq_arrayref now refers to:
    # [
    #     'ACSTAD', 'ACCTAD', 'ACGTAD',
    #     'ACSTAR', 'ACCTAR', 'ACGTAR',
    #     'ACSTAW', 'ACCTAW', 'ACGTAW',
    #     'ACSTAK', 'ACCTAK', 'ACGTAK',
    #     'ACSTAA', 'ACCTAA', 'ACGTAA',
    #     'ACSTAG', 'ACCTAG', 'ACGTAG',
    #     'ACSTAT', 'ACCTAT', 'ACGTAT'
    # ]
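
The expansion shown above can be reproduced with a small Cartesian product over the options at each position. The names options_for and unroll_sketch are hypothetical, and the order of entries after the first is an artifact of this sketch, not a guarantee about the module.

```perl
use strict;
use warnings;

# IUPAC degenerate codes and the DNA bases each one stands for.
my %bases_for = (
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT', M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Everything a symbol can be narrowed to: itself, the smaller degenerates
# it contains, and its individual bases. Plain bases map to themselves.
sub options_for {
    my ($symbol) = @_;
    my $bases = $bases_for{$symbol} or return ($symbol);
    my @smaller = grep {
        my $candidate = $bases_for{$_};
        length($candidate) < length($bases) && $candidate !~ /[^$bases]/;
    } sort keys %bases_for;
    return ( $symbol, @smaller, split //, $bases );
}

# Hypothetical stand-in for unrollDNA: builds the Cartesian product of the
# options at every position, keeping the original sequence first.
sub unroll_sketch {
    my ($seq_ref) = @_;
    my @results = ('');
    for my $symbol ( split //, uc ${$seq_ref} ) {
        my @options = options_for($symbol);
        @results = map { my $opt = $_; map { $_ . $opt } @results } @options;
    }
    return \@results;
}

my $unrolled = unroll_sketch( \'ACSTAD' );
print scalar( @{$unrolled} ), " sequences, starting with $unrolled->[0]\n";
```

For 'ACSTAD' the S position has 3 options and the D position has 7, giving the same 21 sequences listed in the example.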

AUTHOR

Kevin Galinsky, <first initial last name plus cpan at gmail dot com>

COPYRIGHT AND LICENSE

Copyright (c) 2010-2011, Broad Institute.

Copyright (c) 2008-2009, J. Craig Venter Institute.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.