NAME

Bio::GeneDesign

VERSION

Version 5.52

DESCRIPTION

AUTHOR

Sarah Richardson <smrichardson@lbl.gov>

CONSTRUCTORS

new

Returns an initialized Bio::GeneDesign object.

This function reads the ConfigData written at installation, imports the relevant sublibraries, and sets the relevant paths.

    my $GD = Bio::GeneDesign->new();
 

ACCESSORS

EMBOSS

Returns a value if EMBOSS_support was vetted and approved during installation.

BLAST

Returns a value if BLAST_support was vetted and approved during installation.

graph

Returns a value if graphing_support was vetted and approved during installation.

vmatch

Returns a value if vmatch_support was vetted and approved during installation.

enzyme_set

Returns a hash reference where the keys are enzyme names and the values are RestrictionEnzyme objects, if the enzyme set has been defined.

To set this value, use set_restriction_enzymes.

enzyme_set_name

Returns the name of the enzyme set in use, if there is one.

To set this value, use set_restriction_enzymes.

all_enzymes

Returns a hash reference where the keys are enzyme names and the values are RestrictionEnzyme objects.

To set this value, use set_restriction_enzymes.

organism

Returns the name of the organism in use, if there is one.

To set this value, use set_organism.

codontable

Returns the codon table in use, if there is one.

The codon table is a hash reference where the keys are upper case nucleotide codons and the values are upper case single letter amino acids.

    my $codon_t = $GD->codontable();
    $codon_t->{"ATG"} eq "M" || die;

To set this value, use set_codontable.

reversecodontable

Returns the reverse codon table in use, if there is one.

The reverse codon table is a hash reference where the keys are upper case single letter amino acids and the values are upper case nucleotides.

    my $revcodon_t = $GD->reversecodontable();
    $revcodon_t->{"M"} eq "ATG" || die;
    

This value is set automatically when set_codontable is run.

rscutable

Returns the RSCU table in use, if there is one.

The RSCU table is a hash reference where the keys are upper case nucleotide codons and the values are floats.

    my $rscu_t = $GD->rscutable();
    $rscu_t->{"ATG"} == 1.00 || die;

To set this value, use set_rscutable.

FUNCTIONS

melt

    my $Tm = $GD->melt(-sequence => $myseq);

The -sequence argument is required.

Returns the melting temperature of a DNA sequence.

You can set the salt and DNA concentrations with the -salt and -concentration arguments; they default to 50 mM (.05) and 100 pm (.0000001) respectively.

You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be analyzed with the -sequence flag.

There are four different formulae to choose from. If you wish to use the nearest neighbor method, use the -nearest_neighbor flag. Otherwise the appropriate formula will be determined by the length of your -sequence argument.

For sequences under 14 base pairs: Tm = (4 * #GC) + (2 * #AT)

For sequences between 14 and 50 base pairs: Tm = 100.5 + (41 * #GC / length) - (820 / length) + 16.6 * log10(salt)

For sequences over 50 base pairs: Tm = 81.5 + (41 * #GC / length) - (500 / length) + 16.6 * log10(salt) - .62
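
The three length-based formulae can be sketched in plain Perl as follows. This is only an illustration of the arithmetic, not the module's implementation: simple_tm_sketch is a hypothetical helper, the nearest neighbor method is not covered, and treating a length of exactly 50 as part of the middle range is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::GeneDesign: applies the three
# length-based formulae quoted above to a plain string.
sub simple_tm_sketch {
    my ($seq, $salt) = @_;
    $salt = 0.05 unless defined $salt;      # documented default salt value
    my $len = length $seq;
    my $gc  = ($seq =~ tr/GCgc//);          # tr with no replacement just counts
    my $at  = ($seq =~ tr/ATat//);
    my $log10_salt = log($salt) / log(10);  # Perl's log() is the natural log

    return 4 * $gc + 2 * $at if $len < 14;
    # Assumption: a length of exactly 50 is handled by the middle formula.
    return 100.5 + (41 * $gc / $len) - (820 / $len) + 16.6 * $log10_salt
        if $len <= 50;
    return 81.5 + (41 * $gc / $len) - (500 / $len) + 16.6 * $log10_salt - 0.62;
}

print simple_tm_sketch("AATTCG"), "\n";   # prints 16
```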

complement

    my $my_seq = "AATTCG";
    
    my $complemented_seq = $GD->complement($my_seq);
    $complemented_seq eq "TTAAGC" || die;
    
    my $reverse_complemented_seq = $GD->complement($my_seq, 1);
    $reverse_complemented_seq eq "CGAATT" || die;
    
    # the same calls with named arguments
    $complemented_seq = $GD->complement(-sequence => $my_seq);
    $complemented_seq eq "TTAAGC" || die;
    
    $reverse_complemented_seq = $GD->complement(-sequence => $my_seq,
                                                -reverse => 1);
    $reverse_complemented_seq eq "CGAATT" || die;
    

The -sequence argument is required.

Complements or reverse complements a DNA sequence.

You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.

If you also pass a true value, either as the second argument or with the -reverse flag, the sequence will be reversed and complemented.
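
For plain strings, the same transformation can be sketched with tr and reverse. The helper name complement_sketch is hypothetical and this is not the module's code; the ambiguity pairings used (R/Y, K/M, B/V, D/H, with W, S and N unchanged) are the standard IUPAC ones.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the string case of complement().
sub complement_sketch {
    my ($seq, $reverse) = @_;
    # Copy the sequence, then swap each base for its complement in place.
    (my $comp = $seq) =~ tr/ACGTRYKMBVDHacgtrykmbvdh/TGCAYRMKVBHDtgcayrmkvbhd/;
    return $reverse ? scalar reverse($comp) : $comp;
}

print complement_sketch("AATTCG"), "\n";      # prints TTAAGC
print complement_sketch("AATTCG", 1), "\n";   # prints CGAATT
```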

count

    my $my_seq = "AATTCG";
    my $count = $GD->count($my_seq);
    $count->{C} == 1 || die;
    $count->{G} == 1 || die;
    $count->{A} == 2 || die;
    $count->{GCp} == 33.3 || die;
    $count->{ATp} == 66.7 || die;
    
    # the same call with a named argument
    $count = $GD->count(-sequence => $my_seq);

You must pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object.

The count function counts the bases in a DNA sequence and returns a hash reference in which each base (including the ambiguous bases) is a key and each value is the number of times that base appears in the sequence. The special keys GCp and ATp hold the GC and AT percentages.
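
A minimal sketch of such a tally for a plain string is shown below. The helper name count_sketch is hypothetical, rounding the percentages to one decimal place is inferred from the example above, and ambiguous bases are simply left out of GCp and ATp here, which may differ from what the module does.

```perl
use strict;
use warnings;

# Hypothetical helper: per-base tally plus GC and AT percentages.
sub count_sketch {
    my ($seq) = @_;
    my %count;
    $count{$_}++ for split //, uc $seq;
    my $len = length $seq;
    my $gc  = ($count{G} || 0) + ($count{C} || 0);
    my $at  = ($count{A} || 0) + ($count{T} || 0);
    # Assumption: one decimal place, matching the 33.3 / 66.7 example.
    $count{GCp} = $len ? sprintf("%.1f", 100 * $gc / $len) : 0;
    $count{ATp} = $len ? sprintf("%.1f", 100 * $at / $len) : 0;
    return \%count;
}

my $tally = count_sketch("AATTCG");
print "GC: $tally->{GCp}  AT: $tally->{ATp}\n";   # prints GC: 33.3  AT: 66.7
```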

regex_nt

    my $my_seq = "ABC";
    my $regex = $GD->regex_nt(-sequence => $my_seq);
    # $regex is qr/A[CGT]C/;
    
    my $regarr = $GD->regex_nt(-sequence => $my_seq, -reverse_complement => 1);
    # $regarr is [qr/A[CGT]C/, qr/G[ACG]T/]
  

You must pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed with the -sequence flag.

regex_nt creates a compiled regular expression or a set of them that can be used to query large nucleotide sequences for possibly ambiguous subsequences.

If you want to get regular expressions for both the forward and reverse senses of the DNA, use the -reverse_complement flag and expect a reference to an array of compiled regexes.
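
The idea behind the forward-sense regex can be sketched by mapping each IUPAC code to a character class. The helper regex_nt_sketch and the %IUPAC hash are hypothetical names for this illustration; only the forward sense is built, and the module's own output may be formatted differently.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',     C => 'C',     G => 'G',      T => 'T',
    R => '[AG]',  Y => '[CT]',  W => '[AT]',   S => '[CG]',
    K => '[GT]',  M => '[AC]',  B => '[CGT]',  D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Hypothetical helper: turn a possibly ambiguous sequence into a compiled regex.
sub regex_nt_sketch {
    my ($seq) = @_;
    my $pattern = '';
    for my $base (split //, uc $seq) {
        die "unknown base $base\n" unless exists $IUPAC{$base};
        $pattern .= $IUPAC{$base};
    }
    return qr/$pattern/;
}

my $regex = regex_nt_sketch("ABC");               # equivalent to qr/A[CGT]C/
print "found\n" if "GGAGCTT" =~ $regex;           # prints found (matches AGC)
```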

regex_aa

    my $my_pep = "AEQ*";
    my $regex = $GD->regex_aa(-sequence => $my_pep);
    # $regex is qr/AEQ[\*]/
  

Creates a compiled regular expression or a set of them that can be used to query large amino acid sequences for smaller subsequences.

You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed with the -sequence flag.

sequence_is_ambiguous

    my $my_seq = "ABC";
    my $flag = $GD->sequence_is_ambiguous($my_seq);
    $flag == 1 || die;
    
    $my_seq = "ATC";
    $flag = $GD->sequence_is_ambiguous($my_seq);
    $flag == 0 || die;
  

Checks to see if a DNA sequence contains ambiguous bases (RYMKWSBDHVN) and returns true if it does.

You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.

ambiguous_translation

    my $my_seq = "ABC";
    my @peps = $GD->ambiguous_translation(-sequence => $my_seq, -frame => 1);
    # @peps is qw(I S T)
    

You must pass a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.

Translates a nucleotide sequence that may have ambiguous bases and returns an array of possible peptides.

The frame argument may be 1, 2, 3, -1, -2, or -3. It may also be t (three, 1, 2, 3), or s (six, 1, 2, 3, -1, -2, -3). It defaults to 1.

ambiguous_transcription

    my $my_seq = "ABC";
    my $seqs = $GD->ambiguous_transcription($my_seq);
    # $seqs is [qw(ACC AGC ATC)]
    

Disambiguates a nucleotide sequence that may have ambiguous bases and returns a reference to a sorted array of possible unambiguous sequences.

You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.
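
The expansion itself can be sketched as a running product over the bases each IUPAC code allows. The helper expand_ambiguous_sketch and the %BASES hash are hypothetical and only handle plain strings.

```perl
use strict;
use warnings;

# Standard IUPAC codes mapped to the concrete bases they allow.
my %BASES = (
    A => ['A'], C => ['C'], G => ['G'], T => ['T'],
    R => [qw(A G)],   Y => [qw(C T)],   W => [qw(A T)],   S => [qw(C G)],
    K => [qw(G T)],   M => [qw(A C)],   B => [qw(C G T)], D => [qw(A G T)],
    H => [qw(A C T)], V => [qw(A C G)], N => [qw(A C G T)],
);

# Hypothetical helper: list every unambiguous sequence a pattern can stand for.
sub expand_ambiguous_sketch {
    my ($seq) = @_;
    my @results = ('');
    for my $base (split //, uc $seq) {
        my $options = $BASES{$base} or die "unknown base $base\n";
        my @next;
        for my $prefix (@results) {
            push @next, map { "$prefix$_" } @$options;
        }
        @results = @next;
    }
    return [sort @results];
}

print join(" ", @{ expand_ambiguous_sketch("ABC") }), "\n";   # prints ACC AGC ATC
```

The number of results grows multiplicatively with each ambiguous base, so a long run of N will produce a very large list.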

positions

    my $seq = "TGCTGACTGCAGTCAGTACACTACGTACGTGCATGAC";
    my $seek = "CWC";
    
    my $positions = $GD->positions(-sequence => $seq,
                                   -query => $seek);
    # $positions is {18 => "CAC"}

    $positions = $GD->positions(-sequence => $seq,
                                -query => $seek,
                                -reverse_complement => 1);
    # $positions is {18 => "CAC", 28 => "GTG"}
    

Finds and returns all the positions and sequences of a potentially ambiguous subsequence in a larger sequence. The -reverse_complement flag is off by default.

You can pass either string variables, Bio::Seq objects, or Bio::SeqFeatureI objects as the sequence and query arguments; additionally you may pass a RestrictionEnzyme object as the query argument.
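
A rough sketch of the search for plain strings follows, using a lookahead so that overlapping hits are also reported. The helper positions_sketch is hypothetical; the patterns for CWC and its reverse complement GWG are written out by hand, and the zero-based offsets are inferred from the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: record the offset and text of every match of each pattern.
sub positions_sketch {
    my ($seq, @patterns) = @_;
    my %hits;
    for my $pattern (@patterns) {
        # The lookahead consumes nothing, so overlapping matches are found too.
        while ($seq =~ /(?=($pattern))/g) {
            $hits{ $-[0] } = $1;    # $-[0] is the zero-based start of the match
        }
    }
    return \%hits;
}

my $seq  = "TGCTGACTGCAGTCAGTACACTACGTACGTGCATGAC";
my $both = positions_sketch($seq, qr/C[AT]C/, qr/G[AT]G/);
for my $offset (sort { $a <=> $b } keys %$both) {
    print "$offset => $both->{$offset}\n";   # prints 18 => CAC, then 28 => GTG
}
```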

set_codontable

    # load a codon table from the GeneDesign configuration directory
    $GD->set_codontable(-organism_name => "yeast");
    
    # load a codon table from an arbitrary path and catch it in a variable
    my $codon_t = $GD->set_codontable(-organism_name => "custom",
                                      -table_path => "/path/to/table.ct");
    

The -organism_name argument is required.

This function loads, sets, and returns a codon definition table. After it is run the accessor codontable will return the hash reference that represents the codon table.

If no path is provided, the configuration directory /codon_tables is checked for tables that match the provided organism name. If there is no table in that directory, a warning will appear and the standard codon table will be used.

Any codon table that is missing a definition for a codon will cause a warning to be issued. The table format for codon tables is

    # Standard genetic code
    {TTT} = F
    {TTC} = F
    {TTA} = L
    ...

See NCBI's table
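
A file in this format could be read along the following lines. Both helpers are hypothetical and only illustrate the format and the missing-codon check; they are not the module's loader.

```perl
use strict;
use warnings;

# Hypothetical parser for the "{CODON} = value" format shown above.
sub parse_table_sketch {
    my ($text) = @_;
    my %table;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        $table{uc $1} = $2
            if $line =~ /^\s*\{(\w{3})\}\s*=\s*(\S+)/;
    }
    return \%table;
}

# Hypothetical check: list the codons (out of all 64) the table does not define.
sub missing_codons_sketch {
    my ($table) = @_;
    my @nt = qw(T C A G);
    my @missing;
    for my $x (@nt) { for my $y (@nt) { for my $z (@nt) {
        push @missing, "$x$y$z" unless exists $table->{"$x$y$z"};
    } } }
    return @missing;
}

my $table   = parse_table_sketch("# Standard genetic code\n{TTT} = F\n{TTC} = F\n{TTA} = L\n");
my @missing = missing_codons_sketch($table);
warn scalar(@missing), " codons have no definition\n" if @missing;
```

The same parser works for the RSCU format, since only the right-hand side of each line differs.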

set_rscutable

    # load a RSCU table from the GeneDesign configuration directory
    $GD->set_rscutable(-organism_name => "yeast");
    
    # load an RSCU table from an arbitrary path and catch it in a variable
    my $rscu_t = $GD->set_rscutable(-organism_name => "custom",
                                    -table_path => "/path/to/table.rscu");
    

The -organism_name argument is required.

This function loads, sets, and returns an RSCU table. After it is run the accessor rscutable will return the hash reference that represents the RSCU table.

If no path is provided, the configuration directory /codon_tables is checked for tables that match the provided organism name. If there is no table in that directory, a warning will appear and the flat RSCU table will be used.

Any RSCU table that is missing a definition for a codon will cause a warning to be issued. The table format for RSCU tables is

    # Saccharomyces cerevisiae (Highly expressed genes)
    # Nucleic Acids Res 16, 8207-8211 (1988)
    {TTT} = 0.19
    {TTC} = 1.81
    {TTA} = 0.49
    ...

See Sharp et al. 1986.

set_organism

    # load both codon tables and RSCU tables simultaneously
    $GD->set_organism(-organism_name => "yeast");
    
    # with arguments
    $GD->set_organism(-organism_name => "custom",
                      -table_path => "/path/to/table.ct",
                      -rscu_path => "/path/to/table.rscu");
    

The -organism_name argument is required.

This function is just a shortcut; it runs set_codontable and set_rscutable. See those functions for details.

codon_count

    # count the codons in a list of sequences
    my $tally = $GD->codon_count(-input => \@sequences);
    
    # add a gene to an existing codon count
    $tally = $GD->codon_count(-input => $sequence,
                              -count => $tally);
                              
    # add a list of Bio::Seq objects to an existing codon count
    $tally = $GD->codon_count(-input => \@seqobjects,
                              -count => $tally);
    

The -input argument is required and will take a string variable, a Bio::Seq object, a Bio::SeqFeatureI object, or a reference to an array full of any combination of those things.

The codon_count function takes a set of sequences and counts how often each codon appears in them. It returns a hash reference where the keys are upper case nucleotide codons and the values are integers. If you pass a hash reference containing codon counts with the -count argument, new counts will be added to the old values.

This function will warn you if non nucleotide codons are found.
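
The counting step can be sketched for plain strings as below. The helper codon_count_sketch is hypothetical; reading codons in frame from the first base and ignoring a trailing partial codon are assumptions of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: tally codons from one string or an array reference of
# strings, optionally adding to an existing tally.
sub codon_count_sketch {
    my ($input, $tally) = @_;
    $tally ||= {};
    my @seqs = ref($input) eq 'ARRAY' ? @$input : ($input);
    for my $seq (@seqs) {
        my $dna = uc $seq;
        # Assumption: frame 1, trailing partial codons are ignored.
        for (my $i = 0; $i + 3 <= length $dna; $i += 3) {
            my $codon = substr $dna, $i, 3;
            if ($codon =~ /^[ACGT]{3}$/) {
                $tally->{$codon}++;
            }
            else {
                warn "skipping non-nucleotide codon $codon\n";
            }
        }
    }
    return $tally;
}

my $tally = codon_count_sketch(["ATGTTTTTT", "ATGTAA"]);
$tally = codon_count_sketch("TTT", $tally);          # add to the existing tally
print "$_ $tally->{$_}\n" for sort keys %$tally;     # ATG 2, TAA 1, TTT 3
```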

TODO: what about ambiguous codons?

generate_RSCU_table

    my $rscu_t = $GD->generate_RSCU_table(-sequences => \@list_of_sequences);
    

The -sequences argument is required and will take a string variable, a Bio::Seq object, a Bio::SeqFeatureI object, or a reference to an array full of any combination of those things.

The generate_RSCU_table function takes a set of sequences, counts how often each codon appears, and returns an RSCU table as a hash reference where the keys are upper case nucleotide codons and the values are floats.

See Sharp et al. 1986.
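
As a reminder of the arithmetic, the RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below uses a hypothetical helper, rscu_sketch, and a deliberately tiny codon table covering only phenylalanine; the handling of amino acids with no observations is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: compute RSCU values from codon counts and a codon table.
sub rscu_sketch {
    my ($counts, $codon_to_aa) = @_;
    my (%total, %family_size);
    for my $codon (keys %$codon_to_aa) {
        my $aa = $codon_to_aa->{$codon};
        $total{$aa} += $counts->{$codon} || 0;
        $family_size{$aa}++;
    }
    my %rscu;
    for my $codon (keys %$codon_to_aa) {
        my $aa       = $codon_to_aa->{$codon};
        my $expected = $total{$aa} / $family_size{$aa};
        # Assumption: families with no observations get 0.00.
        $rscu{$codon} = $expected
            ? sprintf("%.2f", ($counts->{$codon} || 0) / $expected)
            : "0.00";
    }
    return \%rscu;
}

# Illustrative two-codon family only; a real table covers all 64 codons.
my %phe  = (TTT => 'F', TTC => 'F');
my $rscu = rscu_sketch({ TTT => 12800, TTC => 21837 }, \%phe);
print "TTT $rscu->{TTT}  TTC $rscu->{TTC}\n";   # prints TTT 0.74  TTC 1.26
```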

generate_codon_report

  my $report = $GD->generate_codon_report(-sequences => \@list_of_sequences);

The report will have the format

  TTT (F) 12800 0.74
  TTC (F) 21837 1.26
  TTA (L)  4859 0.31
  TTG (L) 18806 1.22

where the first column in each group is the codon, the second column is the one letter amino acid abbreviation in parentheses, the third column is the number of times that codon has been seen, and the fourth column is the RSCU value for that codon.

This report comes in a 4x4 layout, as would a standard genetic code table in a textbook.

NO TEST

generate_RSCU_file

  my $contents = $GD->generate_RSCU_file(
    -sequences => \@seqs,
    -comments => ["Got these codons from mice"]
  );
  open(my $OUT, '>', '/path/to/cods') or die "can't write to /path/to/cods: $!";
  print $OUT $contents;
  close $OUT;

This function generates a string that can be written to file to serve as a GeneDesign RSCU table. Provide a set of sequences and an optional array reference of comments to prepend to the file.

The file will have the format

  # Comment 1
  # ...
  # Comment n
  {TTT} = 0.19
  {TTC} = 1.81
  ...

NO TEST

list_enzyme_sets

  my @available_enzlists = $GD->list_enzyme_sets();
  # @available_enzlists == ('standard_and_IIB', 'blunts', 'IIB', 'nonpal', ...)

Returns an array containing the names of every restriction enzyme recognition list GeneDesign knows about.

set_restriction_enzymes

  $GD->set_restriction_enzymes(-enzyme_set => 'blunts');

or

  $GD->set_restriction_enzymes(-list_path => '/path/to/enzyme_file');

or even

  $GD->set_restriction_enzymes(
    -list_path => '/path/to/enzyme_file',
    -enzyme_set => 'custom_enzymes'
  );

All will return a hash structure full of restriction enzymes.

Tell GeneDesign which set of restriction enzymes to use. If you provide only a set name with the -enzyme_set flag, GeneDesign will check its config path for a matching file. Otherwise you must provide a path to a file (and optionally a name for the set).

remove_from_enzyme_set

Removes a subset of enzymes from an enzyme list. This only happens in memory; no files will be altered. The argument is an array reference of enzyme names.

  $GD->set_restriction_enzymes(-enzyme_set => 'blunts');
  $GD->remove_from_enzyme_set(-enzymes => ['SmaI', 'MlyI']);

NO TEST

add_to_enzyme_set

Adds a subset of enzymes to an enzyme list. This only happens in memory; no files will be altered. The argument is an array reference of RestrictionEnzyme objects.

  #Grab all known enzymes
  my $allenz = $GD->set_restriction_enzymes(-enzyme_set => 'standard_and_IIB');

  #Pull out a few
  my @keepers = ($allenz->{'BmrI'}, $allenz->{'HphI'});

  #Give GeneDesign the enzyme set you want
  $GD->set_restriction_enzymes(-enzyme_set => 'blunts');

  #Add the few enzymes it didn't have before
  $GD->add_to_enzyme_set(-enzymes => \@keepers);

NO TEST

restriction_status

build_prefix_tree

Takes an array reference of nucleotide sequences (they can be strings, Bio::Seq objects, or Bio::GeneDesign::RestrictionEnzyme objects) and creates a prefix tree. If you add the -peptide flag, the sequences will be ambiguously translated before they are added to the tree. Otherwise they will be ambiguously transcribed. The reverse complement of any non-peptide sequence is also added, as long as it differs from the original.

    my $tree = $GD->build_prefix_tree(-input => ['GGATCC']);
    
    my $ptree = $GD->build_prefix_tree(
      -input => ['GGCCNNNNNGGCC'],
      -peptide => 1
    );
    

search_prefix_tree

Takes a prefix tree and a sequence and searches for results, which are returned as described in the Bio::GeneDesign::PrefixTree documentation.

  my $hits = $GD->search_prefix_tree(-tree => $ptree, -sequence => $mygeneseq);
  
  # @$hits = (['BamHI', 4, 'GGATCC', "I hope this didn't pop up"],
  #           ['OhnoI', 21, 'GGCCC', 'I hope these pop up'],
  #           ['WoopsII', 21, 'GGCCC', 'I hope these pop up']
  # );

pattern_aligner

pattern_adder

codon_change_type

translate

reverse_translate

codon_juggle

subtract_sequence

repeat_smash

make_amplification_primers

NO TEST

contains_homopolymer

Returns 1 if the sequence contains a homopolymer of the provided length (the default is 5 bp) and 0 otherwise.
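
For a plain string, the check can be sketched with a backreference. The helper contains_homopolymer_sketch is hypothetical, and it assumes that "of the provided length" means a run of at least that many identical bases.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if any base repeats at least $length times in a row.
sub contains_homopolymer_sketch {
    my ($seq, $length) = @_;
    $length = 5 unless defined $length;    # documented default
    my $repeats = $length - 1;             # one base, then $repeats more copies
    return $seq =~ /([ACGT])\1{$repeats}/i ? 1 : 0;
}

print contains_homopolymer_sketch("ATGAAAAATG"), "\n";    # prints 1
print contains_homopolymer_sketch("ATGAAAATG"), "\n";     # prints 0
print contains_homopolymer_sketch("ATGAAAATG", 4), "\n";  # prints 1
```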

filter_homopolymers

search_vmatch

filter_blast

carve_building_blocks

NO TEST

chop_oligos

NO TEST

make_graph

make_dotplot

import_seqs

NO TEST

export_seqs

NO TEST

random_dna

replace_ambiguous_bases

PLEASANTRIES

pad

    my $name = 5;
    my $nice = $GD->pad($name, 3);
    $nice eq "005" || die;

    $name = "oligo";
    $nice = $GD->pad($name, 7, "_");
    $nice eq "__oligo" || die;
      

Pads an integer with leading zeroes (by default) or any provided set of characters. This is useful both to make reports pretty and to standardize the length of designations.
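
The padding itself amounts to prepending enough copies of the pad character, as in this sketch. The helper pad_sketch is hypothetical; it assumes a single-character pad and returns values that are already wide enough unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: left-pad a value to a given width.
sub pad_sketch {
    my ($value, $width, $char) = @_;
    $char = "0" unless defined $char;          # zeroes by default
    my $missing = $width - length($value);
    # Assumption: $char is one character wide; longer values pass through as is.
    return $missing > 0 ? ($char x $missing) . $value : "$value";
}

print pad_sketch(5, 3), "\n";              # prints 005
print pad_sketch("oligo", 7, "_"), "\n";   # prints __oligo
```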

attitude

    my $adverb = $GD->attitude();
  

Ask GeneDesign how it handled your request.

endslash

_stripdown

_checkref

COPYRIGHT AND LICENSE

Copyright (c) 2013, GeneDesign developers All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

* The names of Johns Hopkins, the Joint Genome Institute, the Lawrence Berkeley National Laboratory, the Department of Energy, and the GeneDesign developers may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.