Bio::GeneDesign
Version 5.52
Sarah Richardson <smrichardson@lbl.gov>
Returns an initialized Bio::GeneDesign object.
This function reads the ConfigData written at installation, imports the relevant sublibraries, and sets the relevant paths.
my $GD = Bio::GeneDesign->new();
Returns a value if EMBOSS_support was vetted and approved during installation.
Returns a value if BLAST_support was vetted and approved during installation.
Returns a value if graphing_support was vetted and approved during installation.
Returns a value if vmatch_support was vetted and approved during installation.
Returns a hash reference where the keys are enzyme names and the values are RestrictionEnzyme objects, if the enzyme set has been defined.
To set this value, use set_restriction_enzymes.
Returns the name of the enzyme set in use, if there is one.
Returns a hash reference where the keys are enzyme names and the values are RestrictionEnzyme objects.
Returns the name of the organism in use, if there is one.
To set this value, use set_organism.
Returns the codon table in use, if there is one.
The codon table is a hash reference where the keys are upper case nucleotides and the values are upper case single letter amino acids.
my $codon_t = $GD->codontable();
$codon_t->{"ATG"} eq "M" || die;
To set this value, use set_codontable.
Returns the reverse codon table in use, if there is one.
The reverse codon table is a hash reference where the keys are upper case single letter amino acids and the values are upper case nucleotides.
my $revcodon_t = $GD->reversecodontable();
$revcodon_t->{"M"} eq "ATG" || die;
This value is set automatically when set_codontable is run.
Returns the RSCU table in use, if there is one.
The RSCU codon table is a hash reference where the keys are upper case nucleotides and the values are floats.
my $rscu_t = $GD->rscutable();
$rscu_t->{"ATG"} == 1.00 || die;
To set this value, use set_rscutable.
my $Tm = $GD->melt(-sequence => $myseq);
The -sequence argument is required.
Returns the melting temperature of a DNA sequence.
You can set the salt and DNA concentrations with the -salt and -concentration arguments; they default to 50 mM (0.05) and 100 nM (0.0000001), respectively.
You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be analyzed with the -sequence flag.
There are four different formulae to choose from. If you wish to use the nearest neighbor method, use the -nearest_neighbor flag. Otherwise the appropriate formula will be determined by the length of your -sequence argument.
For sequences under 14 base pairs: Tm = (4 * #GC) + (2 * #AT).
For sequences between 14 and 50 base pairs: Tm = 100.5 + (41 * #GC / length) - (820 / length) + 16.6 * log10(salt).
For sequences over 50 base pairs: Tm = 81.5 + (41 * #GC / length) - (500 / length) + 16.6 * log10(salt) - 0.62.
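The three length-based formulae can be sketched in plain Perl as follows. This is only an illustration of the arithmetic, not the module's implementation: simple_melt is a hypothetical helper, it accepts plain strings only, and the choice to treat a sequence of exactly 50 bases with the middle formula is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::GeneDesign: applies the three
# length-based formulae from the text to an unambiguous DNA string.
sub simple_melt {
    my ($seq, $salt) = @_;
    $salt //= 0.05;                        # molar salt, the default quoted above
    my $len = length $seq;
    my $gc  = ($seq =~ tr/GCgc//);         # number of G and C bases
    my $at  = ($seq =~ tr/ATat//);         # number of A and T bases
    my $log_salt = log($salt) / log(10);   # Perl has no built-in log10

    return 4 * $gc + 2 * $at if $len < 14;
    return 100.5 + (41 * $gc / $len) - (820 / $len) + 16.6 * $log_salt
        if $len <= 50;                     # assumption: 50 bases uses this formula
    return 81.5 + (41 * $gc / $len) - (500 / $len) + 16.6 * $log_salt - 0.62;
}

printf "%.1f\n", simple_melt("AATTCG");                 # 16.0
printf "%.1f\n", simple_melt("ATGCATGCATGCATGCATGC");   # 58.4
```

The nearest neighbor method is not covered by this sketch.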
my $my_seq = "AATTCG";
my $complemented_seq = $GD->complement($my_seq);
$complemented_seq eq "TTAAGC" || die;
my $reverse_complemented_seq = $GD->complement($my_seq, 1);
$reverse_complemented_seq eq "CGAATT" || die;
#clean
$complemented_seq = $GD->complement(-sequence => $my_seq);
$complemented_seq eq "TTAAGC" || die;
$reverse_complemented_seq = $GD->complement(-sequence => $my_seq, -reverse => 1);
$reverse_complemented_seq eq "CGAATT" || die;
Complements or reverse complements a DNA sequence.
You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.
If you also pass along a true statement, the sequence will be reversed and complemented.
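For readers who want to see the operation itself, here is a minimal stand-alone sketch using Perl's tr operator. complement_string is a hypothetical name, it handles plain strings only, and it is not claimed to be how the module works internally.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper; it only handles plain strings.
sub complement_string {
    my ($seq, $reverse) = @_;
    # Swap each IUPAC code for its complement; S, W, and N pair with themselves.
    (my $comp = $seq) =~ tr/ACGTRYKMBDHVacgtrykmbdhv/TGCAYRMKVHDBtgcayrmkvhdb/;
    return $reverse ? scalar reverse($comp) : $comp;
}

print complement_string("AATTCG"), "\n";      # TTAAGC
print complement_string("AATTCG", 1), "\n";   # CGAATT
```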
my $my_seq = "AATTCG";
my $count = $GD->count($my_seq);
$count->{C} == 1 || die;
$count->{G} == 1 || die;
$count->{A} == 2 || die;
$count->{GCp} == 33.3 || die;
$count->{ATp} == 66.7 || die;
#clean
$count = $GD->count(-sequence => $my_seq);
You must pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object.
The count function counts the bases in a DNA sequence and returns a hash reference in which each base (including the ambiguous bases) is a key and each value is the number of times that base appears in the sequence. The hash also contains the special keys GCp and ATp, which hold the GC and AT percentages.
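A rough sketch of such a tally is shown below. count_bases is a hypothetical helper, and two details are assumptions rather than documented behavior: the percentages are computed from unambiguous G, C, A, and T over the full sequence length, and they are rounded to one decimal place to match the example values.

```perl
use strict;
use warnings;

# Hypothetical helper: tallies every base and adds GCp and ATp.
# Assumption: percentages use only unambiguous G, C, A, and T over the
# full sequence length, rounded to one decimal place.
sub count_bases {
    my ($seq) = @_;
    my %count = map { $_ => 0 } split //, "ACGTRYMKWSBDHVN";
    $count{$_}++ for split //, uc $seq;
    my $len = length($seq) || 1;   # avoid division by zero on empty input
    $count{GCp} = sprintf("%.1f", 100 * ($count{G} + $count{C}) / $len) + 0;
    $count{ATp} = sprintf("%.1f", 100 * ($count{A} + $count{T}) / $len) + 0;
    return \%count;
}

my $count = count_bases("AATTCG");
print "$count->{A} $count->{GCp} $count->{ATp}\n";   # 2 33.3 66.7
```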
my $my_seq = "ABC";
my $regex = $GD->regex_nt(-sequence => $my_seq);
# $regex is qr/A[CGT]C/;
my $regarr = $GD->regex_nt(-sequence => $my_seq, -reverse_complement => 1);
# $regarr is [qr/A[CGT]C/, qr/G[ACG]T/]
You must pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed with the -sequence flag.
regex_nt creates a compiled regular expression or a set of them that can be used to query large nucleotide sequences for possibly ambiguous subsequences.
If you want to get regular expressions for both the forward and reverse senses of the DNA, use the -reverse_complement flag and expect a reference to an array of compiled regexes.
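The idea of turning ambiguity codes into character classes can be sketched without the module. In the example below, %IUPAC and iupac_regex are hypothetical names introduced for illustration; the mapping itself is the standard IUPAC nucleotide code.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  M => 'AC',  K => 'GT',  W => 'AT', S => 'CG',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: turns an ambiguous string into a compiled regex.
sub iupac_regex {
    my ($seq) = @_;
    my $pattern = join '', map {
        my $set = $IUPAC{$_} // die "unknown base '$_'";
        length($set) > 1 ? "[$set]" : $set;
    } split //, uc $seq;
    return qr/$pattern/;
}

my $regex = iupac_regex("ABC");
print "match\n" if "GGATCGG" =~ $regex;   # ATC fits A[CGT]C
```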
my $my_pep = "AEQ*";
my $regex = $GD->regex_aa(-sequence => $my_pep);
# $regex is qr/AEQ[\*]/
Creates a compiled regular expression or a set of them that can be used to query large amino acid sequences for smaller subsequences.
You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed with the -sequence flag.
my $my_seq = "ABC";
my $flag = $GD->sequence_is_ambiguous($my_seq);
$flag == 1 || die;
$my_seq = "ATC";
$flag = $GD->sequence_is_ambiguous($my_seq);
$flag == 0 || die;
Checks to see if a DNA sequence contains ambiguous bases (RYMKWSBDHVN) and returns true if it does.
my $my_seq = "ABC";
my @peps = $GD->ambiguous_translation(-sequence => $my_seq, -frame => 1);
# @peps contains T, S, and I (from ACC, AGC, and ATC under the standard genetic code)
You must pass a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.
Translates a nucleotide sequence that may have ambiguous bases and returns an array of possible peptides.
The frame argument may be 1, 2, 3, -1, -2, or -3. It may also be t (three, 1, 2, 3), or s (six, 1, 2, 3, -1, -2, -3). It defaults to 1.
my $my_seq = "ABC";
my $seqs = $GD->ambiguous_transcription($my_seq);
# $seqs is [qw(ACC AGC ATC)]
Deambiguates a nucleotide sequence that may have ambiguous bases and returns a reference to a sorted array of possible unambiguous sequences.
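Deambiguation amounts to taking every combination of the bases allowed at each position. A minimal sketch, assuming plain string input, follows; deambiguate and %IUPAC are hypothetical names, and the number of results grows multiplicatively with each ambiguous position.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  M => 'AC',  K => 'GT',  W => 'AT', S => 'CG',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: expands one position at a time, then sorts.
sub deambiguate {
    my ($seq) = @_;
    my @expanded = ('');
    for my $base (split //, uc $seq) {
        my @choices = split //, ($IUPAC{$base} // die "unknown base '$base'");
        my @next;
        for my $prefix (@expanded) {
            push @next, map { $prefix . $_ } @choices;
        }
        @expanded = @next;
    }
    return [sort @expanded];
}

print join(' ', @{ deambiguate("ABC") }), "\n";   # ACC AGC ATC
```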
my $seq = "TGCTGACTGCAGTCAGTACACTACGTACGTGCATGAC";
my $seek = "CWC";
my $positions = $GD->positions(-sequence => $seq, -query => $seek);
# $positions is {18 => "CAC"}
$positions = $GD->positions(-sequence => $seq, -query => $seek, -reverse_complement => 1);
# $positions is {18 => "CAC", 28 => "GTG"}
Finds and returns all the positions and sequences of a potentially ambiguous subsequence in a larger sequence. The reverse_complement flag is off by default.
You can pass either string variables, Bio::Seq objects, or Bio::SeqFeatureI objects as the sequence and query arguments; additionally you may pass a RestrictionEnzyme object as the query argument.
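The offsets in the example above are 0-based. The stand-alone sketch below reproduces them with a hand-written pattern for CWC and its reverse complement GWG; find_positions is a hypothetical helper, and the lookahead is used so that overlapping matches are also reported, which is an assumption about the desired behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: reports every (possibly overlapping) match of a
# pre-built pattern as 0-based offset => matched text.
sub find_positions {
    my ($seq, $regex) = @_;
    my %found;
    while ($seq =~ /(?=($regex))/g) {
        $found{ pos($seq) } = $1;   # a zero-width match leaves pos at its start
    }
    return \%found;
}

my $seq  = "TGCTGACTGCAGTCAGTACACTACGTACGTGCATGAC";
my $hits = find_positions($seq, qr/C[AT]C|G[AT]G/);   # CWC or GWG
for my $offset (sort { $a <=> $b } keys %$hits) {
    print "$offset => $hits->{$offset}\n";
}
# 18 => CAC
# 28 => GTG
```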
# load a codon table from the GeneDesign configuration directory
$GD->set_codontable(-organism_name => "yeast");
# load a codon table from an arbitrary path and catch it in a variable
my $codon_t = $GD->set_codontable(-organism_name => "custom", -table_path => "/path/to/table.ct");
The -organism_name argument is required.
This function loads, sets, and returns a codon definition table. After it is run the accessor codontable will return the hash reference that represents the codon table.
If no path is provided, the configuration directory /codon_tables is checked for tables that match the provided organism name. If there is no table in that directory, a warning will appear and the standard codon table will be used.
Any codon table that is missing a definition for a codon will cause a warning to be issued. The table format for codon tables is
# Standard genetic code
{TTT} = F
{TTC} = F
{TTA} = L
...
See NCBI's table
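To make the file layout concrete, here is a small sketch of a reader for "{CODON} = VALUE" lines. parse_table is a hypothetical helper written for this illustration, not the module's own loader; treating lines that begin with # as comments is an assumption based on the sample above.

```perl
use strict;
use warnings;

# Hypothetical parser for the "{CODON} = VALUE" layout shown above.
# Lines starting with '#' are treated as comments (an assumption).
sub parse_table {
    my ($text) = @_;
    my %table;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#|$)/;   # skip comments and blank lines
        if ($line =~ /^\s*\{([A-Za-z]{3})\}\s*=\s*(\S+)/) {
            $table{ uc $1 } = $2;
        }
        else {
            warn "unrecognized line: $line\n";
        }
    }
    return \%table;
}

my $table = parse_table("# Standard genetic code\n{TTT} = F\n{TTC} = F\n{TTA} = L\n");
print "$table->{TTA}\n";   # L
```

Because the value is captured as any run of non-whitespace characters, the same sketch also reads lines whose values are numbers.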
# load an RSCU table from the GeneDesign configuration directory
$GD->set_rscutable(-organism_name => "yeast");
# load an RSCU table from an arbitrary path and catch it in a variable
my $rscu_t = $GD->set_rscutable(-organism_name => "custom", -table_path => "/path/to/table.rscu");
This function loads, sets, and returns an RSCU table. After it is run the accessor rscutable will return the hash reference that represents the RSCU table.
If no path is provided, the configuration directory /codon_tables is checked for tables that match the provided organism name. If there is no table in that directory, a warning will appear and the flat RSCU table will be used.
Any RSCU table that is missing a definition for a codon will cause a warning to be issued. The table format for RSCU tables is
# Saccharomyces cerevisiae (Highly expressed genes)
# Nucleic Acids Res 16, 8207-8211 (1988)
{TTT} = 0.19
{TTC} = 1.81
{TTA} = 0.49
...
See Sharp et al. 1986.
# load both codon tables and RSCU tables simultaneously
$GD->set_organism(-organism_name => "yeast");
# with arguments
$GD->set_organism(
  -organism_name => "custom",
  -table_path => "/path/to/table.ct",
  -rscu_path => "/path/to/table.rscu"
);
This function is just a shortcut; it runs set_codontable and set_rscutable. See those functions for details.
# count the codons in a list of sequences
my $tally = $GD->codon_count(-input => \@sequences);
# add a gene to an existing codon count
$tally = $GD->codon_count(-input => $sequence, -count => $tally);
# add a list of Bio::Seq objects to an existing codon count
$tally = $GD->codon_count(-input => \@seqobjects, -count => $tally);
The -input argument is required and will take a string variable, a Bio::Seq object, a Bio::SeqFeatureI object, or a reference to an array full of any combination of those things.
The codon_count function takes a set of sequences and counts how often each codon appears in them. It returns a hash reference where the keys are upper case nucleotide codons and the values are integers. If you pass a hash reference containing codon counts with the -count argument, new counts will be added to the old values.
This function will warn you if non-nucleotide codons are found.
TODO: what about ambiguous codons?
my $rscu_t = $GD->generate_RSCU_table(-sequences => \@list_of_sequences);
The -sequences argument is required and will take a string variable, a Bio::Seq object, a Bio::SeqFeatureI object, or a reference to an array full of any combination of those things.
The generate_RSCU_table function takes a set of sequences, counts how often each codon appears, and returns an RSCU table as a hash reference where the keys are upper case nucleotide codons and the values are floats.
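As a worked illustration of the arithmetic, the RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below is not the module's code: rscu_from_sequences is a hypothetical helper, it uses the standard genetic code, it reads plain strings in frame 1, and assigning 0 to codons of amino acids that never appear is an assumption.

```perl
use strict;
use warnings;

# Standard genetic code, with codons enumerated in TCAG order.
my @bases = qw(T C A G);
my @aas   = split //,
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
my %codon_t;
for my $x (@bases) { for my $y (@bases) { for my $z (@bases) {
    $codon_t{"$x$y$z"} = shift @aas;
} } }

# Hypothetical helper: RSCU = observed count * number of synonymous
# codons / total count for that amino acid.
sub rscu_from_sequences {
    my @seqs  = @_;
    my %count = map { $_ => 0 } keys %codon_t;
    for my $seq (@seqs) {
        for my $codon (uc($seq) =~ /(...)/g) {
            $count{$codon}++ if exists $codon_t{$codon};
        }
    }
    my (%total, %family);
    for my $codon (keys %codon_t) {
        $total{ $codon_t{$codon} }  += $count{$codon};
        $family{ $codon_t{$codon} } += 1;
    }
    my %rscu;
    for my $codon (keys %codon_t) {
        my $aa = $codon_t{$codon};
        # Assumption: amino acids that were never seen get an RSCU of 0.
        $rscu{$codon} = $total{$aa}
            ? $count{$codon} * $family{$aa} / $total{$aa}
            : 0;
    }
    return \%rscu;
}

my $rscu = rscu_from_sequences("TTTTTCTTCATG");
printf "%.2f %.2f %.2f\n", @{$rscu}{qw(TTT TTC ATG)};   # 0.67 1.33 1.00
```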
my $report = $GD->generate_codon_report(-sequences => \@list_of_sequences);
The report will have the format
TTT (F) 12800 0.74
TTC (F) 21837 1.26
TTA (L) 4859 0.31
TTG (L) 18806 1.22
where the first column in each group is the codon, the second column is the one letter amino acid abbreviation in parentheses, the third column is the number of times that codon has been seen, and the fourth column is the RSCU value for that codon.
This report comes in a 4x4 layout, as would a standard genetic code table in a textbook.
NO TEST
my $contents = $GD->generate_RSCU_file(
  -sequences => \@seqs,
  -comments => ["Got these codons from mice"]
);
open (my $OUT, '>', '/path/to/cods') || die "can't write to /path/to/cods";
print $OUT $contents;
close $OUT;
This function generates a string that can be written to file to serve as a GeneDesign RSCU table. Provide a set of sequences and an optional array reference of comments to prepend to the file.
The file will have the format
# Comment 1
# ...
# Comment n
{TTT} = 0.19
{TTC} = 1.81
...
my @available_enzlists = $GD->list_enzyme_sets();
# @available_enzlists == ('standard_and_IIB', 'blunts', 'IIB', 'nonpal', ...)
Returns an array containing the names of every restriction enzyme recognition list GeneDesign knows about.
$GD->set_restriction_enzymes(-enzyme_set => 'blunts');
or
$GD->set_restriction_enzymes(-list_path => '/path/to/enzyme_file');
or even
$GD->set_restriction_enzymes( -list_path => '/path/to/enzyme_file', -enzyme_set => 'custom_enzymes' );
All will return a hash structure full of restriction enzymes.
Tell GeneDesign which set of restriction enzymes to use. If you provide only a set name with the -enzyme_set flag, GeneDesign will check its config path for a matching file. Otherwise you must provide a path to a file (and optionally a name for the set).
Removes a subset of enzymes from an enzyme list. This only happens in memory; no files will be altered. The argument is an array reference of enzyme names.
$GD->set_restriction_enzymes(-enzyme_set => 'blunts');
$GD->remove_from_enzyme_set(-enzymes => ['SmaI', 'MlyI']);
Adds a subset of enzymes to an enzyme list. This only happens in memory; no files will be altered. The argument is an array reference of RestrictionEnzyme objects.
#Grab all known enzymes
my $allenz = $GD->set_restriction_enzymes(-enzyme_set => 'standard_and_IIB');
#Pull out a few
my @keepers = ($allenz->{'BmrI'}, $allenz->{'HphI'});
#Give GeneDesign the enzyme set you want
$GD->set_restriction_enzymes(-enzyme_set => 'blunts');
#Add the few enzymes it didn't have before
$GD->add_to_enzyme_set(-enzymes => \@keepers);
Takes an array reference of nucleotide sequences (they can be strings, Bio::Seq objects, or Bio::GeneDesign::RestrictionEnzyme objects) and creates a prefix tree. If you add the peptide flag, the sequences will be ambiguously translated before they are added to the tree. Otherwise they will be ambiguously transcribed. The reverse complement of any non-peptide sequence is also added, as long as it differs from the sequence itself.
my $tree = $GD->build_prefix_tree(-input => ['GGATCC']);
my $ptree = $GD->build_prefix_tree(
  -input => ['GGCCNNNNNGGCC'],
  -peptide => 1
);
Takes a prefix tree and a sequence and searches for results, which are returned as described in the Bio::GeneDesign::PrefixTree documentation.
my $hits = $GD->search_prefix_tree(-tree => $ptree, -sequence => $mygeneseq);
# @$hits = (['BamHI', 4, 'GGATCC', "i hope this didn't pop up"],
#           ['OhnoI', 21, 'GGCCC', 'I hope these pop up'],
#           ['WoopsII', 21, 'GGCCC', 'I hope these pop up']
# );
Returns 1 if the sequence contains a homopolymer of the provided length (the default is 5 bp) and 0 otherwise.
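The check can be illustrated with a backreference. In this sketch has_homopolymer is a hypothetical stand-alone function, and reading "of the provided length" as "at least that many identical bases in a row" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 when any base repeats at least $length times
# in a row, 0 otherwise.
sub has_homopolymer {
    my ($seq, $length) = @_;
    $length //= 5;                      # default quoted in the text
    my $repeats = $length - 1;          # first base plus this many copies
    my $pattern = '(.)\1{' . $repeats . ',}';
    return $seq =~ /$pattern/i ? 1 : 0;
}

print has_homopolymer("ACGTAAAAAC"), "\n";    # 1
print has_homopolymer("ACGTAAAAC"), "\n";     # 0
print has_homopolymer("ACGTAAAC", 3), "\n";   # 1
```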
my $name = 5;
my $nice = $GD->pad($name, 3);
$nice eq "005" || die;
$name = "oligo";
$nice = $GD->pad($name, 7, "_");
$nice eq "__oligo" || die;
Pads an integer with leading zeroes (by default) or any provided set of characters. This is useful both to make reports pretty and to standardize the length of designations.
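Left-padding of this kind can be written in a few lines with Perl's repetition operator. pad_left below is a hypothetical stand-alone equivalent for illustration; returning values that are already wide enough unchanged is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: left-pads $value to $width using $char ("0" by default).
sub pad_left {
    my ($value, $width, $char) = @_;
    $char //= "0";
    my $missing = $width - length($value);
    # Assumption: values already at or over the width are returned unchanged.
    return $missing > 0 ? ($char x $missing) . $value : "$value";
}

print pad_left(5, 3), "\n";              # 005
print pad_left("oligo", 7, "_"), "\n";   # __oligo
```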
my $adverb = $GD->attitude();
Ask GeneDesign how it handled your request.
Copyright (c) 2013, GeneDesign developers
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* The names of Johns Hopkins, the Joint Genome Institute, the Lawrence Berkeley National Laboratory, the Department of Energy, and the GeneDesign developers may not be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.