AI::Genetic::Pro::Macromolecule - Genetic Algorithms to evolve DNA, RNA and Protein sequences
version 0.09280.0_001
    use AI::Genetic::Pro::Macromolecule;

    my @proteins = ($seq1, $seq2, $seq3, ... );

    my $m = AI::Genetic::Pro::Macromolecule->new(
        type               => 'protein',
        fitness            => \&hydrophobicity,
        initial_population => \@proteins,
    );

    sub hydrophobicity {
        my $seq = shift;
        my $score = f($seq);
        return $score;
    }

    $m->evolve(10);    # evolve for 10 generations

    my $most_hydrophobic = $m->fittest->{seq};      # get the best sequence
    my $highest_score    = $m->fittest->{score};    # get top score

    # Want the score stats throughout generations?
    my $history = $m->history;

    my $mean_history = $history->{mean};    # [ mean1, mean2, mean3, ... ]
    my $min_history  = $history->{min};     # [ min1, min2, min3, ... ]
    my $max_history  = $history->{max};     # [ max1, max2, max3, ... ]
AI::Genetic::Pro::Macromolecule is a wrapper over AI::Genetic::Pro, aimed at easily evolving protein, DNA or RNA sequences using arbitrary fitness functions.
Its purpose is to allow optimization of macromolecule sequences using Genetic Algorithms, with as little setup time and burden as possible.
Standing atop AI::Genetic::Pro, it is reasonably fast and memory efficient. It is also highly customizable, although I've chosen what I think are sensible defaults for every parameter, so that you don't have to worry about them if you don't know what they mean.
Accepts a CodeRef that should assign a numeric score to each sequence string passed to it as an argument. Required.
    sub fitness {
        my $seq = shift;

        # Do something with $seq and return a score
        my $score = f($seq);

        return $score;
    }

    my $m = AI::Genetic::Pro::Macromolecule->new(
        fitness => \&fitness,
        ...
    );
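As a concrete illustration of what a fitness function might look like, here is a minimal sketch that scores a protein by its mean Kyte-Doolittle hydropathy. The choice of scale, and the decision to score unknown residues as 0, are assumptions of this example rather than anything the module prescribes.

```perl
use strict;
use warnings;

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
my %KD = (
    A =>  1.8, R => -4.5, N => -3.5, D => -3.5, C =>  2.5,
    Q => -3.5, E => -3.5, G => -0.4, H => -3.2, I =>  4.5,
    L =>  3.8, K => -3.9, M =>  1.9, F =>  2.8, P => -1.6,
    S => -0.8, T => -0.7, W => -0.9, Y => -1.3, V =>  4.2,
);

# Mean hydropathy of a protein sequence; higher means more hydrophobic.
# Residues outside the table contribute 0 (an assumption of this sketch).
sub hydrophobicity {
    my $seq = uc shift;
    return 0 unless length($seq);

    my $total = 0;
    $total += ( $KD{$_} // 0 ) for split //, $seq;

    return $total / length($seq);
}
```

A reference to it can then be passed as `fitness => \&hydrophobicity`, as in the synopsis.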
Accepts a CodeRef. It will be applied once at the end of each generation. If it returns true, evolution will stop, regardless of the number of generations passed to the evolve method.
The CodeRef should accept an AI::Genetic::Pro::Macromolecule object as argument, and should return either true or false.
    sub reached_max {
        my $m = shift;    # an AI::G::P::Macromolecule object

        my $highest_score = $m->fittest->{score};

        if ( $highest_score > 9000 ) {
            warn "It's over 9000!";
            return 1;
        }

        return 0;         # keep evolving
    }

    my $m = AI::Genetic::Pro::Macromolecule->new(
        terminate => \&reached_max,
        ...
    );
In the above example, evolution will stop the moment the top score in any generation exceeds the value 9000.
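A fixed threshold is not the only possible stopping rule. The sketch below builds a callback that stops once the top score has stopped improving, using only the history method documented further down. The helper name `make_stagnation_check`, its `$patience` parameter and the `Local::FakeRun` stand-in class are all hypothetical and exist only for this example. It also assumes that history already includes the generation that has just finished when the callback runs, which the text above does not state.

```perl
use strict;
use warnings;

# Build a terminate callback that stops once the top score has not
# improved for $patience consecutive generations.
sub make_stagnation_check {
    my $patience = shift;

    return sub {
        my $m   = shift;                      # object exposing ->history
        my @max = @{ $m->history->{max} };

        return 0 if @max <= $patience;        # not enough generations yet

        my $reference = $max[ -1 - $patience ];
        my @recent    = @max[ -$patience .. -1 ];

        # Stop when none of the recent maxima beats the reference.
        return ( grep { $_ > $reference } @recent ) ? 0 : 1;
    };
}

# Hypothetical stand-in so the sketch runs without the module installed.
{
    package Local::FakeRun;
    sub new     { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub history { return $_[0]{history} }
}
```

With the real module, the callback would be passed as `terminate => make_stagnation_check(20)`.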
Decide whether the sequences can have different lengths. Accepts a Bool value. Defaults to 1.
Manually set the maximum allowed length of the sequences. Accepts an Int.
This attribute is required unless an initial population is provided. In that case, if length is not explicitly specified, it will be set to the length of the longest sequence provided.
Macromolecule type: protein, dna, or rna. Required.
Sequences to add to the initial pool before evolving. Accepts an ArrayRef[Str].
my $m = AI::Genetic::Pro::Macromolecule->new( initial_population => ['ACGT', 'CAAC', 'GTTT'], ... );
Accepts a Bool value. When true, the score for each sequence will be stored, to avoid costly and unnecessary recomputations. Set to 1 by default.
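To make the idea of score caching concrete, here is a minimal sketch of memoizing a fitness function by hand. It only illustrates the principle; it is not a description of how the module implements caching internally, and the `cached` helper is a hypothetical name.

```perl
use strict;
use warnings;

# Wrap a fitness function so each distinct sequence is scored only once.
sub cached {
    my $fitness = shift;
    my %score_for;

    return sub {
        my $seq = shift;
        $score_for{$seq} = $fitness->($seq)
            unless exists $score_for{$seq};
        return $score_for{$seq};
    };
}
```

This only pays off when the fitness function is deterministic, that is, when the same sequence always yields the same score.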
Mutation rate, a Num between 0 and 1. Default is 0.05.
Crossover rate, a Num between 0 and 1. Default is 0.95.
Number of sequences per generation. Default is 300.
Number of parent sequences in each recombination. Default is 2.
Defines how sequences are selected to crossover. It expects an ArrayRef:
selection => [ $type, @params ]
See the AI::Genetic::Pro docs for details on the available selection strategies, their parameters, and their meanings. The default is Roulette: the best individuals (chromosomes) are selected first, and parents are then drawn from that collection with probability proportional to their fitness.
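The fitness-proportional draw at the heart of roulette selection can be sketched as follows. This is a generic illustration of the technique, not the code AI::Genetic::Pro actually runs, and it covers only the proportional draw, not the preliminary choice of the best individuals. The function name `roulette_pick` and the explicit `$spin` argument (normally the result of `rand()`) are assumptions made so the example is deterministic.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Fitness-proportional pick. $spin is a number in [0, 1), normally rand().
# Assumes all scores are non-negative and at least one is positive.
sub roulette_pick {
    my ( $candidates, $spin ) = @_;    # ArrayRef of { seq => ..., score => ... }

    my $threshold = $spin * sum( map { $_->{score} } @$candidates );
    my $running   = 0;

    for my $candidate (@$candidates) {
        $running += $candidate->{score};
        return $candidate if $threshold < $running;
    }

    return $candidates->[-1];          # guard against rounding error
}
```

A sequence with twice the score of another occupies twice as much of the wheel, so it is twice as likely to be picked as a parent.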
Defines strategy of crossover operation. It expects an ArrayRef:
strategy => [ $strategy, @params ]
See docs in AI::Genetic::Pro for details on available crossover strategies, parameters, and their meanings. Default is [ Points, 2 ], in which parents are crossed at 2 points and the best child is moved to the next generation.
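For readers unfamiliar with two-point crossover, the following sketch shows the operation on two equal-length strings, followed by picking the better child. It illustrates the general technique only; the function names, the explicit cut positions and the equal-length restriction are assumptions of this example, not the AI::Genetic::Pro implementation.

```perl
use strict;
use warnings;

# Two-point crossover of equal-length strings: the segment between
# positions $from (inclusive) and $to (exclusive) is swapped.
sub two_point_crossover {
    my ( $mum, $dad, $from, $to ) = @_;

    my ( $child_a, $child_b ) = ( $mum, $dad );
    my $len = $to - $from;

    substr( $child_a, $from, $len ) = substr( $dad, $from, $len );
    substr( $child_b, $from, $len ) = substr( $mum, $from, $len );

    return ( $child_a, $child_b );
}

# Score every child with the given fitness CodeRef and keep the best one.
sub best_child {
    my ( $fitness, @children ) = @_;
    my ($best) = sort { $fitness->($b) <=> $fitness->($a) } @children;
    return $best;
}
```

In a real run the cut positions would be chosen at random for each pair of parents.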
Whether to inject the best sequences into the next generation, and if so, how many. Defaults to 5.
$m->evolve($n);
Evolve the sequence population for the specified number of generations. Accepts an optional single Int argument. If $n is 0 or undef, it will evolve indefinitely, or until terminate returns true.
Returns the current generation number.
Returns an Array[HashRef] with the desired number of top scoring sequences. Each hash reference has two keys: 'seq', which points to the sequence string, and 'score', which points to the sequence's score.
    my @top_2 = $m->fittest(2);
    # (
    #     { seq => 'VIKP', score => 10 },
    #     { seq => 'VLKP', score => 9  },
    # )
When called with no arguments, it returns a HashRef with the top scoring sequence.
my $fittest = $m->fittest; # { seq => 'VIKP', score => 10 }
Returns a HashRef with the minimum, maximum and mean score for each generation.
    my $history = $m->history;
    # {
    #     min  => [ 0, 0, 0, 1, 2, ... ],
    #     max  => [ 1, 2, 2, 3, 4, ... ],
    #     mean => [ 0.2, 0.3, 0.5, 1.5, 3, ... ],
    # }
To access the mean score for the $n-th generation, for instance:
$m->history->{mean}->[$n - 1];
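Since the three arrays are parallel and indexed by generation, they are easy to walk together. The sketch below formats one line per generation from a hash reference shaped like the one described above; the sample numbers and the `history_report` helper are made up for the example.

```perl
use strict;
use warnings;

# Sample data shaped like the HashRef described above (values made up).
my $history = {
    min  => [ 0,   0,   0,   1,   2 ],
    max  => [ 1,   2,   2,   3,   4 ],
    mean => [ 0.2, 0.3, 0.5, 1.5, 3 ],
};

# Build one report line per generation; generations are counted from 1.
sub history_report {
    my $h = shift;
    my @lines;

    for my $i ( 0 .. $#{ $h->{mean} } ) {
        push @lines, sprintf "gen %d: min=%s mean=%s max=%s",
            $i + 1, $h->{min}[$i], $h->{mean}[$i], $h->{max}[$i];
    }

    return @lines;
}

print "$_\n" for history_report($history);
```

With a real run, `$m->history` would be passed in place of the sample data.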
Returns a HashRef with the minimum, maximum and mean score for the current generation.
$m->current_stats; # { min => 2, max => 10, mean => 3.5 }
Returns an Array[HashRef] with all the sequences of the current generation and their scores, in no particular order.
    my @seqs = $m->current_population;
    # (
    #     { seq => 'VIKP', score => 10 },
    #     { seq => 'VLKP', score => 9  },
    #     ...
    # )
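Because the population comes back in no particular order, a caller who wants a ranking has to sort it. A minimal sketch, using made-up sample data in the documented shape and a hypothetical `ranked` helper:

```perl
use strict;
use warnings;

# Sample data shaped like the return value described above (made up).
my @seqs = (
    { seq => 'VLKP', score => 9 },
    { seq => 'AAKP', score => 3 },
    { seq => 'VIKP', score => 10 },
);

# Return the population ordered from highest to lowest score.
sub ranked {
    my @population = @_;
    my @sorted = sort { $b->{score} <=> $a->{score} } @population;
    return @sorted;
}

printf "%-6s %s\n", $_->{seq}, $_->{score} for ranked(@seqs);
```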
Bruno Vecchi <vecchi.b gmail.com>
This software is copyright (c) 2009 by Bruno Vecchi.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.