
NAME

AI::Genetic::Pro::Macromolecule - Genetic Algorithms to evolve DNA, RNA and Protein sequences

VERSION

version 0.09280.0_001

SYNOPSIS

    use AI::Genetic::Pro::Macromolecule;

    my @proteins = ($seq1, $seq2, $seq3, ... );

    my $m = AI::Genetic::Pro::Macromolecule->new(
        type    => 'protein',
        fitness => \&hydrophobicity,
        initial_population => \@proteins,
    );

    sub hydrophobicity {
        my $seq = shift;
        my $score = f($seq);    # f() stands for your own scoring logic

        return $score;
    }

    $m->evolve(10);    # evolve for 10 generations

    my $most_hydrophobic = $m->fittest->{seq};   # get the best sequence
    my $highest_score    = $m->fittest->{score}; # get top score

    # Want the score stats throughout generations?
    my $history = $m->history;

    my $mean_history = $history->{mean}; # [ mean1, mean2, mean3, ... ]
    my $min_history  = $history->{min};  # [ min1,  min2,  min3,  ... ]
    my $max_history  = $history->{max};  # [ max1,  max2,  max3,  ... ]

DESCRIPTION

AI::Genetic::Pro::Macromolecule is a wrapper over AI::Genetic::Pro, aimed at easily evolving protein, DNA or RNA sequences using arbitrary fitness functions.

Its purpose is to allow optimization of macromolecule sequences using Genetic Algorithms, with as little setup time and burden as possible.

Standing atop AI::Genetic::Pro, it is reasonably fast and memory efficient. It is also highly customizable, although I've chosen what I think are sensible defaults for every parameter, so that you don't have to worry about them if you don't know what they mean.

ATTRIBUTES

fitness

Accepts a CodeRef that receives a sequence string as its only argument and must return a numeric score for it. Required.

    sub fitness {
        my $seq = shift;

        # Do something with $seq and return a score
        my $score = f($seq);

        return $score;
    }

    my $m = AI::Genetic::Pro::Macromolecule->new(
        fitness => \&fitness,
        ...
    );
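The sketch below is a concrete version of the placeholder f($seq). It scores a protein by its mean Kyte-Doolittle hydropathy per residue, and it is self-contained, so it does not depend on this module. The %KD table name and the choice to score unrecognized letters as 0 are assumptions made for illustration.

```perl
use strict;
use warnings;

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
my %KD = (
    A =>  1.8, R => -4.5, N => -3.5, D => -3.5, C =>  2.5,
    Q => -3.5, E => -3.5, G => -0.4, H => -3.2, I =>  4.5,
    L =>  3.8, K => -3.9, M =>  1.9, F =>  2.8, P => -1.6,
    S => -0.8, T => -0.7, W => -0.9, Y => -1.3, V =>  4.2,
);

# Fitness: mean hydropathy per residue; letters not in %KD count as 0.
sub hydrophobicity {
    my $seq = shift;
    return 0 unless length($seq);
    my $total = 0;
    $total += $KD{$_} // 0 for split //, uc($seq);
    return $total / length($seq);
}

print hydrophobicity('VIKP'), "\n";    # mean of V, I, K and P: about 0.8
```

You could then pass it as fitness => \&hydrophobicity. The sketch averages the values instead of summing them, which keeps scores comparable between sequences of different lengths. That matters when variable_length is enabled.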

terminate

Accepts a CodeRef, which is called once at the end of each generation. If it returns true, evolution stops, regardless of the number of generations passed to the evolve method.

The CodeRef should accept an AI::Genetic::Pro::Macromolecule object as argument, and should return either true or false.

    sub reached_max {
        my $m = shift;  # an AI::G::P::Macromolecule object

        my $highest_score = $m->fittest->{score};

        if ( $highest_score > 9000 ) {
            warn "It's over 9000!";
            return 1;
        }

        return 0;    # keep evolving
    }

    my $m = AI::Genetic::Pro::Macromolecule->new(
        terminate => \&reached_max,
        ...
    );

In the above example, evolution stops at the end of the first generation whose top score exceeds 9000.
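A fixed threshold like 9000 only works if you already know what a good score is. A hedged alternative is to stop once the best score has stopped improving. The sketch below relies only on the documented history method. The names make_plateau_terminator and MockModel are hypothetical: the mock exists only so that the example runs without the module installed.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical factory: returns a terminate CodeRef that stops evolution once
# the best score of the last $patience generations is no better than before.
sub make_plateau_terminator {
    my ($patience) = @_;
    return sub {
        my $m   = shift;                         # the Macromolecule object
        my @max = @{ $m->history->{max} || [] }; # best score per generation
        return 0 if @max <= $patience;           # not enough history yet
        my $before = max @max[ 0 .. $#max - $patience ];
        my $recent = max @max[ $#max - $patience + 1 .. $#max ];
        return $recent > $before ? 0 : 1;
    };
}

# Hypothetical stand-in exposing only a history method, so the sketch runs
# without AI::Genetic::Pro::Macromolecule installed.
{
    package MockModel;
    sub new     { my ($class, @max) = @_; return bless { max => [@max] }, $class }
    sub history { my $self = shift; return { max => $self->{max} } }
}

my $stop = make_plateau_terminator(3);
print $stop->(MockModel->new(1, 2, 5, 5, 5, 5)), "\n";    # prints 1
```

With the real module, you would pass it as terminate => make_plateau_terminator(20).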

variable_length

Determines whether the sequences may have different lengths. Accepts a Bool value. Defaults to 1.

length

Sets the maximum allowed length of the sequences. Accepts an Int.

This attribute is required unless an initial population is provided. In that case, if length is not explicitly specified, it is set to the length of the longest sequence provided.
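The defaulting rule described above fits in a few lines of Perl. This is an illustrative sketch of the documented behaviour, not the module's own code, and default_length is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper mirroring the documented rule: an explicit length wins;
# otherwise use the longest sequence in the initial population.
sub default_length {
    my ($initial_population, $explicit_length) = @_;
    return $explicit_length if defined $explicit_length;
    die "length is required when no initial population is given\n"
        unless $initial_population && @$initial_population;
    return max map { length($_) } @$initial_population;
}

print default_length(['ACGT', 'CAAC', 'GTTTA']), "\n";    # prints 5
```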

type

Macromolecule type: protein, dna, or rna. Required.
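Each type implies a sequence alphabet. The sketch below assumes the 20 standard amino acids for proteins and the four canonical nucleotides for DNA and RNA. The module's actual alphabets are not documented here (for instance, whether ambiguity codes are allowed). The helper is_valid_sequence is hypothetical; one use would be checking an initial_population before passing it in.

```perl
use strict;
use warnings;

# Assumed alphabets: the 20 standard amino acids and the canonical nucleotides.
my %ALPHABET = (
    protein => 'ACDEFGHIKLMNPQRSTVWY',
    dna     => 'ACGT',
    rna     => 'ACGU',
);

# Returns 1 if $seq is non-empty and uses only letters of $type's alphabet.
sub is_valid_sequence {
    my ($type, $seq) = @_;
    my $letters = $ALPHABET{ lc $type } or die "Unknown type '$type'\n";
    return $seq =~ /\A[\Q$letters\E]+\z/ ? 1 : 0;
}

print is_valid_sequence('rna', 'ACGT'), "\n";    # prints 0: T is not an RNA base
```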

initial_population

Sequences to add to the initial pool before evolving. Accepts an ArrayRef[Str].

    my $m = AI::Genetic::Pro::Macromolecule->new(
        initial_population => ['ACGT', 'CAAC', 'GTTT'],
        ...
    );

cache

Accepts a Bool value. When true, the score of each sequence is stored, to avoid costly and unnecessary recomputation. Defaults to 1.
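Conceptually, caching amounts to memoizing the fitness function by sequence string, as in the sketch below. The helper cached_fitness is hypothetical and is not this module's implementation. Caching is only correct when the fitness function is deterministic: a stochastic score would be frozen at its first value.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps a fitness CodeRef so that each distinct sequence
# is scored once; later calls return the stored score.
sub cached_fitness {
    my ($fitness) = @_;
    my %cache;
    return sub {
        my $seq = shift;
        $cache{$seq} = $fitness->($seq) unless exists $cache{$seq};
        return $cache{$seq};
    };
}

my $calls  = 0;
my $scorer = cached_fitness(sub { $calls++; return length $_[0] });
$scorer->($_) for qw(ACGT ACGT CA);
print "$calls\n";    # prints 2: 'ACGT' was scored only once
```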

mutation

Mutation rate, a Num between 0 and 1. Default is 0.05.
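One common reading of a mutation rate is the per-position probability of replacing a residue with a random letter. The sketch below implements that reading. Whether AI::Genetic::Pro applies the rate per position or per sequence is an assumption here, so consult its documentation for the exact semantics. point_mutate and its arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical point mutation: each position is independently replaced, with
# probability $rate, by a letter drawn uniformly from $alphabet.
sub point_mutate {
    my ($seq, $rate, $alphabet) = @_;
    my @letters  = split //, $alphabet;
    my @residues = split //, $seq;
    for my $r (@residues) {
        $r = $letters[ int rand @letters ] if rand() < $rate;
    }
    return join '', @residues;
}

srand 1;
print point_mutate('ACGTACGTACGT', 0.05, 'ACGT'), "\n";
```

The replacement letter is drawn from the whole alphabet, so it can coincide with the original residue. As a result, the rate of visible changes is slightly lower than $rate.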

crossover

Crossover rate, a Num between 0 and 1. Default is 0.95.

population_size

Number of sequences per generation. Default is 300.

parents

Number of parent sequences per recombination. Default is 2.

selection

Defines how sequences are selected to crossover. It expects an ArrayRef:

    selection => [ $type, @params ]

See the AI::Genetic::Pro documentation for details on the available selection strategies, their parameters, and their meanings. The default is Roulette: the best individuals (chromosomes) are selected first, and from this pool parents are chosen with probability proportional to their fitness.
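The fitness-proportional step of Roulette selection is the textbook roulette wheel, sketched below. The sketch leaves out the initial pre-selection of the best individuals and assumes non-negative scores with a positive total. roulette_pick is a hypothetical name, and AI::Genetic::Pro's implementation may differ in detail.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Textbook roulette-wheel selection over { seq => ..., score => ... } items:
# each item is picked with probability score / total score.
# Assumes non-negative scores with a positive total.
sub roulette_pick {
    my @population = @_;
    my $total = sum map { $_->{score} } @population;
    my $spin  = rand $total;
    for my $item (@population) {
        $spin -= $item->{score};
        return $item if $spin < 0;
    }
    return $population[-1];    # guard against floating-point round-off
}

my @pool = ({ seq => 'VIKP', score => 3 }, { seq => 'VLKP', score => 1 });
my %count;
$count{ roulette_pick(@pool)->{seq} }++ for 1 .. 1000;
# 'VIKP' should be picked roughly three times as often as 'VLKP'
```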

strategy

Defines strategy of crossover operation. It expects an ArrayRef:

    strategy => [ $strategy, @params ]

See docs in AI::Genetic::Pro for details on available crossover strategies, parameters, and their meanings. Default is [ Points, 2 ], in which parents are crossed at 2 points and the best child is moved to the next generation.
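To illustrate what crossing at 2 points means, the sketch below swaps the segment between two cut points of two equal-length parents and returns both children. Choosing the better child, as the default strategy does, is left to the caller. two_point_crossover is a hypothetical helper. The equal-length restriction is a simplification, since this module also supports variable-length sequences.

```perl
use strict;
use warnings;

# Hypothetical two-point crossover for equal-length parents: the segment
# between cut points $i and $j is exchanged, giving two children.
# Random cut points are drawn when $i and $j are not supplied.
sub two_point_crossover {
    my ($p1, $p2, $i, $j) = @_;
    die "parents must have equal length\n" unless length($p1) == length($p2);
    my $len = length($p1);
    ($i, $j) = map { int rand($len + 1) } 1 .. 2
        unless defined $i && defined $j;
    ($i, $j) = ($j, $i) if $i > $j;
    my $c1 = substr($p1, 0, $i) . substr($p2, $i, $j - $i) . substr($p1, $j);
    my $c2 = substr($p2, 0, $i) . substr($p1, $i, $j - $i) . substr($p2, $j);
    return ($c1, $c2);
}

my ($child1, $child2) = two_point_crossover('AAAAAA', 'CCCCCC', 2, 4);
print "$child1 $child2\n";    # prints AACCAA CCAACC
```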

preserve

Whether to carry the best sequences over into the next generation, and if so, how many. Defaults to 5.
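Preserving the best sequences (elitism) amounts to sorting by score and keeping the top N. The result has the same shape as the list returned by fittest($n). Here is a minimal sketch, where elite is a hypothetical helper that operates on { seq, score } hash references:

```perl
use strict;
use warnings;

# Hypothetical elitism helper: returns the $n highest-scoring items
# ({ seq => ..., score => ... }), best first.
sub elite {
    my ($n, @population) = @_;
    my @sorted = sort { $b->{score} <=> $a->{score} } @population;
    $n = @sorted if $n > @sorted;
    return @sorted[ 0 .. $n - 1 ];
}

my @population = map { +{ seq => $_, score => length($_) } } qw(VIKP VL VLKPA);
print join(' ', map { $_->{seq} } elite(2, @population)), "\n";    # prints VLKPA VIKP
```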

METHODS

evolve

    $m->evolve($n);

Evolves the sequence population for the specified number of generations. Accepts an optional single Int argument. If $n is 0 or undef, it evolves indefinitely, or until terminate returns true.
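The loop below summarizes the stopping rules of evolve. It sketches only the documented behaviour. $step stands in for one generation of the genetic algorithm. For simplicity, the terminate callback here receives the generation count instead of the Macromolecule object. run_generations is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch of evolve($n)'s stopping rules. $step stands in for one
# GA generation; $terminate (optional) is checked after every generation and
# here receives the generation count rather than the Macromolecule object.
sub run_generations {
    my ($n, $step, $terminate) = @_;
    my $generation = 0;
    while ( !$n || $generation < $n ) {
        $step->();
        $generation++;
        last if $terminate && $terminate->($generation);
    }
    return $generation;
}

print run_generations(10, sub { }, undef), "\n";               # prints 10
print run_generations(0, sub { }, sub { $_[0] >= 7 }), "\n";    # prints 7
```

If $n is 0 and no terminate callback is given, the loop never ends, which matches the documented "evolves indefinitely" case.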

generation

Returns the current generation number.

fittest

Returns an Array[HashRef] with the desired number of top scoring sequences. The hash reference has two keys, 'seq' which points to the sequence string, and 'score' which points to the sequence's score.

    my @top_2 = $m->fittest(2);
    # (
    #     { seq => 'VIKP', score => 10 },
    #     { seq => 'VLKP', score => 9  },
    # )

When called with no arguments, it returns a HashRef with the top scoring sequence.

    my $fittest = $m->fittest;
    # { seq => 'VIKP', score => 10 }

history

Returns a HashRef with the minimum, maximum and mean score for each generation.

    my $history = $m->history;
    # {
    #     min  => [ 0, 0, 0, 1, 2, ... ],
    #     max  => [ 1, 2, 2, 3, 4, ... ],
    #     mean => [ 0.2, 0.3, 0.5, 1.5, 3, ... ],
    # }

To access the mean score for the $n-th generation, for instance:

    $m->history->{mean}->[$n - 1];

current_stats

Returns a HashRef with the minimum, maximum and mean score for the current generation.

    $m->current_stats;
    # { min => 2, max => 10, mean => 3.5 }
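The same statistics can be derived from current_population, which returns hash references of the same { seq, score } shape. The sketch below shows the computation; population_stats is a hypothetical helper. History presumably collects one such record per generation, though that is an assumption about the implementation.

```perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Hypothetical helper: min, max and mean score of a list of
# { seq => ..., score => ... } items, as returned by current_population.
sub population_stats {
    my @scores = map { $_->{score} } @_;
    return unless @scores;
    my %stats = (
        min  => min(@scores),
        max  => max(@scores),
        mean => sum(@scores) / @scores,
    );
    return \%stats;
}

my $stats = population_stats(
    { seq => 'VIKP', score => 10 },
    { seq => 'VLKP', score => 9  },
    { seq => 'AIKP', score => 2  },
);
print "$stats->{min} $stats->{max} $stats->{mean}\n";    # prints 2 10 7
```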

current_population

Returns an Array[HashRef] with all the sequences of the current generation and their scores, in no particular order.

    my @seqs = $m->current_population;
    # (
    #     { seq => 'VIKP', score => 10 },
    #     { seq => 'VLKP', score => 9  },
    #     ...
    # )

AUTHOR

  Bruno Vecchi <vecchi.b gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2009 by Bruno Vecchi.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.