Bio::DOOP::Util::Run::GeneMerge - GeneMerge based GO analyzer
Version 0.02
#!/usr/bin/perl -w

use Bio::DOOP::DOOP;

$test = Bio::DOOP::Util::Run::GeneMerge->new();

# Each loader returns a negative value on failure.
if ($test->getDescFile("GO/use/GO.BP.use") < 0) {
    print "Desc error\n";
}
if ($test->getAssocFile("GO/assoc/A_thaliana.converted.BP") < 0) {
    print "Assoc error\n";
}
if ($test->getPopFile("GO/pop.500") < 0) {
    print "Pop error\n";
}
if ($test->getStudyFile("GO/study.500/combined1314.list") < 0) {
    print "Study error\n";
}

$results = $test->getResults();

foreach $res (@{$results}) {
    print $$res{'GOterm'}, " ", $$res{'RawEs'}, "\n";
}
This is a module based on GeneMerge v1.2.
Original program described in:
Cristian I. Castillo-Davis and Daniel L. Hartl. GeneMerge - post-genomic analysis, data mining, and hypothesis testing. Bioinformatics, Vol. 19, no. 7, 2003, pages 891-892.
The original program is poorly suited to large-scale analysis because its design relies on heavy file I/O. This version loads all input data into memory at startup.
Tibor Nagy, Godollo
Endre Sebestyen, Martonvasar
Creates a new GeneMerge object.
$genemerge = Bio::DOOP::Util::Run::GeneMerge->new;
The method loads the GO association file and stores it in memory. The file format is the following. Each line starts with a cluster id, and after some whitespace the associated GO ids are enumerated, separated by semicolons.
81001020 GO:0016020;GO:0003674;GO:0008150
81001110 GO:0005739;GO:0003674
$genemerge->getAssocFile('/tmp/assoc.txt');
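The association format described above can be read with a few lines of Perl. This is only a minimal sketch of the file layout, not the module's own parser; `parse_assoc_lines` is a hypothetical helper name that does not exist in Bio::DOOP.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DOOP): turn lines of the form
# "<cluster id><whitespace><GO id>;<GO id>;..." into a hash that maps
# each cluster id to an arrayref of GO ids.
sub parse_assoc_lines {
    my @lines = @_;                       # copy, so the caller's data is untouched
    my %assoc;
    for my $line (@lines) {
        chomp $line;
        $line =~ s/\s+$//;                # drop trailing whitespace
        next if $line =~ /^\s*$/;         # skip blank lines
        my ($cluster, $go_list) = split /\s+/, $line, 2;
        next unless defined $go_list;     # cluster without any GO id
        $assoc{$cluster} = [ grep { length } split /;/, $go_list ];
    }
    return \%assoc;
}

my $assoc = parse_assoc_lines(
    "81001020 GO:0016020;GO:0003674;GO:0008150\n",
    "81001110 GO:0005739;GO:0003674\n",
);

for my $cluster (sort keys %{$assoc}) {
    print "$cluster: ", join(", ", @{ $assoc->{$cluster} }), "\n";
}
```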
The method loads the population file and stores it in memory. The file format is the following. Each line contains one and only one cluster id.
81001020
81001110
$genemerge->getPopFile('/tmp/pop.txt');
The method calculates the population frequency. Do not use it directly.
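As an illustration of what a population frequency is, the sketch below counts how many population clusters are annotated with each GO id. It assumes that this is the quantity meant here; the module's internal calculation may differ, and `population_frequency` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DOOP): count, for every GO id,
# the number of population clusters annotated with it.
sub population_frequency {
    my ($assoc, $population) = @_;
    my %freq;
    for my $cluster (@{$population}) {
        my $terms = $assoc->{$cluster} or next;   # cluster has no annotation
        my %seen;
        # Count each GO id at most once per cluster.
        $freq{$_}++ for grep { !$seen{$_}++ } @{$terms};
    }
    return \%freq;
}

# Association data taken from the format example in this document.
my %assoc = (
    '81001020' => [ 'GO:0016020', 'GO:0003674', 'GO:0008150' ],
    '81001110' => [ 'GO:0005739', 'GO:0003674' ],
);
my @population = ('81001020', '81001110');

my $freq = population_frequency(\%assoc, \@population);
print "$_\t$freq->{$_}\n" for sort keys %{$freq};
```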
The method loads the GO description file. The file format is the following. Each line starts with a GO id, followed by a tab and the description of that GO id.
GO:0000007	low-affinity zinc ion transporter activity
GO:0000008	thioredoxin
$genemerge->getDescFile('/tmp/desc.txt');
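A tab-separated description file of this shape could be read as follows. This is a rough sketch of the format only; `parse_desc_lines` is a hypothetical helper and not a method of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DOOP): map each GO id to its
# description, splitting every line on the first tab only.
sub parse_desc_lines {
    my @lines = @_;                       # copy, so the caller's data is untouched
    my %desc;
    for my $line (@lines) {
        chomp $line;
        my ($go_id, $text) = split /\t/, $line, 2;
        next unless defined $go_id && defined $text;   # skip malformed lines
        $desc{$go_id} = $text;
    }
    return \%desc;
}

my $desc = parse_desc_lines(
    "GO:0000007\tlow-affinity zinc ion transporter activity\n",
    "GO:0000008\tthioredoxin\n",
);

print "$_ => $desc->{$_}\n" for sort keys %{$desc};
```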
The method loads the study data set, counts GO frequencies, calculates P values from the hypergeometric distribution, and corrects them with the Bonferroni method.
The file format of the study file is the following. Each line contains one and only one cluster id.
$genemerge->getStudyFile('/tmp/study.txt');
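To make the statistics concrete, here is a small sketch of an upper-tail hypergeometric P value followed by a Bonferroni correction. It assumes the textbook formulation and is not copied from the module or from GeneMerge; all function names are hypothetical. Log-factorials are used so that larger populations do not overflow.

```perl
use strict;
use warnings;
use List::Util qw(min);

# log(n!) as a sum of logs; returns 0 for n = 0 and n = 1.
sub log_factorial {
    my ($n) = @_;
    my $sum = 0;
    $sum += log($_) for 2 .. $n;
    return $sum;
}

# log of the binomial coefficient C(n, k), assuming 0 <= k <= n.
sub log_choose {
    my ($n, $k) = @_;
    return log_factorial($n) - log_factorial($k) - log_factorial($n - $k);
}

# P(X >= $k) for a hypergeometric variable, where
#   $N = population size, $K = population members carrying the GO id,
#   $n = study size,      $k = study members carrying the GO id.
sub hypergeom_tail {
    my ($N, $K, $n, $k) = @_;
    my $p = 0;
    for my $i ($k .. min($n, $K)) {
        next if $n - $i > $N - $K;        # more non-carriers than exist
        $p += exp( log_choose($K, $i)
                 + log_choose($N - $K, $n - $i)
                 - log_choose($N, $n) );
    }
    return $p;
}

# Bonferroni correction: multiply by the number of tests, cap at 1.
sub bonferroni {
    my ($p, $tests) = @_;
    return min(1, $p * $tests);
}

my $raw       = hypergeom_tail(10, 5, 5, 5);   # all five study members carry the term
my $corrected = bonferroni($raw, 3);           # three GO ids tested
printf "raw=%.6f corrected=%.6f\n", $raw, $corrected;
```

The cap at 1 keeps the corrected value a valid probability when many GO ids are tested.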
The method gives back all the results as an arrayref of hashes.
$results = $genemerge->getResults();

foreach $result (@{$results}) {
    $goterm       = $$result{'GOterm'};
    $popfreq      = $$result{'PopFreq'};
    $popfrac      = $$result{'PopFrac'};
    $studyfrac    = $$result{'StudyFrac'};
    $studyfracall = $$result{'StudyFracAll'};
    $raw_escore   = $$result{'RawEs'};
    $escore       = $$result{'EScore'};
    $desc         = $$result{'Desc'};
    @contrib      = @{$$result{'Contrib'}};
}
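Because the results are a plain arrayref of hashes, they can be filtered and sorted with ordinary Perl. The sketch below assumes that `EScore` holds the corrected score and that smaller values are more significant. The records and the 0.05 cutoff are made up for illustration and are not real output of the module.

```perl
use strict;
use warnings;

# Made-up records for illustration only; in real use this arrayref
# would come from $genemerge->getResults().
my $results = [
    { GOterm => 'GO:0016020', RawEs => 0.004, EScore => 0.012, Desc => 'example term A' },
    { GOterm => 'GO:0003674', RawEs => 0.300, EScore => 0.900, Desc => 'example term B' },
    { GOterm => 'GO:0005739', RawEs => 0.001, EScore => 0.003, Desc => 'example term C' },
];

my $cutoff = 0.05;    # assumed significance threshold

# Keep records at or below the cutoff, most significant first.
my @significant =
    sort { $a->{EScore} <=> $b->{EScore} }
    grep { $_->{EScore} <= $cutoff }
    @{$results};

printf "%s\t%.3g\t%s\n", @{$_}{qw(GOterm EScore Desc)} for @significant;
```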
This is an internal function to calculate the hypergeometric distribution. Do not use it directly.
Another internal function for the correct statistical results. Do not use it directly.
Factorial calculating function. Do not use it directly.