
NAME

similarity_match.pl

SYNOPSIS

Compares a list of annotations against an ontology and suggests the best match for each, using the EBI::FGPT::FuzzyRecogniser module. It can also align one ontology to another. Ontologies are accepted in OBO and OWL formats, as well as MeSH ASCII and OMIM txt.

The script runs non-interactively, so the results have to be inspected manually. As a rule of thumb, anything with a similarity score higher than ~80-90% can be expected to be a valid match.
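To narrow down manual inspection, the result rows can be pre-filtered by score. The following is a minimal sketch, not part of the script: it assumes the result file is tab-delimited with a MATCH_SIMILARITY% column holding a plain number, as described under OUTPUT. The helper name filter_by_similarity, the cutoff of 85 and the demo rows are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep rows whose MATCH_SIMILARITY% meets a cutoff.
# @lines holds the result file's content, header line first.
sub filter_by_similarity {
    my ( $threshold, @lines ) = @_;
    chomp @lines;
    my $header = shift @lines;
    my @names  = split /\t/, $header;

    # Locate the score column by name rather than by position.
    my ($idx) = grep { $names[$_] eq 'MATCH_SIMILARITY%' } 0 .. $#names;
    die "No MATCH_SIMILARITY% column found\n" unless defined $idx;

    my @kept = grep {
        my @fields = split /\t/, $_, -1;    # -1 keeps trailing empty fields
        defined $fields[$idx]
          && $fields[$idx] ne ''
          && $fields[$idx] >= $threshold;
    } @lines;
    return ( $header, @kept );
}

# Made-up demo rows; a real result file has more columns.
my @demo = (
    "TERM\tMATCHED_LABEL\tMATCH_SIMILARITY%\n",
    "heart disease\theart diseases\t93\n",
    "liver\tlive birth\t40\n",
);
print "$_\n" for filter_by_similarity( 85, @demo );
```

Rows below the cutoff are not necessarily wrong, and rows above it still deserve a quick look; the cutoff only orders the review work.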

USAGE

similarity_match.pl (-w owlfile || -o obofile || -m meshfile || -i omimfile) -t targetfile -r resultfile [--obotarget || --owltarget]

Optional '--obotarget' setting specifies that the target file is an OBO ontology. Optional '--owltarget' setting specifies that the target file is an OWL ontology.

INPUT FILES

ontologies to map the targetfile against

owlfile, obofile, meshfile and omimfile are ontologies in OWL, OBO, MeSH ASCII and OMIM formats respectively. Only a single file needs to be specified.

targetfile

The script expects a tab-delimited text file with headers. Only the first column will be used for matching. All other columns will be preserved in the output.

OUTPUT

The script will produce a single tab-delimited file as set with the -r flag. The file will have the following additional headers:

SOURCE_ACCESSION

Accession of the source term if the target file was an ontology.

SOURCE_LABEL

Label of the source term if the target file was an ontology.

SOURCE_VALUE

The annotation (label or synonym if the target file was an ontology) that was matched based on the highest similarity against the supplied ontology file.

MATCHED_ACCESSION

Accession of the matched term that provided the best match.

MATCHED_LABEL

Matched term's label.

MATCHED_VALUE

The actual term's annotation (label or synonym) that was matched based on the highest similarity from the supplied ontology file.

MATCH_SIMILARITY%

Similarity score of the two matched terms, normalised by the length of the longer of the two strings and expressed in %. Higher is better.
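To make the normalisation concrete, here is a minimal sketch of a length-normalised score. It assumes Levenshtein edit distance as the underlying measure, which is an assumption for illustration only; EBI::FGPT::FuzzyRecogniser may use a different metric. The function names levenshtein and similarity_percent are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Classic dynamic-programming Levenshtein edit distance (two-row variant).
# ASSUMPTION: edit distance stands in for the module's actual measure.
sub levenshtein {
    my ( $s, $t ) = @_;
    my @prev = ( 0 .. length($t) );
    for my $i ( 1 .. length($s) ) {
        my @curr = ($i);
        for my $j ( 1 .. length($t) ) {
            my $cost =
              substr( $s, $i - 1, 1 ) eq substr( $t, $j - 1, 1 ) ? 0 : 1;
            push @curr, min(
                $prev[$j] + 1,              # deletion
                $curr[ $j - 1 ] + 1,        # insertion
                $prev[ $j - 1 ] + $cost,    # substitution
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Hypothetical helper: similarity in %, normalised by the longer string.
sub similarity_percent {
    my ( $s, $t ) = @_;
    my $longest = max( length($s), length($t) );
    return 100 if $longest == 0;    # two empty strings are identical
    return 100 * ( 1 - levenshtein( $s, $t ) / $longest );
}

printf "%.1f\n", similarity_percent( 'heart disease', 'heart diseases' );
```

Dividing by the longer string keeps the score between 0 and 100 and penalises matching a short annotation against a much longer one.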

DESCRIPTION

Function list

align()

Aligns the two data structures targetfile and ontology. Outputs the results into a file.
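The alignment step can be pictured as a best-match search over every annotation in the ontology. The sketch below is not the script's align(); the function best_match, the data layout (accession mapped to a list whose first element is the label, followed by synonyms), the EX: accessions and the toy scorer are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sketch of a best-match search; not the script's align().
# $ontology maps accession => [ label, synonym, ... ]; $score is any
# function returning a similarity in % for two strings.
sub best_match {
    my ( $value, $ontology, $score ) = @_;
    my %best = ( similarity => -1 );
    for my $accession ( sort keys %$ontology ) {
        for my $annotation ( @{ $ontology->{$accession} } ) {
            my $similarity = $score->( $value, $annotation );
            next unless $similarity > $best{similarity};
            %best = (
                accession  => $accession,
                label      => $ontology->{$accession}[0],
                value      => $annotation,
                similarity => $similarity,
            );
        }
    }
    return \%best;
}

# Toy scorer for the demo only: 100 for a case-insensitive exact match, else 0.
my $exact = sub { lc( $_[0] ) eq lc( $_[1] ) ? 100 : 0 };

# Made-up ontology fragment.
my %ontology = (
    'EX:0001' => [ 'heart disease', 'cardiac disease' ],
    'EX:0002' => ['liver disease'],
);
my $hit = best_match( 'Cardiac Disease', \%ontology, $exact );
print join( "\t", @{$hit}{qw(accession label value similarity)} ), "\n";
```

The four fields of the returned hash correspond to the MATCHED_ACCESSION, MATCHED_LABEL, MATCHED_VALUE and MATCH_SIMILARITY% columns; a real run would plug in a fuzzy scorer instead of the exact-match toy.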

parseFlat()

Custom flat file parser.

parseFlatColumns()

Splits and joins the columns of a flat file. The first column is assigned to the first element. Concatenates the ragged end (leftover columns) into the second element or returns undef for a one-column file.
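The behaviour described for parseFlatColumns() can be approximated with a limited split. This is a minimal sketch under that reading of the description, not the script's own code; split_first_column is a hypothetical name and the sample line is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parseFlatColumns(); not the script's actual code.
sub split_first_column {
    my ($line) = @_;
    chomp $line;

    # A limit of 2 keeps everything after the first tab in one piece,
    # so the leftover columns stay joined by their original tabs.
    my ( $first, $rest ) = split /\t/, $line, 2;
    return ( $first, $rest );    # $rest is undef for a one-column line
}

my ( $term, $extra ) = split_first_column("heart disease\tsample 1\tliver\n");
print "term:  $term\n";
print "extra: ", ( defined $extra ? $extra : '(undef)' ), "\n";
```

Keeping the leftover columns as one string means they can be written back to the output unchanged, whatever their number.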

ACKNOWLEDGMENTS

Emma Hastings <emma@ebi.ac.uk>

AUTHORS

Tomasz Adamusiak <tomasz@cpan.org>