Lingua::BioYaTeA::PostProcessing - Perl extension for postprocessing BioYaTeA term extraction.
use File::Basename;    # provides dirname(), used for the log file path below
use Lingua::BioYaTeA::PostProcessing;

my $postProc = Lingua::BioYaTeA::PostProcessing->new({
    'input-file'    => "sampleEN-output.xml",
    'output-file'   => "sampleEN-bioyatea-out-pp.xml",
    'configuration' => "post-processing-filtering.conf",
});

$postProc->logfile(dirname($postProc->output_file) . '/term-filtering.log');
$postProc->load_configuration;
$postProc->defineTwigParser;
$postProc->filtering;
$postProc->printResume;
The module implements an extension for the post-processing of the BioYaTeA (Lingua::BioYaTeA) output. Currently, the XML BioYaTeA output is filtered according to rules in order to remove irrelevant extracted terms.
The input and output files are in the XML YaTeA format.
The configuration file provides patterns of various types, applied either to inflected forms (FORM) or to lemmatized forms (LEMMA) of terms or term components, together with the action to perform. Currently only the CLEAN action (to remove terms) is implemented.
new(\%options);
The method creates a post-processing component of BioYaTeA, sets the options attribute with the hashtable %options, and returns the created object.
The hashtable %options contains several fields: the input file name input-file, the output file name output-file, the configuration file name configuration and the temporary directory name tmp-dir.
Other attributes are: the XML::Twig parser twig_parser, the counter of term candidates tc_counter, the counter of rejected terms count_rejected, the list of regular expressions used to identify terms to reject reg_exps, the indication whether the application of each regular expression is case insensitive case_insensitive, the log file handler logfh, the output file handler logout, and the log file name logfile.
reg_exps is a hashtable where keys are pattern types (for example FORM) and each value is an array of regular expressions.
case_insensitive is a hashtable where keys are regular expressions.
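The shape of these two hashtables can be pictured with plain Perl data. This is only a sketch: the patterns and flag values below are invented for illustration, and the assumption that a true value means "case insensitive" is not stated by the module documentation.

```perl
use strict;
use warnings;

# Hypothetical contents, for illustration only.
# Keys are pattern types, values are arrays of regular expressions.
my %reg_exps = (
    FORM  => [ '[Vv]arious', '^other\b' ],
    LEMMA => [ '^certain$' ],
);

# Keys are the regular expressions themselves.
# Assumption: a true value means "apply without regard to case".
my %case_insensitive = (
    '^other\b'  => 1,
    '^certain$' => 0,
);

# Walk both structures together.
for my $type (sort keys %reg_exps) {
    for my $re (@{ $reg_exps{$type} }) {
        my $mode = $case_insensitive{$re} ? 'case-insensitive' : 'case-sensitive';
        print "$type\t$re\t$mode\n";
    }
}
```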
tc_counter($tc_counter);
This method sets the attribute tc_counter with the value $tc_counter and returns it. When no argument is given, the value of the attribute tc_counter is returned.
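Most accessors in this module follow the same "set if an argument is given, always return the current value" convention. A minimal sketch of that convention is shown here with a hypothetical class Sketch::PostProc; it is not the module's actual implementation.

```perl
use strict;
use warnings;

package Sketch::PostProc;   # hypothetical class, not the real module

sub new {
    my ($class) = @_;
    return bless { tc_counter => 0 }, $class;
}

# Combined getter/setter: assign only when an argument is passed,
# and return the stored value in both cases.
sub tc_counter {
    my $self = shift;
    $self->{tc_counter} = shift if @_;
    return $self->{tc_counter};
}

package main;

my $pp = Sketch::PostProc->new;
$pp->tc_counter(42);              # setter form
print $pp->tc_counter, "\n";      # getter form, prints 42
```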
logfh($logfh);
This method sets the attribute logfh with the handler $logfh and returns it. When no argument is given, the value of the attribute logfh is returned.
outfh($outfh);
This method sets the attribute outfh with the handler $outfh and returns it. When no argument is given, the value of the attribute outfh is returned.
count_rejected($count_rejected);
This method sets the attribute count_rejected with the value $count_rejected and returns it. When no argument is given, the value of the attribute count_rejected is returned.
case_insensitive(\%case_insensitive);
This method sets the attribute case_insensitive with the hashtable %case_insensitive and returns it. When no argument is given, the hashtable reference of the attribute case_insensitive is returned.
case_insensitive_elt($case_insensitive_name, $case_insensitive_value);
This method records whether the regular expression $case_insensitive_name is case insensitive or not (value $case_insensitive_value) in the hashtable referred to by the attribute case_insensitive, and returns that value. When only one argument is given, the value associated with the regular expression $case_insensitive_name is returned. When no argument is given, an undefined value is returned.
exists_case_insensitive_elt($case_insensitive_name);
The method indicates if the application of the regular expression $case_insensitive_name is case insensitive or not.
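The two per-expression methods above can be sketched as follows. The class Sketch::Flags is hypothetical, and reading exists_case_insensitive_elt as "is there an entry for this expression" is an interpretation based on the method name, not something the documentation confirms.

```perl
use strict;
use warnings;

package Sketch::Flags;   # hypothetical class, not the real module

sub new {
    my ($class) = @_;
    return bless { case_insensitive => {} }, $class;
}

# Two arguments: store the flag. One argument: look it up.
# No argument: return an undefined value.
sub case_insensitive_elt {
    my ($self, $name, $value) = @_;
    return undef unless defined $name;
    $self->{case_insensitive}{$name} = $value if defined $value;
    return $self->{case_insensitive}{$name};
}

# Assumption: "exists" means an entry was recorded for this expression.
sub exists_case_insensitive_elt {
    my ($self, $name) = @_;
    return exists $self->{case_insensitive}{$name} ? 1 : 0;
}

package main;

my $flags = Sketch::Flags->new;
$flags->case_insensitive_elt('^other\b', 1);
print "flag recorded\n" if $flags->exists_case_insensitive_elt('^other\b');
print "no flag recorded\n" unless $flags->exists_case_insensitive_elt('[Vv]arious');
```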
options(\%options);
This method sets the attribute options with the hashtable %options and returns it. When no argument is given, the hashtable reference of the attribute options is returned.
configuration($configuration);
This method sets the attribute configuration with the value $configuration and returns it. When no argument is given, the value of the attribute configuration is returned.
input_file($input_file);
This method sets the field input-file of the attribute options with the value $input_file (input file name) and returns it. When no argument is given, the value of the field input-file of the attribute options is returned.
logfile($logfile);
This method sets the field log-file of the attribute options with the value $logfile (log file name) and returns it. When no argument is given, the value of the field log-file of the attribute options is returned.
tmp_dir($tmp_dir);
This method sets the field tmp-dir of the attribute options with the value $tmp_dir (temporary directory name) and returns it. When no argument is given, the value of the field tmp-dir of the attribute options is returned.
output_file($output_file);
This method sets the field output-file of the attribute options with the value $output_file (output file name) and returns it. When no argument is given, the value of the field output-file of the attribute options is returned.
reg_exps(\%reg_exps);
This method sets the attribute reg_exps with the hashtable %reg_exps and returns it. When no argument is given, the hashtable reference of the attribute reg_exps is returned.
reg_exp_elt($reg_exp_name, $reg_exp_value);
This method adds the regular expression $reg_exp_value to the array associated with the pattern type $reg_exp_name and returns that array. When only one argument is given, the array referred to by $reg_exp_name is returned. When no argument is given, a reference to an empty array is returned.
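Unlike the scalar accessors, this method appends to an array rather than overwriting a value. A minimal sketch of that behaviour is given here with a hypothetical class Sketch::Patterns; it assumes the arrays are handed back as references.

```perl
use strict;
use warnings;

package Sketch::Patterns;   # hypothetical class, not the real module

sub new {
    my ($class) = @_;
    return bless { reg_exps => {} }, $class;
}

# Two arguments: append the expression to the array for that pattern type.
# One argument: return the array for that type.
# No argument: return a reference to an empty array.
sub reg_exp_elt {
    my ($self, $name, $value) = @_;
    return [] unless defined $name;
    push @{ $self->{reg_exps}{$name} }, $value if defined $value;
    return $self->{reg_exps}{$name};
}

package main;

my $p = Sketch::Patterns->new;
$p->reg_exp_elt('FORM', '[Vv]arious');
$p->reg_exp_elt('FORM', '^other\b');
print scalar(@{ $p->reg_exp_elt('FORM') }), " FORM patterns\n";   # prints "2 FORM patterns"
```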
twig_parser($twig_parser);
This method sets the attribute twig_parser with the XML::Twig parser $twig_parser and returns it. When no argument is given, the value of the attribute twig_parser is returned.
defineTwigParser();
The method defines the XML::Twig parser and associates it with the object.
processTerms($twig_parser,$data);
The function processes the terms that match the regular expressions by applying the associated actions (as defined in the configuration file, for instance). The terms are in the XML tree $data.
Note: this is a function, not a method. It is used by the XML::Twig parser, which calls it through a function reference.
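The matching step can be illustrated without any XML. The sketch below replaces the XML tree with plain hashes holding a FORM and a LEMMA for each term, and applies the CLEAN action by dropping every term that matches. The sample terms, the patterns and the helper is_rejected are all hypothetical; the real function works on XML::Twig elements.

```perl
use strict;
use warnings;

# Hypothetical patterns, keyed by the term field they apply to.
my %reg_exps         = ( FORM => [ '[Vv]arious' ], LEMMA => [ '^other\b' ] );
my %case_insensitive = ( '^other\b' => 1 );

# Hypothetical term candidates standing in for the XML tree.
my @terms = (
    { FORM => 'various proteins', LEMMA => 'various protein' },
    { FORM => 'Other strains',    LEMMA => 'Other strain' },
    { FORM => 'gene expression',  LEMMA => 'gene expression' },
);

# Return 1 when any pattern matches the corresponding field of the term.
sub is_rejected {
    my ($term) = @_;
    for my $type (keys %reg_exps) {
        my $text = $term->{$type};
        next unless defined $text;
        for my $re (@{ $reg_exps{$type} }) {
            my $compiled = $case_insensitive{$re} ? qr/$re/i : qr/$re/;
            return 1 if $text =~ $compiled;
        }
    }
    return 0;
}

# CLEAN action: keep only the terms that no pattern matched.
my @kept           = grep { !is_rejected($_) } @terms;
my $count_rejected = @terms - @kept;
print "rejected: $count_rejected, remaining: ", scalar(@kept), "\n";
```

With these sample values, the first term is rejected on its inflected form and the second on its lemma (thanks to the case-insensitive flag), leaving one term.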
load_configuration();
The method processes and loads the configuration file (set in the attribute configuration of the current object). The attributes reg_exps and case_insensitive are set by this method.
filtering();
The method performs the full filtering of the terms:
printResume();
The method prints the number of rejected terms and the number of remaining candidate terms.
rmlog();
The method deletes the log file.
The configuration file defines the action to perform when an associated regular expression matches a term form. For instance:
CLEAN=FORM::/[Vv]arious/
Each line defines an association between an action (only CLEAN for the moment) and a regular expression to apply to a form of a term (FORM for the inflected form, LEMMA for the lemmatised form).
The action and regular expression parts are separated by the character =. The two elements of the regular expression part (the form type and the pattern) are separated by two colons (::).
Comments are introduced by a # character at the beginning of the line.
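The line format described above can be parsed with two split calls. The helper parse_rule below is hypothetical and only a sketch: it assumes the pattern is wrapped in slashes as in the example line, and it does not try to handle any modifier that might follow the closing slash.

```perl
use strict;
use warnings;

# Parse one configuration line into its action, form type and pattern.
# Returns nothing for comments, blank lines and malformed lines.
sub parse_rule {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ or $line =~ /^\s*$/;

    # Split on the first "=" only, so the pattern may itself contain "=".
    my ($action, $rest) = split /=/, $line, 2;
    return unless defined $rest;

    # Split on the first "::" to separate the form type from the pattern.
    my ($type, $pattern) = split /::/, $rest, 2;
    return unless defined $pattern;

    # Assumption: the pattern is delimited by slashes, as in the example.
    $pattern =~ s{^/}{};
    $pattern =~ s{/$}{};

    return { action => $action, type => $type, regex => $pattern };
}

my @lines = (
    "# remove vague modifiers",
    "CLEAN=FORM::/[Vv]arious/",
);

my @rules = map { parse_rule($_) } @lines;
printf "%s %s %s\n", @{$_}{qw(action type regex)} for @rules;   # prints "CLEAN FORM [Vv]arious"
```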
Documentation of Lingua::YaTeA
Wiktoria Golik <wiktoria.golik@jouy.inra.fr>, Zorana Ratkovic <Zorana.Ratkovic@jouy.inra.fr>, Robert Bossy <Robert.Bossy@jouy.inra.fr>, Claire Nédellec <claire.nedellec@jouy.inra.fr>, Thierry Hamon <thierry.hamon@univ-paris13.fr>
Copyright (C) 2012 Wiktoria Golik, Zorana Ratkovic, Robert Bossy, Claire Nédellec and Thierry Hamon
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.