WordNet::SenseRelate::AllWords - Disambiguate All Words in a Text based on semantic similarity and relatedness in WordNet
    use WordNet::SenseRelate::AllWords;
    use WordNet::QueryData;
    use WordNet::Tools;

    my $qd = WordNet::QueryData->new;
    defined $qd or die "Construction of WordNet::QueryData failed";

    my $wntools = WordNet::Tools->new($qd);
    defined $wntools or die "\nCouldn't construct WordNet::Tools object";

    my $wsd = WordNet::SenseRelate::AllWords->new (wordnet => $qd,
                                                   wntools => $wntools,
                                                   measure => 'WordNet::Similarity::lesk');

    my @context = qw/the bridge is held up by red tape/;
    my @results = $wsd->disambiguate (window => 3,
                                      context => [@context]);
    print "@results\n";
WordNet::SenseRelate::AllWords implements an algorithm for Word Sense Disambiguation that uses measures of semantic relatedness. The algorithm is an extension of an algorithm described by Pedersen, Banerjee, and Patwardhan[1]. This implementation is similar to the original SenseRelate package, but it disambiguates every word in the given context rather than just a single word.
Note: the methods below will die() on serious errors. Wrap calls to the methods in an eval BLOCK to catch the exceptions. See 'perldoc -f eval' for more information.
Example:
    my @res;
    eval { @res = $wsd->disambiguate (args...) };
    if ($@) {
        print STDERR "An exception occurred ($@)\n";
    }
The constructor for this class. It will create a new instance and return a reference to the constructed object.
Parameters:
    wordnet       => REFERENCE : WordNet::QueryData object
    wntools       => REFERENCE : WordNet::Tools object
    measure       => STRING    : name of a WordNet::Similarity measure
    config        => FILENAME  : config file for above measure
    outfile       => FILENAME  : name of a file for output (optional)
    stoplist      => FILENAME  : file containing list of stop words
    pairScore     => INTEGER   : minimum pairwise score (default: 0)
    contextScore  => INTEGER   : minimum overall score (default: 0)
    trace         => INTEGER   : generate traces (default: 0)
    forcepos      => INTEGER   : do part-of-speech coercion (default: 0)
    nocompoundify => INTEGER   : disable compoundify (default: 0)
    usemono       => INTEGER   : enable assigning the only available sense to monosemous words (default: 0)
    backoff       => INTEGER   : enable assigning the most frequent sense if the measure can't assign a sense (default: 0)
Returns:
A reference to the constructed object.
Example:

    WordNet::SenseRelate::AllWords->new (wordnet => $query_data_obj,
                                         wntools => $wordnet_tools_obj,
                                         measure => 'WordNet::Similarity::lesk',
                                         trace   => 1);
The trace levels are:
    1   Show the context window for each pass through the algorithm.
    2   Display winning score for each pass (i.e., for each target word).
    4   Display the non-zero scores for each sense of each target word (overrides 2).
    8   Display the non-zero values from the semantic relatedness measures.
    16  Show the zero values as well when combined with either 4 or 8. When not used with 4 or 8, this has no effect.
    32  Display traces from the semantic relatedness module.
These trace levels can be added together. For example, by specifying a trace level of 3, the context window will be displayed along with the winning score for each pass.
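Because each trace level is a distinct power of two, adding levels is the same as setting bits in a mask. The following is a minimal sketch of composing and decoding such a value. The flag names in %TRACE and the helpers trace_level() and trace_flags() are hypothetical and are not part of the module; only the numeric values come from the list above.

```perl
use strict;
use warnings;

# Hypothetical symbolic names for the documented trace levels.
my %TRACE = (
    window   => 1,     # context window for each pass
    winner   => 2,     # winning score for each pass
    senses   => 4,     # non-zero scores for each sense
    measures => 8,     # non-zero values from the relatedness measures
    zeros    => 16,    # zero values too (only with 4 or 8)
    module   => 32,    # traces from the relatedness module
);

# Combine named flags into the integer passed as trace => ...
sub trace_level {
    my $level = 0;
    for my $name (@_) {
        die "Unknown trace flag '$name'\n" unless exists $TRACE{$name};
        $level |= $TRACE{$name};
    }
    return $level;
}

# List the names of the flags set in a trace level, lowest bit first.
sub trace_flags {
    my ($level) = @_;
    return grep { $level & $TRACE{$_} }
           sort { $TRACE{$a} <=> $TRACE{$b} } keys %TRACE;
}

my $level = trace_level('window', 'winner');
print "trace => $level (", join(', ', trace_flags($level)), ")\n";
```

The resulting integer is what would be supplied as the trace parameter of the constructor.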
Disambiguates all the words in the specified context and returns them as a list. If a word cannot be disambiguated, then it is returned "as is". A word cannot be disambiguated if it is not in WordNet or if no value exceeds the specified threshold.
The context parameter specifies the words to be disambiguated. It treats the value as one sentence. To disambiguate a document with multiple sentences, make one call to disambiguate() for each sentence.
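A minimal sketch of the one-call-per-sentence pattern, combined with the eval guard recommended earlier. The helper disambiguate_sentences() is hypothetical and not part of the module, and the whitespace tokenization is an assumption that may not suit every input.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module.
# $wsd       : any object that provides a disambiguate() method
# $sentences : reference to an array of sentence strings
# %opts      : extra arguments passed through to disambiguate()
sub disambiguate_sentences {
    my ($wsd, $sentences, %opts) = @_;
    my @all;
    for my $sentence (@$sentences) {
        # Assumption: splitting on whitespace is adequate tokenization.
        my @words = split ' ', $sentence;

        # disambiguate() may die() on serious errors, so guard each call.
        my @res = eval { $wsd->disambiguate(%opts, context => [@words]) };
        if ($@) {
            warn "Skipping sentence ($@)";
            next;
        }
        push @all, [@res];
    }
    return @all;    # one array reference per successfully processed sentence
}
```

A call such as disambiguate_sentences($wsd, \@sentences, window => 3) would return one array reference per sentence, skipping any sentence for which the call raised an exception.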
Parameters:

    window  => INTEGER : the window size to use. A window size of N means that the window will include N words, including the target word. If N is an even number, there will be one more word on the left side of the target word than on the right.
    tagged  => BOOLEAN : true if the text is tagged, false otherwise
    scheme  => normal|sense1|random|fixed : the disambiguation scheme to use
    context => ARRAY_REF : reference to an array of words to disambiguate
Returns: An array of disambiguated words.
my @results = $wsd->disambiguate (window => 3, tagged => 0, context => [@words]);
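The window rule described for the window parameter can be illustrated with a short sketch. The helper window_bounds() is hypothetical and not part of the module. Clipping the window at the edges of the sentence is an assumption made for this illustration, and the module may handle edges differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module.
# Returns the first and last index of a window of $n words around the
# word at index $target in a context of $len words.
sub window_bounds {
    my ($target, $n, $len) = @_;
    my $right = int(($n - 1) / 2);    # words to the right of the target
    my $left  = ($n - 1) - $right;    # one more on the left when $n is even
    my $start = $target - $left;
    my $end   = $target + $right;
    # Assumption: clip the window at the sentence boundaries.
    $start = 0        if $start < 0;
    $end   = $len - 1 if $end > $len - 1;
    return ($start, $end);
}

my @context = qw/the bridge is held up by red tape/;

# Window of 4 around "held" (index 3): two words on the left, one on the right.
my ($start, $end) = window_bounds(3, 4, scalar @context);
print join(' ', @context[$start .. $end]), "\n";
```

With an odd window size such as 3, the same helper yields one word on each side of the target.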
Rules for attaching suffixes:
Suffixes are attached to words in the context to mark them as ignored during disambiguation. Note that after converting the tags to WordNet tags, tagged text is treated the same as wntagged text.
Below is the ordered enumeration of the words which are ignored for disambiguation and the suffixes attached to those words.
Note that we check for such words in the order below:
    1  stopwords => #o
    2  Only for tagged text:
         i)   Closed Class words => #CL
         ii)  Invalid Tag => #IT
         iii) Missing Word => #MW
    3  For tagged and wntagged text:
         i)   No Tag => #NT
         ii)  Missing Word => #MW
         iii) Invalid Tag => #IT
    4  Not in WordNet => #ND
    5  No Relatedness found with the surrounding words => #NR
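As a sketch of how a caller might interpret these suffixes in the returned list, the following maps each suffix to the reason the word was skipped. The helper skip_reason() is hypothetical and not part of the module, and the sample tokens are made up for illustration rather than real output. The sketch assumes that a successfully disambiguated word does not end in one of the suffixes listed above.

```perl
use strict;
use warnings;

# Suffixes from the list above, mapped to a human-readable reason.
my %REASON = (
    o  => 'stopword',
    CL => 'closed class word',
    IT => 'invalid tag',
    MW => 'missing word',
    NT => 'no tag',
    ND => 'not in WordNet',
    NR => 'no relatedness found with the surrounding words',
);

# Hypothetical helper, not part of the module.
# Returns the reason a token was ignored, or undef if the token does
# not end in one of the documented suffixes.
sub skip_reason {
    my ($token) = @_;
    return undef unless $token =~ /#(o|CL|IT|MW|NT|ND|NR)\z/;
    return $REASON{$1};
}

# Made-up sample tokens, for illustration only.
for my $token (qw/the#o bridge#n#1 xyzzy#ND/) {
    my $reason = skip_reason($token);
    printf "%-12s %s\n", $token, defined $reason ? "ignored: $reason" : 'disambiguated';
}
```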
Gets the current trace string and resets it to "".
Parameters: None

Returns: The current trace string (before resetting it). If the returned string is not empty, it will end with a newline.
    my $str = $wsd->getTrace ();
    print $str;
WordNet::Similarity::AllWords

The main web page for SenseRelate is:

    http://senserelate.sourceforge.net/

There are several mailing lists for SenseRelate:

    http://lists.sourceforge.net/lists/listinfo/senserelate-users/
    http://lists.sourceforge.net/lists/listinfo/senserelate-news/
    http://lists.sourceforge.net/lists/listinfo/senserelate-developers/
Jason Michelizzi, <jmichelizzi at users.sourceforge.net>
Varada Kolhatkar, <kolha002 at d.umn.edu>
Ted Pedersen, <tpederse at d.umn.edu>
Copyright (C) 2004-2008 by Jason Michelizzi and Ted Pedersen
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.