NAME

README - General Information about WordNet-SenseRelate-WordToSet

OVERVIEW

This module takes as input a single target word and a set of one or more other words. It finds the sense of the target word that is most related to the words in the set. For example, if the target word is "bank", and the words in the set are "money cash loan stock", we might expect that the most related sense of "bank" is the one pertaining to financial institutions.

This is potentially useful in determining the predominant sense of a word in a particular domain. For example, if the target word is "game", and the words in the set are from the domain of board games (e.g., "monopoly chess checkers"), then the sense of "game" that we'd expect to be most similar or related is the game you play rather than the game you hunt. Here's some output when "game" is compared to board games:

 wordtoset.pl game monopoly checkers chess --type WordNet::Similarity::wup

 game#n#1 : 2.52631578947368 : a contest with rules to determine a winner; 
                "you need four people to play this game"  

 game#n#10 : 2.17777777777778 : your occupation or line of work; 
                "he's in the plumbing game"; "she's in show biz"  

 game#n#3 : 2.17777777777778 : an amusement or pastime; 
                "they played word games"; 
                "he thought of his painting as a game that filled his 
                empty time"; "his life was all fun and games"  

Here's some output when "game" is compared to wild animals instead:

 wordtoset.pl game turkey boar deer --type WordNet::Similarity::wup

 game#n#4 : 1.98 : animal hunted for food or sport  

 game#n#7 : 1.27777777777778 : the flesh of wild animals that is 
                        used for food  

 game#n#9 : 1.24542124542125 : the game equipment needed in order to 
                        play a particular game; "the child received 
                        several games for his birthday"  

Note that wordtoset.pl will output all of the senses, but we've only shown the top three here in the interests of brevity. We can see that, according to the Wu-Palmer measure (wup), the sense of "game" most similar to each set of words is the one we described above.

WordToSet might also be useful in detecting sentiment orientation. For example, suppose the target word is "war". You could compare it to two different sets, such as "peace love happiness" and "hate death fear". While the predominant sense of "war" might not change, if it has a substantially higher score relative to one of the sets, then it could be concluded that "war" is more associated with that set than with the other.
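The comparison described above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: each result is taken to be a hash reference mapping senses to scores, the helper names top_score and closer_set and the margin parameter are hypothetical, and the numbers are placeholders rather than real WordToSet output.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Return the highest score in a sense => score hash reference.
sub top_score {
    my ($scores) = @_;
    return max(values %$scores);
}

# Compare the top scores of two result sets. Returns 'first' or 'second'
# when one top score exceeds the other by at least $margin, else 'neither'.
# (Hypothetical helper; not part of WordNet::SenseRelate::WordToSet.)
sub closer_set {
    my ($scores_a, $scores_b, $margin) = @_;
    my $diff = top_score($scores_a) - top_score($scores_b);
    return 'neither' if abs($diff) < $margin;
    return $diff > 0 ? 'first' : 'second';
}

# Placeholder scores for illustration only; NOT real WordToSet output.
my %set_a = ('war#n#1' => 1.10, 'war#n#2' => 0.80);   # e.g. "peace love happiness"
my %set_b = ('war#n#1' => 1.75, 'war#n#2' => 0.90);   # e.g. "hate death fear"

print closer_set(\%set_a, \%set_b, 0.25), "\n";
```

What counts as a "substantially higher" score depends on the measure in use, so the margin would need to be chosen per measure.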

This module uses WordNet and measures of semantic relatedness and similarity from WordNet::Similarity to arrive at its output.

SYNOPSIS

   # from the command line
   
   wordtoset.pl star nebula cosmos orion --type WordNet::Similarity::lin

   wordtoset.pl star movie hollywood director --type WordNet::Similarity::vector 

   # from within a program

   use WordNet::SenseRelate::WordToSet;
   use WordNet::QueryData;

   my $qd = WordNet::QueryData->new;

   my %options = (wordnet => $qd,
                  measure => 'WordNet::Similarity::lesk');

   my $wsd = WordNet::SenseRelate::WordToSet->new (%options);

   my $result = $wsd->disambiguate (target => 'java',
                      context => ['programming_language', 'applet']);

   foreach my $key (keys %$result) {
       print $key, ' : ', $result->{$key}, "\n";
   }
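The loop above prints senses in whatever order the hash happens to yield them. To list the best sense first, the results can be sorted by score. The following is a minimal sketch, assuming (as the loop suggests) that the keys are senses and the values are numeric scores; the helper name rank_senses is hypothetical, and the hash here is a stand-in that reuses the board-game scores from the OVERVIEW so that it runs without WordNet installed.

```perl
use strict;
use warnings;

# Stand-in for the hash reference returned by disambiguate(); the scores
# are copied from the board-game output shown in the OVERVIEW.
my $result = {
    'game#n#1'  => 2.52631578947368,
    'game#n#10' => 2.17777777777778,
    'game#n#3'  => 2.17777777777778,
};

# Sort senses by descending score, breaking ties by sense name so the
# order is the same from run to run. (Hypothetical helper.)
sub rank_senses {
    my ($scores) = @_;
    return sort { $scores->{$b} <=> $scores->{$a} or $a cmp $b } keys %$scores;
}

my @ranked = rank_senses($result);
print $_, ' : ', $result->{$_}, "\n" for @ranked;
```

Taking only the first few elements of the ranked list gives a "top three" view like the one shown in the OVERVIEW.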

CONTENTS

When the distribution is unpacked, several subdirectories are created:

/lib

This directory contains the Perl modules that do the actual work of disambiguation. By default, these files are installed into /usr/local/lib/site_perl/PERL_VERSION (where PERL_VERSION is the version of Perl you are using), or a similar directory. See the INSTALL file for more details.

/bin

This directory contains a script, wordtoset.pl, that lets you run the WSD software without writing your own Perl script.

/doc

This directory contains pod files for README, CHANGES, and INSTALL. These pod files are the ones that should be edited; the corresponding files in the top-level directory should be considered read-only.

/t

This directory contains test scripts. These scripts are run when you run 'make test'.

SEE ALSO

 http://senserelate.sourceforge.net/

AUTHORS

 Ted Pedersen, University of Minnesota, Duluth
 tpederse at d.umn.edu

 Jason Michelizzi

This document last modified by : $Id: README.pod,v 1.6 2008/04/07 03:35:51 tpederse Exp $

COPYRIGHT AND LICENSE

Copyright (c) 2004-2008, Ted Pedersen and Jason Michelizzi

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

Note: a copy of the GNU Free Documentation License is available on the web at http://www.gnu.org/copyleft/fdl.html and is included in this distribution as FDL.txt.