Lingua::NATools::Client - Simple API to query NAT Objects
use Lingua::NATools::Client;
$client = Lingua::NATools::Client->new();
Lingua::NATools::Client is a simple query API to talk with NAT corpora objects. It can use a client-server approach (see nat-server) or access the corpora directly on the local filesystem.
This module includes functions to query NATools Objects. To query you must first create a client object with the new method.
The new method receives a hash with configuration parameters and creates a client object. For instance:
$client = Lingua::NATools::Client->new( Local => "/opt/corpora/foo" );
Known options are:
The IP address where the server is running. Defaults to 127.0.0.1.
The port to be used in the connection. Defaults to 4000.
A local directory with a NATools object. Note that not all methods support local corpora.
A local Data::Dumper object with a NATools PTD. Note that not all methods support local NATools PTDs.
If the LocalDumper value is a reference to an array, it is expected to contain two elements, one filename for each dictionary. If its value is a string, it is expected to be the name of a single file that includes BOTH dictionaries.
This method is used to iterate through a probabilistic translation dictionary. Pass a function reference to handle each dictionary entry. This function will be called with a flattened hash with the keys word, trans and count.
Use a hash reference as the first argument to configure the method's behaviour. For instance:
$client -> iterate( {Language => 'source'}, sub { my %param = @_; print "$param{word}\n"; });
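As a rough sketch of what such a handler might look like, the snippet below collects every headword and sums the counts. It does not connect to a corpus: the handler is called directly with hypothetical sample entries, and it assumes that count is a plain number. The trans key is left out because its structure is not described here.

```perl
use strict;
use warnings;

my @words;        # headwords seen so far
my $total = 0;    # sum of the count fields (assumed to be numeric)

# Handler following the calling convention described above: a flattened hash.
my $handler = sub {
    my %param = @_;
    push @words, $param{word};
    $total += $param{count};
};

# Hypothetical entries standing in for what iterate would pass to the handler.
$handler->(word => 'casa', count => 12);
$handler->(word => 'gato', count => 5);

print scalar(@words), " entries, $total occurrences\n";
```

With a real client, the same $handler would be passed as the second argument to iterate.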
This method is only available in server mode. It returns a hash table whose keys are corpora names (identifiers). Values are hash tables with the keys "id", "source" and "target", holding the corpus identifier and the language names.
$corpora = $client->list; # $corpora={ Crp1=> { id=> 1, source=> 'PT', target=> 'EN' } }
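The returned structure can be walked like any nested Perl hash. The following sketch uses a hard-coded copy of the example above, plus a second hypothetical entry, rather than a live server answer, and lists the corpora ordered by identifier.

```perl
use strict;
use warnings;

# Sample structure in the documented shape; Crp2 is a hypothetical extra entry.
my $corpora = {
    Crp1 => { id => 1, source => 'PT', target => 'EN' },
    Crp2 => { id => 2, source => 'EN', target => 'FR' },
};

my @lines;
for my $name (sort { $corpora->{$a}{id} <=> $corpora->{$b}{id} } keys %$corpora) {
    my $c = $corpora->{$name};
    # One line per corpus: identifier, name, language pair.
    push @lines, sprintf("%d\t%s\t%s -> %s", $c->{id}, $name, $c->{source}, $c->{target});
}

print "$_\n" for @lines;
```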
This method is also available only in server mode. It selects the corpus that will be used by all subsequent queries.
$client->set_corpus(3);
This method is used to query Probabilistic Translation Dictionaries. The first argument may be a hash reference with configuration options. The only mandatory argument is the word being searched.
A corpus identifier to use. If not set, the first corpus is used, or the one selected previously with set_corpus.
This option chooses the direction of the query. By default, the query is made on the source language. If direction is <~ the target language is used.
In both local corpus mode and server mode, you can query by identifier instead of by word. To do so, use ~#> or <#~ as the direction.
Returns an array reference. The first element is the occurrence count of the word, the second is a hash with the translation probabilities, and the third is the word searched.
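To illustrate how this return value might be unpacked, the sketch below works on a hypothetical result rather than a real query. It assumes that the second element is a hash reference mapping each translation to a numeric probability. The sample words and figures are invented.

```perl
use strict;
use warnings;

# Hypothetical result in the documented shape:
# [ occurrence count, { translation => probability }, word searched ]
my $result = [ 42, { house => 0.7, home => 0.25, building => 0.05 }, 'casa' ];

my ($count, $trans, $word) = @$result;

# Translations ordered from most to least probable.
my @ranked = sort { $trans->{$b} <=> $trans->{$a} } keys %$trans;

print "$word ($count occurrences)\n";
printf("  %-10s %.2f\n", $_, $trans->{$_}) for @ranked;
```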
Use this method to query meta-information. At the moment it works only with server corpora. Pass it a reference to a configuration hash if you need to choose the corpus (see the ptd documentation, for instance). The mandatory parameter is the name of the attribute being queried. Returns the value if found, undef otherwise.
This method is used to query for concordances on the corpus. This method is not available with LocalDumper.
Mandatory arguments are one or two strings to search. The first argument may be a hash reference with configuration details:
The corpus identifier to be queried. Used only in server mode. If not set, the identifier 1 is used, or the one selected before with the set_corpus method.
The direction in which the query will be done. At the moment, it defaults to querying the source side (thus ignoring the second argument). You can use <- to query the target language (which also ignores the second argument) or <-> to query both languages.
If you want to do pattern matching, use one of =>, <= or <=>.
TODO: make this interface cleaner.
Number of results to be presented. Defaults to 20. This value is always limited by the server.
This method is used to query the ngram databases. Not all corpora have ngram indexes, so some answers might be just a reference to an empty list.
At the moment it uses the same configuration parameters as the other methods (direction and crp), plus a string with the query. For instance:
foo *      --> all bigrams with "foo" as first word
foo * bar  --> all trigrams with "foo" as first word and "bar" as last word
foo bar    --> the bigram "foo bar"
It returns a list of ngrams. Each ngram is a list of its words, with the occurrence count as the last element.
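A minimal sketch of how such an answer might be processed follows. It assumes the result arrives as a reference to an array of array references, and the sample data is a hypothetical answer for the query "foo *". The code separates each ngram's words from its trailing count.

```perl
use strict;
use warnings;

# Hypothetical answer: each inner list holds the words followed by the count.
my $ngrams = [
    [ 'foo', 'bar', 7 ],
    [ 'foo', 'baz', 3 ],
];

my %count_for;
for my $ngram (@$ngrams) {
    my @words = @$ngram;       # copy, so the original list is left untouched
    my $count = pop @words;    # the last element is the occurrence count
    $count_for{"@words"} = $count;
}

print "$_: $count_for{$_}\n" for sort keys %count_for;
```

An empty answer, as returned for corpora without ngram indexes, simply leaves %count_for empty.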
See perl(1) and NATools documentation.
Alberto Manuel Brandao Simoes, <albie@alfarrabio.di.uminho.pt>
Copyright 2002-2012 by Natura Project http://natura.di.uminho.pt
This library is free software; you can redistribute it and/or modify it under the GNU General Public License 2, which you should find on parent directory. Distribution of this module should be done including all NATools package, with respective copyright notice.