Search::Circa - a search engine / indexer running on MySQL
This is Search::Circa, a module that provides functions to perform searches on Circa, a web search engine running on MySQL. Circa is meant for your own Web site, or for a list of sites. It indexes the way AltaVista does: it can read and parse a page and add all the URLs found in it. It stores URLs and words in MySQL for use at search time.
Circa can be used to index from 100 to 100,000 URLs.
Notes:
Accents are removed both at search time and at indexing time.
Searches are case-insensitive.
Search::Circa::Search works with the results produced by Search::Circa::Indexer. Search::Circa::Search is a Perl interface, but this package also includes a PHP client.
Search::Circa is the root class for Search::Circa::Indexer and Search::Circa::Search.
See Search::Circa::Search, Search::Circa::Indexer
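The two notes above (accent removal and case-insensitive matching) amount to normalizing every word the same way before it is stored or looked up. The following is a minimal sketch of that idea using the core Unicode::Normalize module. The helper name normalize_word is hypothetical and the code is not taken from Circa itself.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper, not part of Search::Circa: fold a word to the
# form that would be compared at indexing time and at search time.
sub normalize_word {
    my ($word) = @_;
    my $decomposed = NFD($word);    # split letters from their accents
    $decomposed =~ s/\p{Mn}//g;     # drop the combining accent marks
    return lc $decomposed;          # lower-case for case-insensitive matching
}

# "\x{C9}l\x{E9}phant" is "Elephant" written with two accented e's.
print normalize_word("\x{C9}l\x{E9}phant"), "\n";    # prints "elephant"
```

Applying the same function to indexed words and to query words is what makes an accented query match an unaccented document, and the reverse.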
Search Features
Boolean query language support: or (default), and ("+"), not ("-"). Example: perl +faq -cgi matches documents that contain faq, optionally perl, and not cgi.
Perl or PHP client.
Sites can be browsed by directory / category.
Search by different criteria: news, last modified date, language, URL / site.
Full text indexing.
Different weights for the title, keywords, description and the rest of the HTML page can be set in the configuration.
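To make the boolean query language listed above more concrete, here is a small sketch of how a query string could be split into required, excluded and optional words, and how a document could be tested against them. The functions parse_query and matches are hypothetical illustrations. They are not the code Search::Circa::Search runs, which works against the MySQL index.

```perl
use strict;
use warnings;

# Hypothetical parser: split a query into required ("+"), excluded ("-")
# and optional (default "or") words.
sub parse_query {
    my ($query) = @_;
    my %terms = (required => [], excluded => [], optional => []);
    # Accept "+ faq" as well as "+faq" by gluing the operator to its word.
    $query =~ s/([+-])\s+/$1/g;
    for my $token (split ' ', $query) {
        if    ($token =~ /^\+(.+)/) { push @{ $terms{required} }, lc $1 }
        elsif ($token =~ /^-(.+)/)  { push @{ $terms{excluded} }, lc $1 }
        else                        { push @{ $terms{optional} }, lc $token }
    }
    return \%terms;
}

# A document (given as a list of its words) matches when it holds every
# required word and no excluded word. If nothing is required, it must
# hold at least one optional word.
sub matches {
    my ($terms, @words) = @_;
    my %in_doc = map { ( lc($_) => 1 ) } @words;
    return 0 if grep { !$in_doc{$_} } @{ $terms->{required} };
    return 0 if grep {  $in_doc{$_} } @{ $terms->{excluded} };
    return 1 if @{ $terms->{required} };
    return scalar(grep { $in_doc{$_} } @{ $terms->{optional} }) ? 1 : 0;
}

my $terms = parse_query('perl +faq -cgi');
print matches($terms, qw(perl faq)) ? "match\n" : "no match\n";
```

In a real engine the optional words would typically raise the score of a document rather than decide whether it matches. That part is left out here.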
Inherits features from the LWP suite:
Supports the HTTP://, FTP:// and FILE:// protocols (a filesystem can be indexed without going through a Web server).
Full support for the robots exclusion standard (robots.txt). Identifies itself as CircaIndexer/0.1, with mail address alian@alianwebserver.com. Requests to the same server are delayed by 8 seconds. "It's not a bug, it's a feature!" This is a basic rule for limiting HTTP server load.
Supports HTTP proxies.
Builds the index in MySQL.
Reads HTML and plain text.
Several kinds of indexing: full, incremental, or restricted to a particular server.
Documents that have not been updated are not reindexed.
Every request for a file starts with an HTTP HEAD request, to obtain information such as validity, last update, size, etc. The size of documents read can be restricted (e.g. don't fetch documents larger than 5 MB). This is useful on low-bandwidth connections, or on computers which do not have much memory.
HTML templates can be easily customized for your needs.
Admin functions are available through a browser interface or the command line.
Indexes the different links found in a CGI (everything after name_of_file?).
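Several of the indexer features above map directly onto LWP. The sketch below shows how a robots.txt-aware agent with the stated identity, an 8 second delay and a 5 MB size limit could be configured with LWP::RobotUA, followed by a HEAD-first check for incremental indexing. The function names are hypothetical, and this is an illustration of the LWP calls rather than the code Search::Circa::Indexer actually contains. It needs the libwww-perl distribution.

```perl
use strict;
use warnings;
use LWP::RobotUA;

# Build a robots.txt-aware user agent with the identity quoted in the
# feature list. LWP::RobotUA expresses its delay in minutes.
sub make_robot {
    my $ua = LWP::RobotUA->new(
        agent => 'CircaIndexer/0.1',
        from  => 'alian@alianwebserver.com',
    );
    $ua->delay(8 / 60);                # 8 seconds between hits on one server
    $ua->max_size(5 * 1024 * 1024);    # stop reading a body after 5 MB
    return $ua;
}

# Hypothetical incremental check: reindex only when the document changed
# after it was last indexed (both values are epoch seconds). When either
# date is unknown, reindex to be safe.
sub needs_reindex {
    my ($last_modified, $indexed_at) = @_;
    return 1 unless defined $last_modified && defined $indexed_at;
    return $last_modified > $indexed_at ? 1 : 0;
}

# Issue a HEAD request first and decide whether the page is worth fetching.
sub should_fetch {
    my ($ua, $url, $indexed_at) = @_;
    my $head = $ua->head($url);
    return 0 unless $head->is_success;
    return needs_reindex($head->last_modified, $indexed_at);
}
```

A caller would create the agent once with make_robot() and call should_fetch($ua, $url, $indexed_at) before each GET.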
Q: Where are the example clients?
A: Look in the demo directory. For the command line, see circa_admin and circa_search; for CGI, take a look at cgi-bin/circa. They are installed with make cgi.
Q: Where are the global parameters for connecting to Circa?
A: In the lib/CircaConf.pm file.
Q: What is an account in Circa?
A: It's like a project, or a database: a namespace for whatever you want.
Q: How do I get started with the indexer?
A: See the man page of circa_admin.
Q: Did you succeed in using Circa with mod_perl?
A: Yes
These methods are used through Search::Circa::Indexer and Search::Circa::Search objects.
Connects Circa to MySQL. Returns 1 on success, 0 otherwise.
user : MySQL user
password : MySQL password
db : MySQL database
host : IP address of the MySQL server
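As an illustration of what a connect method with these four parameters has to do, here is a sketch built on DBI and DBD::mysql. It assumes the connection goes through DBI, which this page does not state. The names build_dsn and connect_sketch are hypothetical and do not reproduce the module's own implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: build a DBD::mysql data source name.
sub build_dsn {
    my ($db, $host) = @_;
    return "DBI:mysql:database=$db;host=$host";
}

# Sketch of a connect that keeps the handle in the object and returns
# 1 on success, 0 otherwise. $self is a plain hash ref here.
sub connect_sketch {
    my ($self, $user, $password, $db, $host) = @_;
    require DBI;    # loaded lazily so build_dsn works without DBI installed
    $self->{dbh} = DBI->connect(build_dsn($db, $host), $user, $password,
                                { PrintError => 0, RaiseError => 0 });
    return $self->{dbh} ? 1 : 0;
}

print build_dsn('circa', '127.0.0.1'), "\n";
```

With PrintError and RaiseError switched off, a failed connection simply yields an undefined handle, which is what lets the function report 0 instead of dying.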
Closes the connection to MySQL. This method is called by the DESTROY method of this class.
Gets or sets the table name prefix, so that Circa can be used more than once on the same database.
masque : Path of the template
vars : hash ref with the keys/values to substitute
Returns the template with its variables replaced. Example:
$circa->fill_template('A <? $age ?> ans', ('age' => '12 ans'));
Will return:
J'ai 12 ans,
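The substitution itself can be pictured with a few lines of Perl. The sketch below is a hypothetical stand-in, named fill_template_sketch, that works on a template string rather than on a file path. It assumes placeholders of the form <? $name ?> as in the example above. It is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for fill_template that takes the template text
# directly and a hash ref of values.
sub fill_template_sketch {
    my ($template, $vars) = @_;
    # Replace each <? $name ?> with its value, or with nothing if unknown.
    $template =~ s{<\?\s*\$(\w+)\s*\?>}
                  {defined $vars->{$1} ? $vars->{$1} : ''}ge;
    return $template;
}

print fill_template_sketch(q{I am <? $age ?> years old}, { age => 12 }), "\n";
# prints "I am 12 years old"
```

Unknown placeholders are replaced by an empty string here. Leaving them untouched would be an equally reasonable choice.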
Executes an SQL request on the database and returns the first row. In list context, returns the full row; otherwise returns just the first column.
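The context-dependent return value described above is the standard Perl wantarray idiom. The sketch below demonstrates it on an in-memory list of rows so that it runs without a database. The name first_row_sketch and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: $rows is an array ref of rows (each an array ref),
# standing in for the result set of an SQL request.
sub first_row_sketch {
    my ($rows) = @_;
    my $row = $rows->[0] or return;    # no rows: empty list or undef
    return wantarray ? @$row : $row->[0];
}

my $rows = [ [ 42, 'http://www.example.com/' ],
             [ 43, 'http://www.example.org/' ] ];
my @full  = first_row_sketch($rows);   # list context: the whole first row
my $first = first_row_sketch($rows);   # scalar context: first column only
print "$first / @full\n";
```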
Prints the message msg on standard error if the script's debug level is higher than level.
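A level-filtered trace of this kind is short to write. The sketch below assumes the debug level lives in a package variable, here the hypothetical $DEBUG, and that "higher than" means strictly greater. The real method may store and compare the level differently.

```perl
use strict;
use warnings;

our $DEBUG = 2;    # hypothetical script-wide debug level

# Print $msg on STDERR only when the debug level is strictly above $level.
# Returns 1 if the message was printed, 0 otherwise.
sub trace_sketch {
    my ($level, $msg) = @_;
    return 0 unless $DEBUG > $level;
    print STDERR $msg, "\n";
    return 1;
}

trace_sketch(1, 'connected to MySQL');    # printed: 2 > 1
trace_sketch(3, 'row fetched');           # silent: 2 > 3 is false
```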
Prompts on STDIN for a parameter, showing message and default_value, and returns the value entered.
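A prompt with a default value can be sketched as follows. The name prompt_sketch, the optional input handle (added so the function can be exercised without a terminal) and the exact prompt layout are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical prompt helper. The input handle defaults to STDIN and can
# be swapped for another handle when testing.
sub prompt_sketch {
    my ($message, $default, $in) = @_;
    $in ||= \*STDIN;
    print "$message [$default]: ";
    my $answer = <$in>;
    $answer = '' unless defined $answer;    # end of input
    chomp $answer;
    return length($answer) ? $answer : $default;
}
```

An empty line, or end of input, falls back to the default, so pressing Enter accepts the proposed value.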
Search::Circa::Indexer, Indexer module
Search::Circa::Search, Searcher module
Search::Circa::Annuaire, manages the Circa directory
Search::Circa::Url, manages the URLs of Circa
Search::Circa::Categorie, manages the categories of Circa
$Revision: 1.18 $
Alain BARBET alian@alianwebserver.com