SWISH::Filters::Doc2txt - Perl extension for filtering MSWord documents with Swish-e
This is a plug-in module that uses the "catdoc" program to convert MS Word documents to text for indexing by Swish-e. "catdoc" can be downloaded from:
http://www.ice.ru/~vitus/catdoc/ver-0.9.html
The program "catdoc" must be installed and in your PATH before running Swish-e.
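One way to confirm that requirement before indexing is to search PATH for the executable. The following is a minimal sketch, not part of this module; the helper name find_in_path is hypothetical, and it uses only the core File::Spec module.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of SWISH::Filter): return the full path
# of the first executable file named $prog found in PATH, or undef.
sub find_in_path {
    my ($prog) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $prog);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $catdoc = find_in_path('catdoc');
print defined $catdoc
    ? "catdoc found at $catdoc\n"
    : "catdoc not found in PATH\n";
```

On Windows the executable would normally carry an extension such as .exe, which this sketch does not try to guess.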
This filter does not specify input or output character encodings. This will change in the future to allow use of the user_data to set the encoding.
A minor optimization during spidering (i.e. when docs are in memory instead of on disk) would be to use an open2() call to let catdoc read from stdin instead of from a file.
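The open2() idea could look roughly like the sketch below. It is an illustration under stated assumptions, not the module's code: the helper name filter_via_stdin is hypothetical, and it assumes that the command reads its document from stdin and writes text to stdout (catdoc is expected to do this when given no file names, but check your version).

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: pipe an in-memory document through a command that
# reads stdin and writes stdout, and return everything the command printed.
sub filter_via_stdin {
    my ($doc_ref, @cmd) = @_;

    my ($from_child, $to_child);
    my $pid = open2($from_child, $to_child, @cmd);

    binmode $to_child;               # Word documents are binary data
    print {$to_child} $$doc_ref;
    close $to_child;                 # send EOF so the child can finish

    my $text = do { local $/; <$from_child> };   # slurp all output
    close $from_child;

    waitpid $pid, 0;
    die "command failed: @cmd\n" if $? != 0;
    return $text;
}
```

A call might look like my $text = filter_via_stdin(\$doc_bytes, 'catdoc'), where $doc_bytes holds the fetched document. Because the sketch writes all input before reading any output, it can deadlock on documents large enough to fill the pipe buffers in both directions; a robust version would interleave reads and writes (for example with select) or fall back to a temporary file.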
Bill Moseley
SWISH::Filter