File::Extractor - Extract meta-data from arbitrary files
use File::Extractor;
my $extractor = File::Extractor->loadDefaultLibraries;
my %keywords = $extractor->getKeywords($fh);
This module provides a Perl interface to libextractor.
GNU libextractor provides developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library to obtain meta-data about files.
Currently, libextractor supports the following formats: HTML, PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, MP3 (ID3v1 and ID3v2), OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, REAL, RIFF (AVI), MPEG, QT and ASF.
Various additional MIME types are also detected, and the library can compute hash functions (SHA-1, MD5, RIPEMD-160).
http://www.gnunet.org/libextractor/
getDefaultLibraries
my @default_libraries = File::Extractor->getDefaultLibraries;
Return a list of strings which are the names of the default extractor libraries.
loadDefaultLibraries
my $extractor = File::Extractor->loadDefaultLibraries;
Load the default set of libraries. Returns a File::Extractor instance.
loadConfigLibraries
my $extractor = File::Extractor->loadConfigLibraries($config);
my $new_extractor = $extractor->loadConfigLibraries($config);
Load multiple libraries as specified by the user. $config is a user-supplied string that defines which libraries should be loaded. It has the format "[[-]LIBRARYNAME[(options)][:[-]LIBRARYNAME[(options)]]]*". For example, libextractor_mp3.so:libextractor_ogg.so loads the mp3 and the ogg libraries. A '-' before a LIBRARYNAME indicates that the library should be added to the end of the library list (see addLibraryLast).
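The config string format described above can be assembled programmatically. The following is a minimal sketch, not part of File::Extractor: build_config and its hash keys (name, options, last) are hypothetical names chosen for illustration. The resulting string is what would be passed to loadConfigLibraries.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by File::Extractor): join library
# entries into a config string of the form
#   [-]LIBRARYNAME[(options)]:[-]LIBRARYNAME[(options)]...
# Each entry is a hash ref with a 'name', an optional 'options' string,
# and an optional true 'last' flag that adds the '-' prefix.
sub build_config {
    my @entries = @_;
    return join ':', map {
        my $prefix  = $_->{last} ? '-' : '';
        my $options = defined $_->{options} ? "($_->{options})" : '';
        $prefix . $_->{name} . $options;
    } @entries;
}

my $config = build_config(
    { name => 'libextractor_mp3.so' },
    { name => 'libextractor_ogg.so', last => 1 },
);
print "$config\n";    # prints libextractor_mp3.so:-libextractor_ogg.so

# With the module installed, the string would then be used like this:
# my $extractor = File::Extractor->loadConfigLibraries($config);
```

Building the string from a list keeps the ':' separators and '-' prefixes in one place, which may be easier to maintain than hand-written config strings.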
addLibrary
my $extractor = File::Extractor->addLibrary($library);
my $new_extractor = $extractor->addLibrary($library);
Add a library for keyword extraction. $library is the name of the library to be loaded.
addLibraryLast
my $extractor = File::Extractor->addLibraryLast($library);
my $new_extractor = $extractor->addLibraryLast($library);
Add a library for keyword extraction at the end of the list. $library is the name of the library to be loaded.
removeLibrary
$extractor->removeLibrary($library);
Remove a library for keyword extraction. $library is the name of the library to be removed.
getKeywords
my %keywords = $extractor->getKeywords($fh);
my %keywords = $extractor->getKeywords($data);
Extract keywords from an opened filehandle ($fh) or from a buffer in memory ($data). Returns a hash of all the extracted keywords. The hash keys are the keyword types; the hash values are the actual keywords.
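As an illustration of working with the returned hash, the sketch below reads a file named on the command line and prints its keywords sorted by type. It only attempts extraction when File::Extractor can be loaded. The helper format_keywords is a hypothetical name, and the sketch assumes each hash value is a plain string, which the documentation above does not state explicitly.

```perl
use strict;
use warnings;

# Hypothetical helper: render a keyword hash as "type: value" lines,
# sorted by keyword type. Assumes every value is a plain string.
sub format_keywords {
    my (%keywords) = @_;
    return map { "$_: $keywords{$_}" } sort keys %keywords;
}

# Only attempt extraction if a file was given and the module is installed.
if (@ARGV && eval { require File::Extractor; 1 }) {
    my $path = $ARGV[0];
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;

    my $extractor = File::Extractor->loadDefaultLibraries;
    my %keywords  = $extractor->getKeywords($fh);
    close $fh;

    print "$_\n" for format_keywords(%keywords);
}
```

To extract from a buffer instead, the same call would take the in-memory data in place of the filehandle, as shown in the second form of getKeywords above.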
Florian Ragwitz, <rafl at debian.org>
Please report any bugs or feature requests to bug-file-extractor at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=File-Extractor. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc File::Extractor
You can also look for information at:
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/File-Extractor
CPAN Ratings
http://cpanratings.perl.org/d/File-Extractor
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=File-Extractor
Search CPAN
http://search.cpan.org/dist/File-Extractor
Copyright 2007-2009 Florian Ragwitz, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.