WWW::Scraper::Typo3 - Clean up files managed by the CMS called Typo3
Note: The code assumes you are running a web server locally, so the scripts can both read and write files, and use LWP::Simple::getstore to process files.
    cd ~/misc
    wget -o wget.log --limit-rate=100k -w 4 -r -k -P tewoaf -E -p http://tewoaf.org.au
    cd tewoaf
    rm *eID* # This removes pop-up files generated by clicking on images.
    cd $DR # This is the doc root for your web server.
    rm -rf tewoaf
    cp -r ~/misc/tewoaf .
    cd ~/perl.modules/WWW-Scraper-Typo3
    perl scripts/rename.files.pl -d $DR/tewoaf -v 1
    perl scripts/patch.files.pl -d $DR/tewoaf -v 1
    perl scripts/report.files.pl -b /tewoaf -v 1
patch.files.pl is the only program which overwrites files.
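Because patch.files.pl overwrites files in place, it can help to snapshot the directory first and then list what changed. The following is a minimal sketch, not part of the distribution: the helper name changed_files and the snapshot directory name are hypothetical, and it relies only on the standard diff -rq.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with WWW::Scraper::Typo3.
# Lists files that differ between a snapshot directory ($1) and the
# patched directory ($2).
changed_files() {
    status=0
    diff -rq "$1" "$2" || status=$?
    # diff exits 0 (identical) or 1 (differences found);
    # anything higher means diff itself failed.
    [ "$status" -le 1 ]
}

# Possible usage, with paths following the synopsis
# (the snapshot name 'tewoaf.before-patch' is made up):
#   cp -Rp "$DR/tewoaf" "$DR/tewoaf.before-patch"
#   perl scripts/patch.files.pl -d "$DR/tewoaf" -v 1
#   changed_files "$DR/tewoaf.before-patch" "$DR/tewoaf"
```

Taking the snapshot after rename.files.pl has run, and before patch.files.pl, keeps the listing limited to what the patch step changed.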
WWW::Scraper::Typo3 is a pure Perl module.
It processes the set of files downloaded from a web site whose files are managed by the CMS called Typo3.
This module is available as a Unix-style distro (*.tgz).
See http://savage.net.au/Perl-modules.html for details.
See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing.
new(...) returns an object of type WWW::Scraper::Typo3.
This is the class's constructor.
Usage: WWW::Scraper::Typo3 -> new().
This method takes a hash of options.
Call new() as new(option_1 => value_1, option_2 => value_2, ...).
Available options:
base_url
The script report.files.pl uses "http://$host:$port$base_url$home_page" as the URL where processing starts.
If necessary, both a leading '/' and a trailing '/' are added to the value you supply.
The default value is '/'.
This parameter is mandatory for the script report.files.pl.

The directory option (-d in the synopsis)
This option is used by the two scripts rename.files.pl and patch.files.pl.
It is the directory where these scripts read and write files.
As the synopsis shows, I suggest downloading the site's files to a directory outside your local web server's doc root, and working on a copy of the files within that doc root.
The default value is ''.
This parameter is optional.

home_page
The name of the home page of the site.
The default value is index.html.

host
The domain name or IP address of the host.
The default value is 127.0.0.1.

port
The number of the port to use.
The default value is 80.

The verbosity option (-v in the synopsis)
Display more (1) or less (0) output.
The default is 0.
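The start URL described under the options above can be sketched as follows. This only mimics the documented behaviour (a leading and a trailing '/' added to the base URL, then "http://$host:$port$base_url$home_page"). It is not the module's actual code, and the function names normalize_base_url and start_url are hypothetical.

```shell
#!/bin/sh
# Sketch only: reproduces the documented URL rule, not the module's code.

# Add a leading and a trailing '/' to the base URL when they are missing.
normalize_base_url() {
    base=$1
    case $base in
        /*) ;;
        *)  base="/$base" ;;
    esac
    case $base in
        */) ;;
        *)  base="$base/" ;;
    esac
    printf '%s\n' "$base"
}

# Build the start URL from host, port, base URL and home page.
start_url() {
    host=$1
    port=$2
    base=$(normalize_base_url "$3")
    home=$4
    printf 'http://%s:%s%s%s\n' "$host" "$port" "$base" "$home"
}

# Defaults from the option list, with the base URL used in the synopsis.
start_url 127.0.0.1 80 /tewoaf index.html
# prints http://127.0.0.1:80/tewoaf/index.html
```

An empty base URL normalizes to '/', which matches the documented default.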
Run the code which patches various aspects of Typo3-managed files.
See scripts/patch.files.pl.
Run the code which renames Typo3-managed files.
See scripts/rename.files.pl.
Run the code which reports on various aspects of Typo3-managed files.
See scripts/report.files.pl.
WWW::Scraper::Typo3 was written by Ron Savage <ron@savage.net.au> in 2010.
Home page: http://savage.net.au/index.html
Australian copyright (c) 2010 Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software'; you can redistribute them and/or modify them under the terms of The Artistic License, a copy of which is available at: http://www.opensource.org/licenses/index.html
To install WWW::Scraper::Typo3, copy and paste the appropriate command into your terminal.
cpanm
cpanm WWW::Scraper::Typo3
CPAN shell
perl -MCPAN -e shell
install WWW::Scraper::Typo3