Win32::UrlCache - parse Internet Explorer's history/cache/cookies
    use Win32::UrlCache;

    my $index = Win32::UrlCache->new( 'index.dat' );
    foreach my $url ( $index->urls ) {
        print $url->url, "\n";
    }

Or, you can use a callback function if you care about memory usage.

    use Win32::UrlCache;

    my $index = Win32::UrlCache->new( 'index.dat' );
    $index->urls( callback => \&callback );

    sub callback {
        my $entry = shift;
        my $url = $entry->url;
        $url =~ s/^Visited: //;
        $entry->url( $url );
        print $entry->url, "\n";
        return; # to prevent the entry from being kept in the object
    }

If you want to know the title of the cached page (for Win32 only):

    use Win32::UrlCache::Cache;
    use Win32::UrlCache::Title;
    use Encode;

    my $cache = Win32::UrlCache::Cache->new;
    $cache->urls( callback => \&callback );

    sub callback {
        my $entry = shift;
        print $entry->url, "\n";
        my $title = Win32::UrlCache::Title->extract( $entry->filename );
        print encode( shiftjis => $title ), "\n\n" if $title;
        return;
    }
This module parses so-called "Client UrlCache MMF Ver 5.2" index.dat files, which Internet Explorer uses to store its history, cache, and cookies. As of this writing, I've only tested it on Win2K + IE 6.0, but I hope it also works with some other versions of the OS and Internet Explorer. Note, however, that it is based not on an official/public MSDN specification but on a hack published on the web. So, caveat emptor in every sense, especially for the REDR entries ;)
Patches and feedback are welcome.
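Since only one file format is handled, it can be useful to check a file before handing it to the parser. The following is a minimal sketch, not part of the module: looks_like_urlcache is a hypothetical helper, and it assumes the format name quoted above appears as a plain ASCII signature at the very start of the file.

```perl
use strict;
use warnings;

# The signature string is the format name quoted in the description.
# Its position at offset 0 is an assumption, not something this module documents.
my $SIGNATURE = 'Client UrlCache MMF Ver 5.2';

# Hypothetical helper: returns 1 if the file starts with the signature, else 0.
sub looks_like_urlcache {
    my $path = shift;
    my $len  = length $SIGNATURE;

    open my $fh, '<:raw', $path or return 0;
    my $got = read( $fh, my $head, $len );
    close $fh;

    return 0 unless defined $got && $got == $len;
    return $head eq $SIGNATURE ? 1 : 0;
}

my $path = 'index.dat';    # assumed location of the file to check
if ( -e $path ) {
    print looks_like_urlcache( $path )
        ? "$path looks like a 5.2 index file\n"
        : "$path does not start with the expected signature\n";
}
```

A check like this only inspects the header; it says nothing about whether the entries inside can be parsed.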
The new method receives a path to an 'index.dat' file and parses it to create an object.
The urls method returns the URL entries in the 'index.dat' file. Each entry has url, filename, headers, filesize, last_modified, last_accessed, and, optionally, title accessors (note that some of them may return meaningless values). As of 0.02, it can receive a callback function; see below. As of 0.04, you can also pass ( extract_title => 1 ) to extract titles. However, the extraction runs after the callback, so if you want both a callback and title extraction, you might want to put the extraction code into the callback itself, as shown in the synopsis.
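As a rough illustration of the accessors and the extract_title option described above, the sketch below prints one line per entry. describe_entry is a hypothetical helper, not part of the module; it skips undefined fields because some accessors may return meaningless values. The UTF-8 output layer is also an assumption; choose whatever encoding suits your console.

```perl
use strict;
use warnings;

# Hypothetical helper: format one entry as a single line,
# leaving out any field that is undefined or empty.
sub describe_entry {
    my $entry = shift;
    my @parts = ( $entry->url );

    my $size = $entry->filesize;
    push @parts, "$size bytes" if defined $size;

    my $title = $entry->title;
    push @parts, qq{"$title"} if defined $title && length $title;

    return join ' | ', @parts;
}

my $path = 'index.dat';    # assumed location of the index file
if ( -e $path && eval { require Win32::UrlCache; 1 } ) {
    binmode STDOUT, ':encoding(UTF-8)';    # assumed output encoding
    my $index = Win32::UrlCache->new( $path );

    # No callback here, so titles are extracted for every returned entry.
    print describe_entry( $_ ), "\n" for $index->urls( extract_title => 1 );
}
```

Because no callback is used, every entry is held in memory at once, so this form suits small index files best.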
The leaks method is almost the same as urls, but returns the LEAK entries (if any) in the 'index.dat' file.
The redrs method returns the REDR entries (if any) in the 'index.dat' file. Each entry has a url accessor. As of 0.02, it can receive a callback function.
By default, the three methods shown above return all the entries found in the index, which may eat lots of memory, especially if you use IE as your main browser. As of 0.02, those methods may receive a callback function, which takes an entry as its first (and, as of this writing, only) argument. If the callback returns true, the entry is stored in the Win32::UrlCache object; if it returns false, the entry is discarded after the callback has run.
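The keep-or-discard rule can be used as a filter. The sketch below keeps only the entries whose URL mentions one host and lets everything else be discarded. The host name and the $keep_host callback are hypothetical, and the example assumes that urls returns the stored entries when given a callback, just as it does without one.

```perl
use strict;
use warnings;

my $host = 'example.com';    # hypothetical host of interest

# Callback: a true return value keeps the entry, a false one discards it.
my $keep_host = sub {
    my $entry = shift;
    my $url   = $entry->url;
    return ( defined $url && index( $url, $host ) >= 0 ) ? 1 : 0;
};

my $path = 'index.dat';    # assumed location of the index file
if ( -e $path && eval { require Win32::UrlCache; 1 } ) {
    my $index = Win32::UrlCache->new( $path );

    # Assumption: the return value holds only the entries the callback kept.
    my @kept = $index->urls( callback => $keep_host );
    print $_->url, "\n" for @kept;
}
```

Returning an explicit 1 or 0 keeps the result unambiguous regardless of the context in which the callback is invoked.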
http://www.latenighthacking.com/projects/2003/reIndexDat/
Kenichi Ishigaki, <ishigaki at cpan.org>
Copyright (C) 2007 by Kenichi Ishigaki.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.