NAME

Fuse::PDF - Filesystem embedded in a PDF document

SYNOPSIS

    use Fuse::PDF;
    my $fs = Fuse::PDF->new('my_doc.pdf');
    $fs->mount('/mnt/pdf');
    # blocks until the filesystem is unmounted

See also the mount_pdf front-end.

LICENSE

Copyright 2007-2008 Chris Dolan, cdolan@cpan.org

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

DESCRIPTION

The Adobe Portable Document Format is an arbitrary collection of nodes which support a tree structure. Most of that data is oriented toward document rendering, but it's legal to add arbitrarily complex data virtually anywhere in the document structure. Adobe Illustrator does this to embed lots of metadata in its "PDF-compatible" Illustrator document format.

By deciding on a convention for representing filesystem data and leveraging the FUSE (Filesystem in Userspace) library, we map filesystem calls to PDF edits.

More info: http://www.chrisdolan.net/madmongers/par-fuse-pdf.html

BUGS AND CAVEATS

PDF-in-PDF: If you copy another PDF into the PDF-based filesystem, it may corrupt the outer document. This should be solved when I switch to saving file contents in PDF streams instead of in PDF strings.

Saving: No data is saved until you unmount the filesystem! Hopefully I can fix this in future releases. Saving is also not yet atomic: if a failure occurs during the save, the old PDF may be deleted before the new one is written.
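Until saving is atomic, one possible workaround is to have changes written to a separate file with the save_filename option (see METHODS) and to rename that file over the original only after mount() returns. The following is a minimal sketch, assuming that save_filename is honored at unmount as documented. The helper name promote_saved_copy and the ".new" suffix are hypothetical, not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $original with $saved, but only if $saved
# exists and is non-empty. rename() is atomic on POSIX systems when both
# paths live on the same filesystem.
sub promote_saved_copy {
    my ($saved, $original) = @_;
    return 0 if !-s $saved;    # nothing was written; keep the original
    rename $saved, $original
        or die "Cannot rename $saved to $original: $!\n";
    return 1;
}

# Run as: perl safe_mount.pl my_doc.pdf /mnt/pdf
if (@ARGV == 2) {
    require Fuse::PDF;
    my ($pdf, $mountdir) = @ARGV;
    my $tmp = "$pdf.new";      # assumed scratch path next to the original
    my $fs = Fuse::PDF->new($pdf, {save_filename => $tmp})
        or die "Cannot open or parse $pdf\n";
    $fs->mount($mountdir);     # blocks until unmounted, then saves to $tmp
    promote_saved_copy($tmp, $pdf);
}
```

If the save fails partway, the original PDF is left untouched and the incomplete copy can be inspected or deleted.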

Resources: The entire PDF is loaded into RAM in new(). If your filesystem grows too large, this will lead to obvious problems!

Hangs: While FUSE is quite mature, I found it fairly easy to hang the filesystem back around version 0.01 of this module. I only needed to actually reboot once, but if that causes you concern you may wish to avoid FUSE in general. This has not been a problem since the earliest releases.

Operating systems: I've only tested this software with the Google build of MacFUSE 1.1.0 (PowerPC, 10.4, http://code.google.com/p/macfuse/). Notably, I have not tried the Linux implementation of FUSE. If you have other experiences to add, email me or post comments to http://annocpan.org/.

Fuse.pm: As of this writing, the Fuse module (v0.09_01) fails all tests on Mac. The module actually works great, but the Makefile.PL and the tests are very Linux-centric. Hopefully that will improve as MacFUSE matures.

PDF versions: This package relies on CAM::PDF to read and write PDFs. While that module supports all of the core PDF syntax, it's stricter than many other PDF implementations and may fail to open PDFs that, say, Acrobat or Preview.app can open. In particular, "Print to PDF" on Mac OS X 10.4 often generates bad PDFs.

Threading: I've explicitly set the FUSE default to single-threaded mode, so performance may be terrible in some scenarios. I hope to add support for threaded Perl in a future release. Patches welcome (remove the threaded => 0 line from this file and add locking to Fuse::PDF::FS).

Unsupported: special files (named pipes, etc.), following symlinks out of the filesystem, permission enforcement, chown, flush, reading from unlinked filehandles.

Hard links: I have not yet implemented hard links. I'll implement compressed streams at the same time.

METHODS

$pkg->new($pdf_filename)
$pkg->new($pdf_filename, $hash_of_options)

Create a new filesystem instance. This method opens and parses the PDF document. If there is an error opening or parsing the PDF document, this will return undef.

The options hash supports the following extra arguments:

pdf_constructor

An arrayref of extra arguments to pass to the CAM::PDF constructor. In particular, the first arguments are the owner and user passwords, which can be used to open encrypted PDFs.

save_filename

The string representing the path where filesystem changes should be saved. By default this is the $pdf_filename passed to new().

compact

A boolean indicating whether to discard old filesystem data saved via the versioning infrastructure described in the PDF specification. Defaults to false. If left false, then the PDF will grow with every mount, but only by as much as you changed it. See rewritepdf.pl --cleanse from the CAM::PDF distribution to perform the compaction manually. See also revertpdf.pl to roll back to those older versions.

fs_name

Fuse::PDF can embed multiple filesystems in a single PDF distinguished by name. This string specifies which filesystem to use. It uses the Fuse::PDF::FS default if a name is not explicitly provided.

revision

A version number indicating which filesystem version to roll back to before mounting. Use fs() and the Fuse::PDF::FS API to learn what revisions are available in a PDF filesystem.
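Several of these options can be combined in one hash. The sketch below assumes an encrypted PDF. The passwords, the output filename, and the filesystem name 'scratch' are placeholders rather than values defined by this module. Fuse::PDF is loaded lazily here only so that the option hash can be inspected without mounting anything.

```perl
use strict;
use warnings;

# The keys are the documented options; every value is a placeholder.
my %options = (
    pdf_constructor => ['owner-secret', 'user-secret'],  # passed to CAM::PDF
    save_filename   => 'my_doc_with_fs.pdf',  # leave the input file untouched
    compact         => 1,                     # discard old filesystem data
    fs_name         => 'scratch',             # which embedded filesystem to use
);

# Run as: perl mount_with_options.pl my_doc.pdf /mnt/pdf
if (@ARGV == 2) {
    require Fuse::PDF;
    my ($pdf, $mountdir) = @ARGV;
    my $fs = Fuse::PDF->new($pdf, \%options);
    # new() returns undef if the PDF cannot be opened or parsed
    defined $fs or die "Could not open or parse $pdf\n";
    $fs->mount($mountdir);    # blocks until the filesystem is unmounted
}
```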

$self->mount($mount_path)
$self->mount($mount_path, $hash_of_fuse_options)

Calls into Fuse to mount the filesystem to the specified mount point. On unmount, a new PDF will be saved with any filesystem changes.

If the mount point does not exist, this package will try to create it as a directory via a simple mkdir(). If the mkdir() fails, or if the mount point exists but is not a directory, mount() will croak().

If the PDF has an existing filesystem which is incompatible with this version of the software, mount() will croak().
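Because mount() reports these failures with croak(), a caller that wants to recover rather than exit can wrap the call in eval. This is a minimal sketch; the wrapper name try_mount is hypothetical and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns an empty string on success, or the
# error message if mount() died (for example via croak()).
sub try_mount {
    my ($fs, $mountdir) = @_;
    my $ok = eval { $fs->mount($mountdir); 1 };
    return $ok ? '' : ($@ || 'unknown error');
}

# Run as: perl try_mount.pl my_doc.pdf /mnt/pdf
if (@ARGV == 2) {
    require Fuse::PDF;
    my ($pdf, $mountdir) = @ARGV;
    my $fs = Fuse::PDF->new($pdf)
        or die "Could not open or parse $pdf\n";
    my $error = try_mount($fs, $mountdir);   # blocks while mounted
    warn "Mount failed: $error" if $error ne '';
}
```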

If the mount is successful, we establish callbacks and hand control to the FUSE library. FUSE blocks until the filesystem is unmounted. If this blocking is a problem for you, consider daemonizing the process like so:

    use Fuse::PDF;
    use Net::Server::Daemonize qw();
    my ($pdffilename, $mountdir) = @ARGV;
    my $fs = Fuse::PDF->new($pdffilename);
    if (Net::Server::Daemonize::safe_fork()) {
        exit 0; # parent process or failure
    }
    Net::Server::Daemonize::daemonize('www', 'www');
    $fs->mount($mountdir);

The mount() method cleans up after itself sufficiently that you may call it again immediately after unmounting.

The options hash is passed directly to Fuse::main(). See the Fuse documentation for the allowed keys. A simple example is:

    $fs->mount($mountdir, {debug => 1});

$self->fs()

Return a fresh copy of the Fuse::PDF::FS data structure representing this PDF. You should not try to manipulate this object while the filesystem is mounted. This module is not yet thread-safe!

SEE ALSO

CAM::PDF

Fuse

mount_pdf

AUTHOR

Chris Dolan, cdolan@cpan.org

CREDITS

Thanks to the Madison Perl Mongers for thinking the idea was stupid enough that I was inspired to implement it!