
NAME

HTML::HTMLDoc - Perl interface to the htmldoc program for producing PDF files from HTML content

SYNOPSIS

  use strict;
  use warnings;
  use HTML::HTMLDoc;

  my $htmldoc = HTML::HTMLDoc->new();

  $htmldoc->set_html_content(qq~<html><body>A PDF file</body></html>~);
  # $htmldoc->set_input_file($filename); # alternatively, read an existing file from disk

  my $pdf = $htmldoc->generate_pdf();

  print $pdf->to_string();
  $pdf->to_file('foo.pdf');

DESCRIPTION

This module provides an object-oriented interface to the htmldoc program.

You can use it to produce PDF or PostScript files from an HTML document. Many, but not all, of HTMLDoc's parameters are currently supported.

You need to have HTMLDoc installed before installing this module.

All PDF methods return true on success and false on failure. You can check whether errors occurred by calling the error() method.

Normally this module uses IPC::Open3 to communicate with the htmldoc process. Under mod_perl, however, this causes problems because standard output cannot be captured. As a workaround, the module can also communicate with htmldoc through temporary files.

To do so, pass the mode parameter to the constructor:

  my $htmldoc = HTML::HTMLDoc->new('mode' => 'file', 'tmpdir' => '/tmp');
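A minimal sketch of choosing the constructor arguments at runtime. It assumes that a set MOD_PERL environment variable is a good enough signal that the code runs under mod_perl. The helper name htmldoc_options() is hypothetical and not part of HTML::HTMLDoc; File::Spec->tmpdir() is used instead of hard-coding /tmp.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of HTML::HTMLDoc): returns constructor
# arguments, switching to file mode when running under mod_perl.
# Assumption: mod_perl sets $ENV{MOD_PERL}.
sub htmldoc_options {
    my (%env) = @_;
    return () unless $env{MOD_PERL};     # outside mod_perl: default 'ipc' mode
    return (
        mode   => 'file',
        tmpdir => File::Spec->tmpdir(),   # must be writable by this process
    );
}

# Usage with the real module:
#   my $htmldoc = HTML::HTMLDoc->new(htmldoc_options(%ENV));
```

Passing %ENV explicitly keeps the helper easy to test without touching the real environment.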

METHODS

new()

Creates a new instance of HTML::HTMLDoc.

Optional parameters are:

  • mode => 'file' | 'ipc' (defaults to 'ipc')

  • tmpdir => $dir (defaults to /tmp)

In file mode, tmpdir holds the temporary HTML files. Make sure the process running your code has write permission for that directory.

set_page_size($size)

Sets the desired size of the pages in the resulting PDF document. $size is one of:

  • a4 (default)

  • letter

  • WxH{in,cm,mm}, e.g. '10x10cm'
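A small sketch of checking a size string against the formats listed above before passing it to set_page_size(). The helper is_valid_page_size() is hypothetical, and it assumes whole-number width and height, as in the '10x10cm' example.

```perl
use strict;
use warnings;

# Hypothetical pre-check (not part of HTML::HTMLDoc) for the documented
# page sizes: 'a4', 'letter', or WxH followed by in, cm or mm.
# Assumption: W and H are whole numbers, as in '10x10cm'.
sub is_valid_page_size {
    my ($size) = @_;
    return 0 unless defined $size;
    return 1 if $size eq 'a4' || $size eq 'letter';
    return $size =~ /\A[0-9]+x[0-9]+(?:in|cm|mm)\z/ ? 1 : 0;
}

for my $size ('a4', '210x297mm', '10x10px') {
    printf "%-10s %s\n", $size, is_valid_page_size($size) ? 'ok' : 'rejected';
}
```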

set_owner_password($password)

Sets the owner password for this document. $password can be any string. This only has an effect if encryption is enabled; see enable_encryption().

set_user_password($password)

Sets the user password for this document. $password can be any string. If set, the user will be asked for this password when opening the file. This only has an effect if encryption is enabled; see enable_encryption().

set_permissions($perm)

Sets the permissions the user has for this document. $perm can be:

  • all

  • annotate

  • copy

  • modify

  • print

  • no-annotate

  • no-copy

  • no-modify

  • no-print

  • none

Setting one of these flags automatically enables document encryption ($htmldoc->enable_encryption()) for you, because permissions have no effect without it.

Setting 'all' or 'none' deletes all previously set options. You can set multiple options if needed, e.g.:

  $htmldoc->set_permissions('no-copy');
  $htmldoc->set_permissions('no-modify');

This does the same:

  $htmldoc->set_permissions('no-copy', 'no-modify');
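The accumulation rules described above can be modelled in a few lines. This is a hypothetical sketch of the documented behavior, not the module's implementation: 'all' and 'none' replace everything set before, while other flags are added to the current set.

```perl
use strict;
use warnings;

# Hypothetical model (not HTML::HTMLDoc's code) of the documented rules:
# 'all' and 'none' reset previously set permissions, other flags accumulate.
my %KNOWN = map { $_ => 1 } qw(
    all none annotate copy modify print
    no-annotate no-copy no-modify no-print
);

sub merge_permissions {
    my ($current, @new) = @_;
    my @result = @$current;
    for my $perm (@new) {
        die "unknown permission '$perm'\n" unless $KNOWN{$perm};
        if ($perm eq 'all' || $perm eq 'none') {
            @result = ($perm);    # discard everything set before
        }
        else {
            push @result, $perm unless grep { $_ eq $perm } @result;
        }
    }
    return @result;
}
```

Calling merge_permissions() twice with one flag each gives the same result as one call with both flags, mirroring the two set_permissions() forms shown above.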

links()

Turns link processing on.

no_links()

Turns link processing off.

path()

Specifies the search path for files referenced in the document.

landscape()

Sets the orientation of the resulting pages to landscape.

portrait()

Sets the orientation of the resulting pages to portrait.

title()

Turns the title on.

no_title()

Turns the title off.

set_right_margin($margin, $measure)

Sets the right margin. $margin is an integer; $measure is one of 'in', 'cm' or 'mm'.

set_left_margin($margin, $measure)

Sets the left margin. $margin is an integer; $measure is one of 'in', 'cm' or 'mm'.

set_bottom_margin($margin, $measure)

Sets the bottom margin. $margin is an integer; $measure is one of 'in', 'cm' or 'mm'.

set_top_margin($margin, $measure)

Sets the top margin. $margin is an integer; $measure is one of 'in', 'cm' or 'mm'.
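A minimal sketch of setting several margins in one call through the four set_*_margin($margin, $measure) methods documented above. The wrapper apply_margins() is hypothetical; $doc is expected to be an HTML::HTMLDoc object, and the integer and unit checks follow the rules stated above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::HTMLDoc): applies the given
# margins via the documented set_top/right/bottom/left_margin methods.
sub apply_margins {
    my ($doc, $measure, %margin) = @_;
    die "measure must be in, cm or mm\n"
        unless defined $measure && $measure =~ /\A(?:in|cm|mm)\z/;
    for my $side (qw(top right bottom left)) {
        next unless exists $margin{$side};
        my $value = $margin{$side};
        die "$side margin must be an integer\n"
            unless defined $value && $value =~ /\A[0-9]+\z/;
        my $method = "set_${side}_margin";
        $doc->$method($value, $measure);    # e.g. set_top_margin(2, 'cm')
    }
    return $doc;
}

# Usage:
#   apply_margins($htmldoc, 'cm', top => 2, bottom => 2, left => 3, right => 1);
```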

set_bodycolor($color)

Sets the background color of all pages. $color is a hex color value (e.g. #FFFFFF), an RGB value (e.g. set_bodycolor(0,0,0) for black) or a color name (e.g. black).
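The RGB and hex forms describe the same colors. This sketch converts an RGB triple into the hex notation, assuming components in the usual 0..255 range; rgb_to_hex() is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::HTMLDoc): converts an RGB triple
# (assumed range 0..255 per component) into '#RRGGBB' hex notation.
sub rgb_to_hex {
    my @rgb = @_;
    die "expected three components\n" unless @rgb == 3;
    for my $c (@rgb) {
        die "component out of range\n"
            unless defined $c && $c =~ /\A[0-9]+\z/ && $c <= 255;
    }
    return sprintf '#%02X%02X%02X', @rgb;
}

print rgb_to_hex(0, 0, 0), "\n";        # #000000
print rgb_to_hex(255, 255, 255), "\n";  # #FFFFFF
```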

set_bodyfont($font)

Sets the default font of the content. Currently the following fonts are supported:

Arial, Courier, Helvetica, Monospace, Sans-Serif, Serif, Symbol, Times

set_fontsize($fsize)

Sets the default font size for the body text.

set_bodyimage($image)

Sets the background image for the document. $image is the path to the image in your filesystem.

set_browserwidth($width)

Specifies the browser width in pixels. The browser width is used to scale images and pixel measurements when generating PostScript and PDF files. It does not affect the font size of text.

The default browser width is 680 pixels, which corresponds roughly to a 96 DPI display. Make sure that your images and tables are no wider than the browser width, or your output will overlap or be truncated in places.

set_compression($level)

Specifies that Flate compression should be applied to the output file. The optional $level parameter is a number from 1 (fastest, least compression) to 9 (slowest, most compression).

This option is only available when generating Level 3 PostScript or PDF files.
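A sketch of checking a format/level combination before calling set_compression(). The helper compression_allowed() is hypothetical and uses the format names from set_output_format(). As a precaution it treats only 'pdf', 'pdf12' to 'pdf14' and 'ps3' as eligible. 'pdf11' is excluded because Flate compression was introduced in PDF 1.2.

```perl
use strict;
use warnings;

# Hypothetical pre-check (not part of HTML::HTMLDoc): is compression level
# $level (1..9) usable with output format $format?
# Assumption: 'pdf11' is excluded because Flate arrived in PDF 1.2.
sub compression_allowed {
    my ($format, $level) = @_;
    return 0 unless defined $level && $level =~ /\A[1-9]\z/;
    return 0 unless defined $format;
    return $format =~ /\A(?:pdf(?:1[2-4])?|ps3)\z/ ? 1 : 0;
}
```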

set_pagemode($mode)

Specifies the initial viewing mode of the document. $mode is one of:

  • document - the document pages are displayed in a normal window

  • outline - the document outline and pages are displayed

  • fullscreen - the document pages are displayed on the entire screen

set_charset($charset)

Defines the charset for the output document. The following charsets are currently supported: cp-874, cp-1250, cp-1251, cp-1252, cp-1253, cp-1254, cp-1255, cp-1256, cp-1257, cp-1258, iso-8859-1, iso-8859-2, iso-8859-3, iso-8859-4, iso-8859-5, iso-8859-6, iso-8859-7, iso-8859-8, iso-8859-9, iso-8859-14, iso-8859-15, koi8-r

color_on()

Requests color output.

color_off()

Requests black-and-white output.

enable_encryption()

Enables encryption and security features for the document.

disable_encryption()

Disables encryption and security features for the document.

set_output_format($format)

Sets the format of the output document. $format can be one of:

  • html

  • pdf (default)

  • pdf11

  • pdf12

  • pdf13

  • pdf14

  • ps

  • ps1

  • ps2

  • ps3

set_html_content($html)

Sets the HTML content from a scalar. To read the input from an existing file instead, see set_input_file($filename).

get_html_content()

Returns the previously set HTML content.

set_input_file($input_filename)

Sets the input file name. This also switches the operational mode to 'file'.

get_input_file()

Returns the previously set input file name.

set_header($left, $center, $right)

Defines the data displayed in the header. For each of $left, $center and $right, choose one of the following characters:

  • . A period indicates that the field should be blank.

  • : A colon indicates that the field should contain the current and total number of pages in the chapter (n/N).

  • / A slash indicates that the field should contain the current and total number of pages (n/N).

  • 1 The number 1 indicates that the field should contain the current page number in decimal format (1, 2, 3, ...)

  • a A lowercase "a" indicates that the field should contain the current page number using lowercase letters.

  • A An uppercase "A" indicates that the field should contain the current page number using UPPERCASE letters.

  • c A lowercase "c" indicates that the field should contain the current chapter title.

  • C An uppercase "C" indicates that the field should contain the current chapter page number.

  • d A lowercase "d" indicates that the field should contain the current date.

  • D An uppercase "D" indicates that the field should contain the current date and time.

  • h An "h" indicates that the field should contain the current heading.

  • i A lowercase "i" indicates that the field should contain the current page number in lowercase roman numerals (i, ii, iii, ...)

  • I An uppercase "I" indicates that the field should contain the current page number in uppercase roman numerals (I, II, III, ...)

  • l A lowercase "l" indicates that the field should contain the logo image.

  • t A lowercase "t" indicates that the field should contain the document title.

  • T An uppercase "T" indicates that the field should contain the current time.

Example:

To show the title on the left, nothing in the center and the current page number on the right:

  $htmldoc->set_header('t', '.', '1');
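A sketch of validating header or footer fields before calling set_header() or set_footer(). Each field must be exactly one of the characters listed above. The helper valid_header_fields() is hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical pre-check (not part of HTML::HTMLDoc): exactly three fields,
# each a single character from the documented set.
my %FIELD_CHARS = map { $_ => 1 } split //, './:1aAcCdDhiIltT';

sub valid_header_fields {
    my @fields = @_;
    return 0 unless @fields == 3;
    for my $f (@fields) {
        return 0 unless defined $f && $FIELD_CHARS{$f};
    }
    return 1;
}

# Usage:
#   my @header = ('t', '.', '1');
#   $htmldoc->set_header(@header) if valid_header_fields(@header);
```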

set_footer($left, $center, $right)

Defines the data displayed in the footer. See set_header() for details on setting the left, center and right values.

embed_fonts()

Specifies that fonts should be embedded in PostScript and PDF output. This is especially useful when generating documents in character sets other than ISO-8859-1.

no_embed_fonts()

Turns off font embedding previously enabled by embed_fonts().

generate_pdf()

Generates the output document and returns an instance of HTML::HTMLDoc::PDF. See the documentation of that class for details.

error()

In scalar context, returns the last error that occurred; in list context, returns all errors that occurred.
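A minimal sketch combining generate_pdf() and error(). The wrapper render_to_file() is hypothetical. It assumes that generate_pdf() returns a false value when generation fails, in line with the statement that the PDF methods return false on failure, and it calls error() in list context to report every collected error.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not part of HTML::HTMLDoc): generates the document,
# writes it to $path and dies with all collected errors on failure.
# Assumption: generate_pdf() returns false when generation fails.
sub render_to_file {
    my ($htmldoc, $path) = @_;
    my $pdf = $htmldoc->generate_pdf();
    unless ($pdf) {
        my @errors = $htmldoc->error();    # list context: all errors
        die 'PDF generation failed: ', join('; ', @errors), "\n";
    }
    $pdf->to_file($path);
    return $pdf;
}

# Usage:
#   render_to_file($htmldoc, 'foo.pdf');
```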

EXPORT

None by default.

AUTHOR

Michael Frankl - mfrankl@seibert-media.de

COPYRIGHT AND LICENCE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

CREDITS

Thanks very much to:

Rajat Bhatia

Keith W. Sheffield

Christoffer Landtman

for suggestions and bug fixes.

SEE ALSO

perl.

HTML::HTMLDoc::PDF.