
NAME

HTML::Embellish - Typographically enhance HTML trees

VERSION

This document describes version 0.08 of HTML::Embellish, released August 18, 2012.

SYNOPSIS

    use HTML::Embellish;
    use HTML::TreeBuilder;

    my $html = HTML::TreeBuilder->new_from_file(...);
    embellish($html);

DESCRIPTION

HTML::Embellish adds typographical enhancements to HTML text by converting certain ASCII characters to Unicode characters. It converts quotation marks and apostrophes into curly quotes, and sequences of hyphens into em-dashes. It also inserts non-breaking spaces between the periods of an ellipsis. (It doesn't use the HORIZONTAL ELLIPSIS character (U+2026), because I like more space in my ellipses.)

INTERFACE

embellish($html, ...)

This subroutine (exported by default) is the main entry point. It's a shortcut for HTML::Embellish->new(...)->process($html).

If you're going to process several trees with the same parameters, the object-oriented interface will be slightly more efficient.
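As a rough illustration of reusing one object across several trees, the sketch below builds the HTML::Embellish object once and calls process on each tree in turn. The sample fragments and the variable names (@fragments, @texts) are made up for this example; new_from_content, as_text, and delete come from the HTML::Tree distribution.

```perl
use strict;
use warnings;
use HTML::Embellish ();      # the OO interface does not need the exported sub
use HTML::TreeBuilder;

# Hypothetical sample input; real documents would usually come from files.
my @fragments = (
    '<p>"Hello," she said.</p>',
    '<p>Wait -- what?</p>',
);

# Build the object once, with all flags left at their defaults.
my $emb = HTML::Embellish->new;

my @texts;
for my $fragment (@fragments) {
    my $tree = HTML::TreeBuilder->new_from_content($fragment);
    $emb->process($tree);            # modifies the tree in place
    push @texts, $tree->as_text;     # keep the enhanced text for later use
    $tree->delete;                   # release the tree's memory
}
```

Calling embellish($tree) inside the loop should give the same result; the difference is only that a new object would be constructed on every iteration.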

$emb = HTML::Embellish->new(flag => value, ...)

This creates an HTML::Embellish object that will perform the specified enhancements. These are the (optional) flags that you can pass:

dashes

If true, converts sequences of hyphens into em-dashes. Two or three hyphens become one em-dash. Four hyphens become two em-dashes. Any other sequence of hyphens is not changed.

ellipses

If true, inserts non-breaking spaces between the periods making up an ellipsis. Also converts the space before an ellipsis that appears to end a sentence to a non-breaking space.

hellip

If true, converts the … character (HORIZONTAL ELLIPSIS, U+2026) to three periods. (To insert non-breaking spaces between them, also set ellipses to true.) This defaults to the value of ellipses.

space_ellipses

If true, adds whitespace around ellipses when necessary. This defaults to the value of ellipses.

quotes

If true, converts quotation marks and apostrophes into curly quotes.

default

This is the default value used for flags that you didn't specify. It defaults to 1 (enabled). The main reason for using this flag is to disable any enhancements that might be introduced in future versions of HTML::Embellish.
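To show how the flags combine, here is a minimal sketch that turns everything off with default => 0 and then enables only quotes. The sample paragraph and variable names are invented for the example; with these settings the hyphens should be left alone while the quotation marks are converted.

```perl
use strict;
use warnings;
use HTML::Embellish ();
use HTML::TreeBuilder;

# Disable every enhancement (including any added in future versions),
# then switch on curly quotes only.
my $emb = HTML::Embellish->new(default => 0, quotes => 1);

# Hypothetical sample input containing both quotes and a double hyphen.
my $tree = HTML::TreeBuilder->new_from_content('<p>"Plain" -- text</p>');
$emb->process($tree);

my $text = $tree->as_text;   # enhanced text, for inspection
$tree->delete;
```

Listing the flags you want explicitly, rather than relying on the defaults, keeps the output stable if a later release adds new enhancements.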

$emb->process($html)

The process method enhances the content of the HTML::Element you pass in. You can pass the root element to process the entire tree, or any sub-element to process just that part of the tree. The tree is modified in-place; the return value is not meaningful.
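The following sketch shows one way to enhance only part of a tree: it looks up a sub-element with HTML::Element's look_down method and passes just that element to process. The document structure and the id value "main" are assumptions made up for this example.

```perl
use strict;
use warnings;
use HTML::Embellish ();
use HTML::TreeBuilder;

# Hypothetical document with a heading and a main content block.
my $tree = HTML::TreeBuilder->new_from_content(
    '<html><body><h1>"Title"</h1>'
  . '<div id="main"><p>"Body"</p></div></body></html>'
);

# Find the element to enhance; everything outside it is left untouched.
my $main = $tree->look_down(_tag => 'div', id => 'main');
HTML::Embellish->new->process($main);

my $heading = $tree->look_down(_tag => 'h1')->as_text;  # not processed
my $body    = $main->as_text;                           # processed in place
```

Because the tree is modified in place, there is nothing to capture from process; inspect or serialize the tree afterwards instead.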

DIAGNOSTICS

First parameter of embellish must be an HTML::Element

You didn't pass a valid HTML::Element object to embellish.

HTML::Embellish->process must be passed an HTML::Element

You didn't pass a valid HTML::Element object to process.

Odd number of parameters passed to HTML::Embellish->new

HTML::Embellish->new takes parameters in KEY => VALUE style, so there must always be an even number of them.
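If you want to handle these errors yourself, one option is to wrap the call in eval. This sketch assumes the diagnostics above are raised as fatal exceptions (via die or croak), which the documentation does not state explicitly; the variable names are illustrative.

```perl
use strict;
use warnings;
use HTML::Embellish;   # exports embellish by default

# Deliberately pass something that is not an HTML::Element.
# Assumption: the module reports this by throwing an exception.
my $ok    = eval { embellish('not an element'); 1 };
my $error = $ok ? '' : $@;

warn "embellish failed: $error" unless $ok;
```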

CONFIGURATION AND ENVIRONMENT

HTML::Embellish requires no configuration files or environment variables.

DEPENDENCIES

Requires the HTML::Tree distribution from CPAN (or some other module that implements the HTML::Element interface). Versions of HTML::Tree prior to 3.21 had some bugs involving Unicode characters and non-breaking spaces.

INCOMPATIBILITIES

None reported.

BUGS AND LIMITATIONS

I've experienced occasional segfaults when using this module with Perl 5.8.8. Since a pure-Perl module like this shouldn't be able to cause a segfault, I believe the issue is with Perl 5.8. I recommend using Perl 5.10 if at all possible, as the files that segfaulted under 5.8.8 worked fine with 5.10.

AUTHOR

Christopher J. Madsen <perl AT cjmweb.net>

Please report any bugs or feature requests to <bug-HTML-Embellish AT rt.cpan.org> or through the web interface at http://rt.cpan.org/Public/Bug/Report.html?Queue=HTML-Embellish.

You can follow or contribute to HTML-Embellish's development at http://github.com/madsen/html-embellish.

COPYRIGHT AND LICENSE

This software is copyright (c) 2012 by Christopher J. Madsen.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENSE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.