
NAME

Text::Highlight - Syntax highlighting framework

SYNOPSIS

   use Text::Highlight 'preload';
   my $th = Text::Highlight->new(wrapper => "<pre>%s</pre>\n");
   print $th->highlight('Perl', $code);

DESCRIPTION

Text::Highlight is a flexible, extensible tool for syntax-highlighting source code. Both the markup used and the set of supported languages are fully customizable. It can produce highlighted code for embedding in HTML, escape sequences for an ANSI-capable terminal, or markup for posting on an online forum. Bundled support covers C/C++, CSS, HTML, Java, Perl, PHP and SQL.

INSTALLATION

In order to install and use this package you will need Perl version 5.005 or better.

Installation as usual:

   % perl Makefile.PL
   % make
   % make test
   % su
     Password: *******
   % make install

DEPENDENCIES

No third-party modules are required.

The following modules are optional:

HTML::SyntaxHighlighter and HTML::Parser (for better highlighting of HTML code)
Term::ANSIColor (if you want terminal escapes)

API OVERVIEW

[Todo]

METHODS

Text::Highlight provides the object-oriented interface described in this section. The constructor, new, optionally takes the same parameters as the configure method described below.

    my $th = Text::Highlight->new( %args );

Public Methods:

$th->configure( %args )

    Sets how highlighted code is output. Any combination of the following properties can be passed in a single call. If an option is invalid (for example, a wrapper containing no %s), a warning is clucked to STDERR and the option is otherwise ignored.
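A minimal sketch of the validation rule above for one option, wrapper, using only core Perl (Carp's cluck). The helpers wrapper_is_valid and set_wrapper are hypothetical, not part of Text::Highlight's API, and the module's real checks may differ; the sketch only illustrates "exactly one %s, with literal percent signs written as %%" and "warn, then ignore".

```perl
use strict;
use warnings;
use Carp qw(cluck);

# Hypothetical check: a wrapper is valid when it contains exactly one
# %s conversion. Escaped percent signs (%%) are removed before counting.
sub wrapper_is_valid {
    my ($fmt) = @_;
    return 0 unless defined $fmt;
    (my $stripped = $fmt) =~ s/%%//g;
    my $count = () = $stripped =~ /%s/g;
    return $count == 1;
}

# Hypothetical setter: stores a valid wrapper; otherwise clucks a
# warning (message plus stack trace on STDERR) and keeps the old value.
sub set_wrapper {
    my ($settings, $fmt) = @_;
    if (wrapper_is_valid($fmt)) {
        $settings->{wrapper} = $fmt;
    }
    else {
        my $shown = defined $fmt ? $fmt : 'undef';
        cluck("Ignoring invalid wrapper '$shown': it needs exactly one %s");
    }
    return $settings;
}

my %settings = (wrapper => '<pre>%s</pre>');
set_wrapper(\%settings, '<pre>no placeholder</pre>');          # ignored, warns
set_wrapper(\%settings, '<div style="width:100%%">%s</div>');  # accepted
print sprintf($settings{wrapper}, 'code'), "\n";
```

Note how the literal percent sign in the accepted wrapper is written as %% so that sprintf emits a single %.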

    wrapper => '<pre>%s</pre>'

      An sprintf-style format string that the entire highlighted code is passed through once it is complete. It must contain exactly one %s, which is replaced by the highlighted code, plus any other formatting you like. If you want no wrapper, only the highlighted code, set this to '%s'. Because this is an sprintf format, any other literal % characters must be written as %%. Refer to "sprintf" in perlfunc.

    markup => '<span class="%s">%s</span>'

      Another sprintf format string, this one for the markup of the individual semantic pieces of the highlighted code. In short, it is what makes a comment turn green. The format contains two %s placeholders: the first receives the markup identifier from the colors hash for the type of snippet being marked up, and the second receives the snippet itself. A comment may end up as <span class="comment">#my comment</span> in the final output.

      The limitation is that the type identifier must come before the code itself. This is how markup normally works, but if your markup needs a different order, you are out of luck for now. Future versions may support a coderef to get around this.

    colors => \%hash

      The default colors hash is:

        { comment => 'comment',
          string  => 'string',
          number  => 'number',
          key1    => 'key1',
          key2    => 'key2',
          key3    => 'key3',
          key4    => 'key4',
          key5    => 'key5',
          key6    => 'key6',
          key7    => 'key7',
          key8    => 'key8',
        };

      This hash maps semantic token names to markup values. The parser breaks code into semantic chunks identified by these keys, and the value stored at each key is what gets passed as the first %s of the markup format above. Values can be raw color values, ANSI terminal escapes or, by default, CSS class names.

    escape => \&escape_sub | 'default' | undef

      Every bit of displayed code is passed through an escape function suited to the output medium. The function receives the unescaped string and returns the escaped one, as in $escaped_string = escapeHTML("unescaped string"). If set to a code reference, it is called for every piece of code. Since it is called a lot, keep the function lightweight if performance matters.

      The default function does a minimal HTML escape: only the three characters &, < and > are escaped. If you need a more thorough HTML escape, use a function with the same calling convention, such as HTML::Entities' encode_entities() or CGI's escapeHTML(). To restore the default after changing the escape routine, set it to the literal string 'default'.

      A third option is no escaping at all and can be set by passing undef.

    vb => 1, tgml => 1, ansi => 1

      When true, sets the markup, wrapper, escape, and colors to predefined values for the given target: vb for posting on vBulletin, tgml for posting on Tek-Tips, and ansi for display in a terminal that understands ANSI color escapes.

      Note that if more than one of these is passed in the same call to configure, which one takes effect is undefined. Also, any wrapper, markup, colors, or escape passed alongside vb, tgml, or ansi is not overwritten. Hence, $th->configure(wrapper => '[tt]%s[/tt]', tgml => 1) applies the stored TGML settings for markup, colors, and escape, but keeps the custom wrapper instead of the TGML one.
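The sketch below shows, under stated assumptions, how the options above combine for a single token and how explicitly passed options beat a preset. The %PRESETS values, apply_preset and render_token are invented for illustration; Text::Highlight's actual stored vb/tgml/ansi settings and internals are not reproduced here.

```perl
use strict;
use warnings;

# Placeholder preset, invented for this sketch; the settings that
# Text::Highlight actually stores for tgml (or vb/ansi) may differ.
my %PRESETS = (
    tgml => {
        wrapper => '[code]%s[/code]',
        markup  => '[color %s]%s[/color]',
        colors  => { comment => 'green', string => 'red' },
        escape  => undef,    # undef = no escaping at all
    },
);

# Hypothetical merge: explicitly passed options override the preset,
# which is the precedence rule described above.
sub apply_preset {
    my ($preset, %explicit) = @_;
    return (%{ $PRESETS{$preset} }, %explicit);
}

# Mark up one token: escape it if an escape sub is set, then pass the
# colors value and the escaped text through the markup format.
sub render_token {
    my ($settings, $type, $snippet) = @_;
    my $esc  = $settings->{escape};
    my $text = defined $esc ? $esc->($snippet) : $snippet;
    return sprintf $settings->{markup}, $settings->{colors}{$type}, $text;
}

my %settings = apply_preset('tgml', wrapper => '[tt]%s[/tt]');
my $body     = render_token(\%settings, comment => '# a < b');
print sprintf($settings{wrapper}, $body), "\n";
# prints: [tt][color green]# a < b[/color][/tt]
```

Because the explicit options are listed after the preset in the merged list, later keys win, which is what keeps the custom wrapper.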

$code = $th->highlight($type, $code, $options)

$code = $th->highlight(type => $type, code => $code, options => $options)

    The highlight method does all the work. Given at least the type and the original code, it marks up the code and returns the highlighted string. It takes the named parameters listed below, or just their values as a positional list in the order listed. That order is subject to change, so the named (hash) syntax is safer.
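One way a function can accept both calling styles described above, sketched in plain Perl. normalize_args is a hypothetical helper, and its rule for telling named calls from positional ones is an assumption, not necessarily how Text::Highlight does it.

```perl
use strict;
use warnings;

# Documented positional order for highlight's arguments.
my @ORDER = qw(type code options);
my %KNOWN = map { $_ => 1 } @ORDER;

# Hypothetical helper: turn either calling style into a hashref.
# A call is treated as named when it has an even number of arguments
# and the first one is a known parameter name. (A positional call
# whose type is literally "type" would be misread; fine for a sketch.)
sub normalize_args {
    my @args = @_;
    if (@args && @args % 2 == 0 && defined $args[0] && $KNOWN{ $args[0] }) {
        return +{ @args };
    }
    my %named;
    @named{@ORDER} = @args;    # missing trailing values become undef
    return \%named;
}

my $pos   = normalize_args('Perl', 'print "hi";');
my $named = normalize_args(type => 'Perl', code => 'print "hi";', options => 'simple');
print "$pos->{type} / $named->{type} / $named->{options}\n";
# prints: Perl / Perl / simple
```

The ambiguity noted in the comment is one reason the named syntax is the safer choice.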

    type => $type

      The name of the kind of code. This is either a type loaded with get_syntax or the name of a sub-module, i.e. Text::Highlight::$type, that has a syntax or highlight method.

    code => $code

      code is the unmarked-up, unescaped, plain-text code that needs to be highlighted.

    options => $options

      options is optional and rarely needed. Some parsing modules accept extra configuration, so what options holds varies greatly: it could be a string, a number, or a hashref of many options. The only standard value is the string 'simple', in which case the syntax module's highlight method is not called; instead, Text::Highlight's own parser is used with the syntax module's syntax hash.
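A sketch of the type lookup described above, including the rule (given under get_syntax below) that a type loaded via get_syntax takes precedence over a sub-module of the same name. resolve_type, class_can, %loaded, and the inline Text::Highlight::Demo package are hypothetical stand-ins so the example runs without the module installed.

```perl
use strict;
use warnings;

# Stand-in sub-module defined inline so the sketch runs on its own; a
# real one would be found in @INC as Text::Highlight::<type>. Its
# contents are placeholders.
package Text::Highlight::Demo;
sub syntax    { return { name => 'Demo', case => 1 } }
sub highlight { return 'Demo output' }

package main;

# Types registered earlier via get_syntax (placeholder entry).
my %loaded = (Custom => { name => 'Custom', case => 0 });

# True if $class has $method; eval guards against packages that
# were never loaded.
sub class_can {
    my ($class, $method) = @_;
    return eval { $class->can($method) } ? 1 : 0;
}

# Hypothetical resolver following the documented rules: a type loaded
# via get_syntax wins; otherwise Text::Highlight::<type> is used, via
# its own highlight method unless options is 'simple', in which case
# its syntax hash goes to the local parser.
sub resolve_type {
    my ($type, $options) = @_;
    return (local => $loaded{$type}) if exists $loaded{$type};
    my $class  = "Text::Highlight::$type";
    my $simple = defined $options && !ref $options && $options eq 'simple';
    return (module => $class)         if !$simple && class_can($class, 'highlight');
    return (local  => $class->syntax) if class_can($class, 'syntax');
    return;    # unknown type
}

for my $call (['Custom'], ['Demo'], ['Demo', 'simple']) {
    my ($how) = resolve_type(@$call);
    print "@$call => $how\n";
}
# prints: Custom => local, Demo => module, Demo simple => local (one per line)
```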

$code = $th->output

    Returns the highlighted code from the last time the highlight method was called.

$th->get_syntax($type, $grammar, $format, $force)

$th->get_syntax(type => $type, grammar => $grammar, format => $format, force => $force)

    In addition to the existing T::H:: sub-modules, you can define new types at runtime from text-editor syntax files. EditPlus and UltraEdit (both very good text/code editors) are currently supported. Many users publish these files on the web, so they should not be difficult to find. This method can also load an already parsed language syntax hash if, for whatever reason, you do not want to turn it into a module.

    This method returns a hashref to the parsed syntax on success, or undef and a clucked error message on failure. You can use the return value as a simple truth test, or turn it into your own static sub-module to save reparsing time if you use the same additional types often. (Documentation on creating a sub-module has not been written yet.) The object keeps a copy of the new type, which can be referenced in the highlight method for the object's lifetime.

    type => $type

      The type is the same one passed to highlight, so whatever is specified here must match the name used there. If the type has the same name as an existing sub-module (visible in @INC as Text::Highlight::$type), the syntax loaded via get_syntax takes precedence.

    grammar => $filename | \%syntax

      grammar can be either the name of a file containing the syntax, or a hashref to an already parsed language syntax. A file must contain exactly one language syntax definition: although some editors allow several languages in one file, such files cannot be loaded here. A hashref is assumed to be valid and is not checked further.

    format => 'editplus' | 'ultraedit'

      format specifies which format the syntax file is written in. It is ignored if grammar is a hashref, but required if grammar is a filename. Currently it must be either 'editplus' or 'ultraedit'.

      Before the file is parsed, the language's syntax is set to the following default hash, so any option not set in the syntax file keeps the default shown here. If format is not a valid string, this default hash is returned as-is instead of raising an error: highlighting then runs without errors, but nothing gets highlighted.

        { name => 'Unknown-type',
          escape => '\\',
          case => 1,
          continueQuote => 0,
          blockCommentOn => [],
          lineComment => [],
          quot => [],
        };

    force => 1

      If force is true, the grammar is always reparsed, reset, and reloaded. By default, if a grammar is loaded for a type that has already been loaded, the existing copy is used and no reparsing is done. This acts as a very simple caching mechanism, so you don't have to worry about unnecessary processing unless you want to.
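A sketch of the loading rules above: defaults first, file values on top, hashrefs trusted as-is, and a per-type cache that force bypasses. load_syntax, default_syntax, and the stub parse_grammar_file are hypothetical; no real EditPlus or UltraEdit parsing is attempted, and the module's internals may differ.

```perl
use strict;
use warnings;

# Fresh copy of the documented default syntax hash (new array refs on
# every call, so cached entries never share them).
sub default_syntax {
    return (
        name           => 'Unknown-type',
        escape         => '\\',
        case           => 1,
        continueQuote  => 0,
        blockCommentOn => [],
        lineComment    => [],
        quot           => [],
    );
}

my %cache;         # type => syntax hashref
my $parses = 0;    # how many times a file was actually parsed

# Hypothetical stand-in for the EditPlus/UltraEdit parsers; it returns
# only the keys "found" in the file (placeholder values).
sub parse_grammar_file {
    my ($filename, $format) = @_;
    $parses++;
    return { name => "from $filename", lineComment => ['#'] };
}

# Hypothetical sketch of get_syntax's documented behaviour.
sub load_syntax {
    my ($type, $grammar, $format, $force) = @_;
    return $cache{$type} if $cache{$type} && !$force;          # simple cache
    return $cache{$type} = $grammar if ref $grammar eq 'HASH'; # trusted as-is
    my %syntax = default_syntax();
    if (defined $format && $format =~ /\A(?:editplus|ultraedit)\z/) {
        my $parsed = parse_grammar_file($grammar, $format);
        @syntax{ keys %$parsed } = values %$parsed;    # file values win
    }
    return $cache{$type} = \%syntax;    # invalid format: defaults only
}

my $first  = load_syntax('Foo', 'foo.stx', 'editplus');
my $again  = load_syntax('Foo', 'foo.stx', 'editplus');     # from cache
my $forced = load_syntax('Foo', 'foo.stx', 'editplus', 1);  # reparsed
print "parses: $parses\n";    # parses: 2
```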

Examples:

Until I come up with some better examples, here are the defaults the module uses.

      $DEF_FORMAT   = '<span class="%s">%s</span>';
      $DEF_ESCAPE   = \&_simple_html_escape;
      $DEF_WRAPPER  = '<pre>%s</pre>';
      $DEF_COLORS   = { comment => 'comment',
                        string  => 'string',
                        number  => 'number',
                        key1    => 'key1',
                        key2    => 'key2',
                        key3    => 'key3',
                        key4    => 'key4',
                        key5    => 'key5',
                        key6    => 'key6',
                        key7    => 'key7',
                        key8    => 'key8',
      };
                                    
      #sub has the same prototype as CGI.pm's escapeHTML()
      #and HTML::Entities' encode_entities()
      sub _simple_html_escape
      {
          my $code = shift;
            
          #escape the only three characters that "really" matter for displaying html
          $code =~ s/&/&amp;/g;
          $code =~ s/</&lt;/g;
          $code =~ s/>/&gt;/g;
      
          return $code;
      }
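To see how these defaults fit together, here is a self-contained sketch that escapes each piece of code, marks up the typed pieces with $DEF_FORMAT and the colors hash, and wraps the result with $DEF_WRAPPER. The token list and render_token are hypothetical; the module's actual parser and control flow may differ.

```perl
use strict;
use warnings;

# Defaults copied from the listing above (colors trimmed to the keys used).
my $DEF_FORMAT  = '<span class="%s">%s</span>';
my $DEF_WRAPPER = '<pre>%s</pre>';
my %DEF_COLORS  = (comment => 'comment', string => 'string', key1 => 'key1');

sub _simple_html_escape {
    my $code = shift;
    $code =~ s/&/&amp;/g;    # & first, so the entities below are not re-escaped
    $code =~ s/</&lt;/g;
    $code =~ s/>/&gt;/g;
    return $code;
}

# Escape a piece of code, then mark it up if it has a semantic type.
sub render_token {
    my ($type, $text) = @_;
    my $escaped = _simple_html_escape($text);
    return defined $type ? sprintf($DEF_FORMAT, $DEF_COLORS{$type}, $escaped) : $escaped;
}

# Hypothetical [type, text] pairs standing in for a parser's output;
# undef marks plain text that gets escaped but not marked up.
my @tokens = (
    [ key1    => 'print' ],
    [ undef,     ' ' ],
    [ string  => '"a < b"' ],
    [ undef,     '; ' ],
    [ comment => '# done' ],
);

my $html = sprintf $DEF_WRAPPER, join '', map { render_token(@$_) } @tokens;
print "$html\n";
```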

API SYNTAX EXTENSIONS

[Todo]

EXAMPLES

[Todo]

TODO

  • Finish documentation (especially a "how do I make a custom highlighting module" kind of thing)

  • Let wrapper and format take coderefs instead of just sprintf format strings

  • Add support for get_syntax to take a file handle

  • Add support for a force case option for case-insensitive languages (upper, lower, or match stored)

  • Write T::H:: wrappers for the modules in the Syntax:: namespace

  • Test, test, test ;-)

AUTHORS

Andrew Flerchinger <icrf [at] wdinc.org>

Enrico Sorcinelli <enrico [at] sorcinelli.it> (main contributors)

BUGS

Please submit bugs to the CPAN RT system at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Highlight or by email at bug-text-highlight@rt.cpan.org

Patches are welcome and we'll update the module if any problems are found.

VERSION

Version 0.04

SEE ALSO

HTML::SyntaxHighlighter, perl(1)

COPYRIGHT AND LICENSE

Copyright (C) 2001-2005. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.