
NAME

PerlPoint::Tags - processes PerlPoint tag declarations

VERSION

This manual describes version 0.05.

SYNOPSIS

  # declare a tag declaration package
  package PerlPoint::Tags::New;

  # declare base "class"
  use base qw(PerlPoint::Tags);

DESCRIPTION

PerlPoint is built in a modular way. The base packages provide parsing and stream processing for all translators into target formats and are therefore intended to be as general as possible. That is why they do not even define tags, because every translator author may wish to provide their own tags specific to the addressed target projector (or format, respectively). On the other hand, the parser needs to know about tags to recognize them correctly. That is where this module comes in. It serves as a base for tag declaration modules by providing a general import() method for them to inherit. This method scans the invoking module for certain data structures containing tag declarations and imports this data into a structure in its own namespace. The parser knows about this PerlPoint::Tags collection and makes it the base of its tag handling.

It is recommended to have a "top level" tag declaration module for each PerlPoint translator, so there could be a PerlPoint::Tags::HTML, a PerlPoint::Tags::Latex, a PerlPoint::Tags::SDF, a PerlPoint::Tags::XML and so on. (These modules may of course simply invoke lower level declarations.)

Note: We are speaking in terms of "classes" here but of course we are actually only using the mechanism of import() together with inheritance to provide an intuitive and easy to use way of declaration.

As an additional feature, the module provides a method addTagSets() to allow translator users to declare additional tags. See below for details.

Tag declaration by subclasses

So to declare tags, just write a module in the PerlPoint::Tags namespace and make it a subclass of PerlPoint::Tags:

  # declare a tag declaration package
  package PerlPoint::Tags::New;

  # declare base "class"
  use base qw(PerlPoint::Tags);

Now the tags can be declared. Tag declarations are expected in a global hash named %tags. Each key is the name of a tag, while descriptions are nested structures stored as values.

  # pragmata
  use strict;
  use vars qw(%tags %sets);

  # tag declarations
  %tags=(
         EMPHASIZE => {
                       # options
                       options => TAGS_OPTIONAL,

                       # don't miss the body!
                       body    => TAGS_MANDATORY,
                      },

         COLORIZE => {...},

         FONTIFY  => {...},

         ...
        );

This looks complicated but is easy to understand. Each tag is described by a hash. The body slot expresses whether the body is obsolete, optional or mandatory. This is done by using constants provided by PerlPoint::Constants. Obsolete bodies will not be recognized by the parser.

The body slot may be omitted. This means the body is optional.

There are the same choices for options in general: they may be obsolete, optional or mandatory. If the slot is omitted this means that the tag does not need any options. The parser will not accept a tag option part in this case.

To sum it up, options and body of a tag can be declared as mandatory by TAGS_MANDATORY, optional by TAGS_OPTIONAL, or obsolete by TAGS_DISABLED.
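As a minimal sketch, the three constants could be combined as follows. The `use constant` line only defines stand-in values (an assumption) so that the example runs without PerlPoint installed; a real declaration module would import these names from PerlPoint::Constants. PAGEBREAK is a hypothetical tag name used for illustration.

```perl
use strict;
use warnings;

# Stand-in values (assumption) so the sketch runs without PerlPoint;
# a real module would import these names from PerlPoint::Constants.
use constant {TAGS_DISABLED => 0, TAGS_OPTIONAL => 1, TAGS_MANDATORY => 2};

our %tags=(
           # options may be given, a body must follow
           EMPHASIZE => {options => TAGS_OPTIONAL,  body => TAGS_MANDATORY},

           # both parts are required
           COLORIZE  => {options => TAGS_MANDATORY, body => TAGS_MANDATORY},

           # hypothetical tag: neither options nor a body are accepted
           PAGEBREAK => {options => TAGS_DISABLED,  body => TAGS_DISABLED},
          );
```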

If you need further checks you can hook into the parser by using the "hook" key:

  %tags=(
         EMPHASIZE => {
                       # options
                       options => TAGS_OPTIONAL,

                       # perform special checks
                       hook => sub {
                                    # get parameters
                                    my ($tagname, $options, $body, $anchor)=@_;

                                    # checks
                                    my $rc=...

                                    # reply results
                                    $rc;
                                   }
                      },

         COLORIZE => {...},

         FONTIFY  => {...},

         ...
        );

A parsing hook function receives the tag name, a reference to a hash of option name / value pairs to check, a body array reference and an anchor object. Using the option hash reference, the hook can modify the options. The passed body array is a copy of the body part of the stream, so the hook cannot modify the body part on the parser's side. The anchor object can be used to store new anchors or to query anchors already known; see PerlPoint::Anchors for details of this object's interface.
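The following sketch shows what such a hook might look like when it both checks and modifies options. The "color" option, the stand-in constant values and the simulated parser call are assumptions made for illustration; in a real module the constants come from PerlPoint::Constants and the parser supplies the arguments.

```perl
use strict;
use warnings;

# Stand-in values (assumption); the real constants come from PerlPoint::Constants.
use constant {PARSING_OK => 0, PARSING_ERROR => 1};

# Hypothetical hook for a COLORIZE tag: require a "color" option
# and normalize its value to lower case.
my $hook=sub {
    # get parameters
    my ($tagname, $options, $body, $anchor)=@_;

    # a missing option is reported as a semantic error
    unless (exists $options->{color}) {
        warn qq([Error] $tagname needs a "color" option.\n);
        return PARSING_ERROR;
    }

    # modify the option in place, the caller sees the change
    $options->{color}=lc $options->{color};

    # all right
    PARSING_OK;
};

# simulate a parser call (no anchor object is needed in this sketch)
my %options=(color => 'RED');
my $rc=$hook->('COLORIZE', \%options, ['some', 'text'], undef);
```

After the simulated call, `$rc` holds PARSING_OK and `$options{color}` has been rewritten through the passed reference.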

The following return codes are defined:

PARSING_COMPLETED

Parsing will be stopped successfully.

PARSING_ERASE

The parser will throw away the tag and all its content.

PARSING_ERROR

A semantic error occurred. This error will be counted, but parsing will be continued to possibly detect even more errors.

PARSING_FAILED

A syntactic error occurred. Parsing will be stopped immediately.

PARSING_IGNORE

The parser will ignore the tag, but stream the body. The result is similar to a source omitting the tag.

PARSING_OK

The checked object is declared to be OK, parsing will be continued.

Hooks are an interesting way to extend document parsing, but please take into consideration that tag hooks might be called quite often. So, if checks have to be performed, users will be glad if they are performed quickly.

There is another hook interface as well. Some operations need parsing to be completed before they can start, like checking a referenced anchor which might be defined after the referencing tag. To handle such situations, a subroutine can be declared as the value of the key finish. The parser will invoke this code when parsing is done and the tag was parsed successfully. (So if a hook function returned an error code, the finish hook will be ignored.)

Here is an example (from an implementation of the basic tag \REF):

  # afterburner
  finish =>  sub
              {
               # declare and init variable
               my $ok=PARSING_OK;

               # take parameters
               my ($options, $anchors)=@_;

               # check link for being valid
               unless (my $anchor=$anchors->query($options->{name}))
                 {
                  $ok=PARSING_FAILED;
                  warn qq(\n\n[Error] Unknown link address "$options->{name}."\n);
                 }
               else
                 {
                  # link ok, get value (there is only one key/value pair
                  # in the received hash)
                  ($options->{__value__})=(values %$anchor);
                 }

               # supply status
               $ok;
              },

Because some information is no longer available after parsing, finish hooks have a different interface. They receive options and anchors like parsing hooks, but no line number and no body information.

Options can be modified just as in parsing hooks. Return codes are the same, but are evaluated slightly differently because of the invocation time:

PARSING_COMPLETED

All right. This code is accepted for convenience; it is recommended to use PARSING_OK instead.

PARSING_ERASE

The backend will ignore the tag and all its contents (which means its body).

PARSING_ERROR

A semantic error occurred. This error will be counted.

PARSING_FAILED

An error occurred. Because parsing is already finished, this will be counted as a semantic error.

This code is accepted for convenience; it is recommended to use PARSING_ERROR instead.

PARSING_IGNORE

The backend will ignore the tag, but process its body. This simply means that the tag takes no effect.

PARSING_OK

All right.

The order of finish hook invocation can differ from the order of tag usage. Do not depend on it.

A finish hook is not invoked unless the tag was processed and streamed successfully at parsing time. This means that either the parsing hook returned PARSING_OK, or there was no parsing hook at all.

Marking tags that can act standalone

A tag can be part of various paragraphs. A single tag in a paragraph with no prefix produces a text paragraph containing just this tag. This can be intended, but there are other cases when the tag should stand on its own.

The standalone attribute instructs the parser to strip off the wrapping paragraph from a tag that is used as its only content. If there is more content in the paragraph, the paragraph wrapper will not be removed.

The flag should be set to a true value to activate the occasional paragraph stripping.

Example:

  standalone => 1,

Using other tag definitions

One can invoke hooks of any other registered tag. This is powerful, but dangerous. Nevertheless, it helps to emulate other tags, for example if an old interface (tag and option names) shall be preserved but the new functionality shall be used (without being copied between tag modules). To invoke a foreign hook, call PerlPoint::Tags::call() (fully qualified function name) with tag name, hook type and parameters, like so:

 $rc=PerlPoint::Tags::call('TAG', 'hook', @_);

Valid hook types are "hook" and "finish" (currently). If the tag is not registered, or has no hook of the specified type, an undefined value is supplied, otherwise the return code of the invoked function.

It is not checked if you call a "hook" function from a "finish" hook or vice versa. Take care!

This feature is made available to interchange hooks between several tag definition modules. If you want to share hook functions between tags declared by the same tag module, it is recommended to use common Perl techniques.
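One such common technique is a named subroutine that several declarations refer to by reference. The sketch below assumes a hypothetical check named checkName and uses stand-in constant values so that it runs without PerlPoint installed; real modules would import the constants from PerlPoint::Constants.

```perl
use strict;
use warnings;

# Stand-in values (assumption); see PerlPoint::Constants for the real ones.
use constant {TAGS_OPTIONAL => 1, TAGS_MANDATORY => 2};
use constant {PARSING_OK => 0, PARSING_ERROR => 1};

# one check shared by several tags (hypothetical helper):
# the "name" option has to be a non-empty string
sub checkName {
    my ($tagname, $options, $body, $anchor)=@_;
    return PARSING_ERROR unless defined $options->{name} and length $options->{name};
    PARSING_OK;
}

our %tags=(
           EMPHASIZE => {
                         options => TAGS_OPTIONAL,
                         body    => TAGS_MANDATORY,
                         hook    => \&checkName,
                        },

           COLORIZE  => {
                         options => TAGS_MANDATORY,
                         body    => TAGS_MANDATORY,
                         hook    => \&checkName,
                        },
          );
```

Both tags now share a single function, so there is no need to go through PerlPoint::Tags::call() within one module.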

Tag activation

Now, in translator software where a parser object is to be built, tag declarations can be accessed by simply loading the declaration modules as usual. There is no need to load PerlPoint::Tags directly there, unless the converter should run under perl 5.005, which needs this parent module to be loaded explicitly (while perl 5.6 does it implicitly):

  # declare all the tags to recognize
  use PerlPoint::Tags::New;

This updates a structure in the PerlPoint::Tags namespace. The parser knows about this structure and will automatically evaluate it.

Several declaration modules can be loaded subsequently. Each new tag is added to the internal structure, while predeclared tags are overwritten by new declarations.

  # declare all the tags to recognize
  use PerlPoint::Tags::Basic;
  use PerlPoint::Tags::HTML;
  use PerlPoint::Tags::SDF;
  use PerlPoint::Tags::New;

Activating certain tags

Certain translators might only want to support subsets of the tags declared in a PerlPoint::Tags submodule. This is possible as well, similar to the usual importing mechanism:

  # declare all the tags to recognize
  use PerlPoint::Tags::New qw(COLORIZE);

This declares only the COLORIZE tag and ignores EMPHASIZE and FONTIFY.

Tag sets

To simplify activation of certain but numerous tags a declaration module can group them by setting up a global hash named %sets.

  %sets=(
         pointto => [qw(EMPHASIZE COLORIZE)],
        );

This allows a translator author to activate EMPHASIZE and COLORIZE at once:

  # declare all the tags to recognize
  use PerlPoint::Tags::New qw(:pointto);

The syntax is borrowed from the usual import mechanism.

Tag sets can overlap:

  %sets=(
         pointto => [qw(EMPHASIZE COLORIZE)],
         set2    => [qw(COLORIZE FONTIFY)],
        );

And of course they can be nested:

  %sets=(
         pointto => [qw(EMPHASIZE COLORIZE)],
         all     => [(':pointto', qw(FONTIFY))],
        );

Allowing translator users to import foreign tag declarations

As PerlPoint provides a flexible way to write translators, PerlPoint documents might be written with tags for a certain translator and later be processed by another translator which does not support all the original tags. Of course, the second translator does not need to handle these tags, but the parser needs to know they should be recognized. However, it cannot know this from the declarations made by the second translator itself, because they do not contain the tags of the first translator.

The problem could be solved if there were a way to inform the parser about the tags initially used. That is why this module provides addTagSets(), a method that imports foreign declarations at run time. Suppose a translator provides an option -tagset to let a user specify which tag sets the document was initially written for. Then the following code makes them known to the parser, in addition to the declarations the translator itself already made as usual (see above):

  # load module to access the function
  use PerlPoint::Tags;

  # get options
  ...

  # import foreign tags
  PerlPoint::Tags::addTagSets(@{$options{tagset}})
    if exists $options{tagset};

(Note: this example is based on the Getopt::Long option access interface. Other interfaces might need adaptations.)
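The option handling itself could be sketched like this with Getopt::Long. The sample command line and the file name document.pp are assumptions for illustration; a translator would normally call GetOptions() on @ARGV instead of GetOptionsFromArray() on a prepared array.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# a command line as a translator user might type it (sample data)
my @args=qw(-tagset HTML -tagset XML document.pp);

# "tagset=s@" collects repeated options into an array reference
my %options;
GetOptionsFromArray(\@args, \%options, 'tagset=s@')
  or die "[Fatal] Invalid options.\n";

# $options{tagset} now refers to the list of requested tag sets,
# while @args keeps the remaining non-option arguments
print "Tag sets: @{$options{tagset}}\n" if exists $options{tagset};
```

With the option declared this way, `@{$options{tagset}}` yields one entry per -tagset occurrence.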

Tags imported via addTagSets() do not overwrite original definitions.

A "tag set", in this context, is the set of tag declarations a certain translator makes. The arguments to addTagSets() are therefore expected to be target languages corresponding to the translator's name, which makes usage easy for the user. So, pp2sdf is expected to provide a "tag set" declaration module PerlPoint::Tags::SDF, pp2html PerlPoint::Tags::HTML, pp2xml PerlPoint::Tags::XML and so on.

If all translators provide this same interface, usage should be easy. A user who wrote a document with pp2html in mind and passes it to pp2sdf, which provides significantly fewer tags, only has to add the option "-tagset HTML" to the pp2sdf call to make the document pass the PerlPoint parser.

METHODS

addTagSets()

Imports tagsets. See "Allowing translator users to import foreign tag declarations" for details.

call()

Calls a hook function of a registered tag. See "Using other tag definitions" for details.

NOTES

The form of tag declaration provided by this module is designed to make tag activation intuitive to write and read. Ideally, declarations are written by one author, but used by several others.

Each tag declaration module should provide a tag description in PerlPoint. This allows translator authors to easily integrate tag descriptions into their own documentation.

Tag declarations have nothing to do with the way backends (translators) handle recognized tags. They only enable tag detection and a few simple semantic checks by the parser. A translator still has to implement its tag handling itself.

There are no tag namespaces. Although Perl modules are used to declare the tags, tags declared by various PerlPoint::Tags::Xyz modules share one global scope. This means that different tags should be named differently. PerlPoint::Tags displays a warning if a tag is overwritten by another one.

SEE ALSO

PerlPoint::Parser

The parser module working on the basis of the declarations.

PerlPoint::Tags::xyz

Various declaration modules.

SUPPORT

A PerlPoint mailing list is set up to discuss usage, ideas, bugs, suggestions and translator development. To subscribe, please send an empty message to perlpoint-subscribe@perl.org.

If you prefer, you can contact me via perl@jochen-stenzel.de as well.

AUTHOR

Copyright (c) Jochen Stenzel (perl@jochen-stenzel.de), 1999-2001. All rights reserved.

This module is free software, you can redistribute it and/or modify it under the terms of the Artistic License distributed with Perl version 5.003 or (at your option) any later version. Please refer to the Artistic License that came with your Perl distribution for more details.

The Artistic License should have been included in your distribution of Perl. It resides in the file named "Artistic" at the top-level of the Perl source tree (where Perl was downloaded/unpacked - ask your system administrator if you don't know where this is). Alternatively, the current version of the Artistic License distributed with Perl can be viewed on-line on the World-Wide Web (WWW) from the following URL: http://www.perl.com/perl/misc/Artistic.html

DISCLAIMER

This software is distributed in the hope that it will be useful, but is provided "AS IS" WITHOUT WARRANTY OF ANY KIND, either expressed or implied, INCLUDING, without limitation, the implied warranties of MERCHANTABILITY and FITNESS FOR A PARTICULAR PURPOSE.

The ENTIRE RISK as to the quality and performance of the software IS WITH YOU (the holder of the software). Should the software prove defective, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

IN NO EVENT WILL ANY COPYRIGHT HOLDER OR ANY OTHER PARTY WHO MAY CREATE, MODIFY, OR DISTRIBUTE THE SOFTWARE BE LIABLE OR RESPONSIBLE TO YOU OR TO ANY OTHER ENTITY FOR ANY KIND OF DAMAGES (no matter how awful - not even if they arise from known or unknown flaws in the software).

Please refer to the Artistic License that came with your Perl distribution for more details.