Text::TeX -- Perl module for parsing of TeX.
    use Text::TeX;

    # Called for every output token: $eaten is the token, $txt the parser.
    sub report {
      my($eaten,$txt) = (shift,shift);
      print "Comment: `", $eaten->[1], "'\n" if defined $eaten->[1];
      print "@{$txt->{waitfors}} ", ref $eaten, ": `", $eaten->[0], "'";
      if (defined $eaten->[3]) {
        my @arr = @{ $eaten->[3] };
        foreach (@arr) {
          print " ", $_->print;
        }
      }
      print "\n";
    }

    my $file = new Text::TeX::OpenFile 'test.tex',
      'defaultact' => \&report;
    $file->process;
A new TeX parser is created by
$file = new Text::TeX::OpenFile $filename, attr1 => $val1, ...;
$filename may be undef; in this case the text to parse may be specified in the string attribute.
Recognized attributes are:
string
contains the text to parse before parsing $filename.
defaultact
denotes a procedure to submit output tokens to.
tokens
gives a hash of descriptors for input tokens. A sane default is provided.
A call to the method process launches the parser.
While the parser is running, it splits the input stream into input tokens using heuristics similar to the actual rules of the TeX tokenizer. Because it does not use the exact rules, the resulting tokens may be wrong if advanced TeX commands are used, for example when the character classes are changed.
This should not be of any concern if the stream in question is a "user" file, but is important for "packages".
The processed input tokens are handed to the digester, which handles them according to the provided tokens attribute.
This is a hash reference which describes how the input tokens should be handled. A key to this hash is a literal like ^ or \fraction. A value should be another hash reference, with the following keys recognized:
class
Into which class to bless the token. Several predefined classes are provided. The default is Text::TeX::Token.
Type
What kind of special processing to do with the input after the class methods are called. Recognized Types are:
When a token of this Type is encountered, it is converted into Text::TeX::BegArgsToken. Then the arguments are processed as usual, and an output token of type Text::TeX::ArgToken is inserted between them. Finally, after all the arguments are processed, an output token Text::TeX::EndArgsToken is inserted.
The first element of these simulated output tokens is an array reference whose first element is the initial output token which generated this sequence. The second element of the internal array is the number of arguments required by the input token. The Text::TeX::ArgToken token has a third element, which is the ordinal of the argument which ends immediately before this token.
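The layout described above can be sketched with hand-built tokens. This is only an illustration under stated assumptions: the tokens are blessed by hand rather than produced by the parser, \frac with two arguments is an assumed example, and describe is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hand-built stand-ins for parser output (assumption: a real run would
# receive such tokens from the digester). $origin plays the role of the
# token that generated the sequence; its text is in slot 0.
my $origin = bless [ '\frac' ], 'Text::TeX::Token';

my @stream = (
    bless([ [ $origin, 2 ] ],    'Text::TeX::BegArgsToken'),
    bless([ [ $origin, 2, 1 ] ], 'Text::TeX::ArgToken'),   # after argument 1
    bless([ [ $origin, 2 ] ],    'Text::TeX::EndArgsToken'),
);

# Hypothetical helper: unpack the inner array found in slot 0.
sub describe {
    my ($token) = @_;
    my ($first, $count, $ordinal) = @{ $token->[0] };
    my $kind = ref $token;
    $kind =~ s/^Text::TeX:://;
    my $text = "$kind for $first->[0], $count argument(s)";
    $text .= ", argument $ordinal just ended" if defined $ordinal;
    return $text;
}

print describe($_), "\n" for @stream;
```

A handler given as defaultact could apply the same unpacking to the token it receives as its first argument.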
If requested, a token Text::TeX::LookAhead may be returned instead of Text::TeX::EndArgsToken. The additional elements of $token->[0] are: the reference to the corresponding lookahead attribute, the relevant key (the text of the following token) and the corresponding value.
In such a case the input token which was looked ahead would generate an output token of type Text::TeX::BegArgsTokenLookedAhead (if it usually generates Text::TeX::BegArgsToken).
Means that this macro introduces a local change, which should be undone at the end of the enclosing block. At the end of the block an output event Text::TeX::EndLocal is delivered, with $token->[0] being the output token for the start of the local event.
Useful for font switching.
Some additional keys may be recognized by the code for the particular class.
count
number of arguments to the macro.
waitfor
gives the matching token for a starting delimiter token.
eatargs
number of tokens to swallow literally and put into the relevant slot of the output token. The surrounding braces are stripped.
selfmatch
is used with eatargs==1. Denotes that the matching token also has eatargs==1, and that the swallowed tokens should coincide (as with \begin{blah} ... \end{blah}).
lookahead
is a hash whose keys are the texts of tokens which need to be treated specially after the end of the arguments for the current token. If such a text does follow the token, a token Text::TeX::LookAhead is returned instead of Text::TeX::EndArgsToken.
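As a rough illustration of the descriptor keys listed above, the following builds a small hash of the kind the tokens attribute expects. The entries are assumptions written for this example (they are not copied from the module's built-in defaults), and %my_tokens is a hypothetical name.

```perl
use strict;
use warnings;

# Illustrative descriptors only: they use the documented keys, but the
# particular entries are assumptions, not the module's default table.
my %my_tokens = (
    '{'      => { waitfor => '}' },                 # starting delimiter
    '\begin' => { waitfor => '\end', eatargs => 1, selfmatch => 1 },
    '\end'   => { eatargs => 1, selfmatch => 1 },   # name must coincide
    '\frac'  => { class => 'Text::TeX::Token', count => 2 },
);

# Print each descriptor in a stable order.
for my $text (sort keys %my_tokens) {
    my $d = $my_tokens{$text};
    print "$text: ", join(', ', map { "$_=$d->{$_}" } sort keys %$d), "\n";
}
```

Such a hash would be passed by reference, as in tokens => \%my_tokens, when creating the parser.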
The hash %Text::TeX::xfont contains the translation table from TeX tokens into the corresponding font elements. The values are array references of the form [fontname, char]. Currently the only font supported is symbol.
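A lookup in a table of this shape might look like the sketch below. It uses a stand-in hash rather than the real %Text::TeX::xfont; the \alpha key, its character and the helper font_and_char are assumptions made for illustration.

```perl
use strict;
use warnings;

# Stand-in with the documented [fontname, char] value shape. The key and
# character are assumptions, not taken from the real %Text::TeX::xfont.
my %xfont_like = ( '\alpha' => [ 'symbol', 'a' ] );

# Hypothetical helper: return (fontname, char), or an empty list when
# the table has no entry for the given token text.
sub font_and_char {
    my ($table, $tex) = @_;
    my $entry = $table->{$tex} or return;
    return @$entry;
}

my ($font, $char) = font_and_char(\%xfont_like, '\alpha');
print "font=$font char=$char\n";
```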
Ilya Zakharevich, ilya@math.ohio-state.edu
perl(1).