The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Log::Log4perl::AutoCategorize - extended Log::Log4perl logging

ABSTRACT

Log::Log4perl::AutoCategorize extends Log::Log4perls (l4p) easy mode, adding 2 main features;

  1. extended, automatic, transparent categorization capabilities
    a. logging category <- package.subroutine.loglevel.linenumber
  2. runtime information useful for:
    a. test-coverage assessment
    b. managing your logging config.
  3. Minor convenience enhancements
    a. config-load in use statement
    b. Data::Dumper of ref args

There are several more mature alternatives which you should check out for comparison;

  #1. search for Stealth Loggers in base POD
  use Log::Log4perl qw(:easy);

  #2. new functionality, developed at approx same time as AutoCategorize
  use Log::Log4perl::Filter;

SYNOPSIS

  # use pre-defined logging - to stdout
  use Log::Log4perl::AutoCategorize;

  # OR separate your log-config (standard practice)
  use Log::Log4perl::AutoCategorize ( initfile => $filename );

  # OR define your config inline (for development)
  use Log::Log4perl::AutoCategorize
    (
     alias => 'Logger', # shorthand class-name alias
     initstr => q{
         # see Log4perl docs to understand these directives
         log4perl.rootLogger=DEBUG, A1
         # log4perl.appender.A1=Log::Dispatch::Screen
         log4perl.appender.A1 = Log::Dispatch::File
         log4perl.appender.A1.filename = ./mylog
         log4perl.appender.A1.mode = write
         log4perl.appender.A1.layout = PatternLayout
         log4perl.appender.A1.layout.ConversionPattern=%d %c %m%n
         # create TEST-COVERAGE log
         log4perl.appender.COVERAGE = Log::Dispatch::File
         log4perl.appender.COVERAGE.mode = write
         log4perl.appender.COVERAGE.layout = org.apache.log4j.PatternLayout
         log4perl.appender.COVERAGE.layout.ConversionPattern = (%d{HH:mm:ss.SSS}) %c: %m%n
         # save timestamped versions, 1 per process
         log4perl.appender.COVERAGE.filename = sub {"./test-coverage.txt.". scalar localtime}

         # now, add the value: send the stuff written at END to it
         log4perl.logger.Log.Log4perl.Autocategorize.END = INFO, COVERAGE
         },
     );

  foreach (1..500) {
    Logger->warn($_);
    foo();
    A->bar();
    A::bar();
  }

  sub foo {
    foreach (1..20) {
        Logger->warn($_);
    }
  }

  package A;

  sub bar {
    my @d;
    foreach (reverse 1..10) {
        push @d, $_;
        Logger->warn("t-minus:", $_,\@d);
    }
  }

DESCRIPTION

Before diving in, a few notes:

This is not intended to document Log4perl (aka: l4p,) which already has excellent documentation.

In this document, I use Logger as the official shorthand for Log::Log4perl::AutoCategorize, as it gets tedious and verbose to repeat the full name. There is module support for you to do this too; note that its used in the example above, ie: 'alias => Logger'. You can alias it as you prefer: myLogger, ourLogger, or you can subclass it.

I try to use call for the act of calling a method at runtime, and invocation to refer to the source-code (ie: package,sub,line-number) that is called. I hope that the distinction will help to bend the brick into a more hat-like shape.

This module requires perl 5.8, as the optimizer is unsupported in 5.6. You can use this module IFF you uncomment the __END__ before the optimization stuff is done (at bottom of file), but runtime costs are considerable (40% overhead), so Im reluctant to add this as an option. That said, l4p install scripts adjust the code to the perl version (ex, they revert from 'our $var' syntax for 5.5.3, since its a 5.6 feature). I would accept a patch to do similar..

AutoCategorization

The primary feature of this module is to extend Log4perls flexibility by automatically and transparently creating logging categories for each invocation in your application code.

use Log::Log4perl ':easy' offers a comparable, but less capable feature; it automatically infers that $logCat = "$caller_package", and uses that category to properly handle the message at runtime.

  package Foo;
  use Log::Log4perl(':easy');

  package Foo::Bar;
  use Log::Log4perl(':easy');

This module extends the $logCat with info thats obtained at runtime, leveraging L4Ps filtering mechanisms by giving more detail upon which to filter.

    $logcat = "$package.$subname.$loglevel.$linenum";

L4Ps bubble-up behavior is still in effect; if your log-config never specifies anything beyond $package, the $logCat of each call bubbles up thru the log-config, until it matches with your package-level config item.

In other words: since $logCat contains more information than that provided by :easy, your logging configuration can exersize more control over what gets logged.

Note that you could have always used whatever categories you wanted, see "Configuration files" in Log::Log4perl, you just didnt get the help that made it easy to do so.

Test Coverage Results of your Application

When your code uses this module, the logging calls are categorized and counted; this is reported (to standard out, if you don't override the default-config) when the program terminates. The info is returned in 3 chunks.

Seen: How many times each invocation was called

The %Seen hash stores and counts all invocations of all functions, keyed by the category. This is variously called a usage report, a test-coverage report, a seen-report, etc.

 (15:24:24.772) Logger.END.info.106: $Seen Log Events: = [
  {
    '#log4perl.category.A.bar.debug.55' => '-400',
    '#log4perl.category.A.bar.warn.54' => 400,
    '#log4perl.category.Log.Log4perl.AutoCategorize.END.info.201' => 1,
    '#log4perl.category.main.foo.warn.44' => 200,
    '#log4perl.category.main.info.32' => 10,
    '#log4perl.category.main.warn.31' => 10
  }
 ];

Note that A.bar.debug.55 above has a negative count. This indicates that the logging activity was suppressed by the configuration, but the code was still reached 400 times.

Also, the keys are prefixed with '#log4perl.category.', which simplifies cut-paste of this output into your log-config.

UnSeen: What invocations were never called

This report identifies the Logging invocations which were never called during this run of your program. This can be used systematically to assess how thoroughly you are testing your application.

 (15:24:24.772) Logger.END.info.106: $un-Seen Log Events: = [
  {
    'debug_00011' => 'main,probe.pl,52',
    'info_00010' => 'main,probe.pl,51',
  }
 ];

Category to Munged-Name

This hash identifies the map between the categories and the unique function-name. It is provided primarily for completeness, it may prove useful for debugging and/or other things.

 (19:44:06.210) Log.Log4perl.AutoCategorize.END.info.308: cat2data:, {
  'A.truck.debug.63' => 'debug_00005',
  'A.truck.debug.65' => 'debug_00006',
  'A.truck.debug.66' => 'debug_1_00007',
  'A.truck.warn.62' => 'warn_00004',
  ...
  }

Assessing Test Coverage

One way to use the test coverage facility is by running your test suite and collecting the individual coverage reports. Since the config-items are sorted in the report, its easy to compare 2 of them; you can tell what parts of the system are tested by each part of the test suite.

Log4perl can write log-files with configurable names (ex: a timestamped name, see the SYNOPSIS example above). You can use this facility to write separate files for each sub-test, and then merge them (ex: `cat *.coverage-log|sort -u`), and quickly assess the total coverage provided by the entire test suite.

Keeping your Logging Config current

Log4perl documentation speaks of the hazards/limitations of categories; this is more critical here, because categories are highly leveraged. See "Pitfalls with Categories" in Log::Log4perl for more info.

This package is indisputably more vulnerable to such problems; it exposes line numbers, which change regularly during development. If you use numbers, be aware that adding comments to your code will change the category reported, rendering those log-config items ineffective for controlling output.

Moreover, you should recognize that as the numbers become more stable, theyre also less valuable; by then, youve reassesed and refined your choices wrt log-levels. (you have, yes?)

However, you can also be more disiplined (TMTOWTDI), using config-lines with a mix of specificity: package, package.method, and package.method.logginglevel. Here the coverage report helps, since it lists all executed logger-invocations, it serves as a pretty good roadmap to your application, letting you see the entire forest through the trees.

Also, the coverage report is formatted to be easily edited into a informative and useful logging configuration, ie:

    1. comments/documents the 'existence' of a log-config entry
    2. doesnt create the hierarchy in the logger-config, efficient
    3. identifies all the code instrumentation available
    4. serves as cut-paste fodder for your log-configs

Bottom line: I recommend that you start your log-config file by copying and editing the coverage report. Just remove the quotes put in by Data::Dumper, change '=>' to '=', remove '#' to activate lines, and change the count to DEBUG, WARN, etc.

What %Seen cannot tell you

Because the %Seen coverage only includes the categories it sees at (a single) runtime, it cant 'remember' the aggregate coverage from previous runs.

The sort technique given above to assess aggregate test-coverage is not as useful here; it sorts commented lines away from their uncommented counterparts, but its a start.

I may add an option to support a separate 'remembering' aggregate via a tied variable thats initialized in a BEGIN block. This is a distant feature; I will spend some time with Devel::Cover to gain insight.

Filtering with Log::Log4perl::AutoCategorize

Restating in different words (at risk of redundancy), filtering with this package uses the base behavior of bubbling up a logging event through the config, and using generic config-items which are not overridden by more specific items.

Using part or all of the hierarchy of categories in your config-file, you can enable or disable logging on a class, a method in that class, particular logging levels with that method, or line by line.

    log4perl.category.Frob = DEBUG                      # 1
    log4perl.category.Frob.nicate.debug = INFO          # 2
    log4perl.category.Frob.nicate.debug.118 = DEBUG     # 3
    #log4perl.category.Frob.Vernier.twiddle.debug = DEBUG       # 4

(1) enables debug logging accross the Frob class, but then (2) overrides that class-wide setting within the method nicate(), suppressing debug logging (by setting the threshold to INFO). (3) further overrides that setting by enabling the debug statement on line 118.

Tweaking your log config

If you look carefully at the example above, (2) may be confusing; it has 'debug = INFO', which looks like nonsense. The 'debug' is the log-level that the programmer wrote, it reflects his judgement as to the nature of the message.

The '= INFO' part tells you that the config user decided to suppress debug messages from nicate(), which were enabled class-wide by (1). (3) restores normal behavior of debug logging, but only for 1 invocation.

Note also that (4), even if it was uncommented, would have no effect on the logging, because its just saying that all debug() calls done by Frob::Vernier::twiddle() are issued at their default level. Keeping such lines commented is good for documentation, and more efficient; it saves Log::Log4perl both cpu and memory by not loading config-items that arent meaningful.

Other Features

Automatic categorization, logging coverage, and fine-grained control of logging are the primary value-added features of this wrapper package. Other secondary features are;

Configuration Loading

If use Log::Log4perl::AutoCategorize is given an import list which includes initstr => q{} or initfile => $filename, the string or file is used to initialize Log::Log4perl. You can still use the old way, ie $base->init().

Classname Aliasing

Because it would be inconvenient to type the full classname for every statement, ex: Log::Log4perl::AutoCategorize->debug(), the package has the ability to alias itself into a classname of your choice, ex:

    use Log::Log4perl::AutoCategorize ( alias => myLogger );
    myLogger->debug("ok doing it");

Because aliasing is used in optimization phase, it must be imported via use, it cannot be deferred as in require, import.

Easy Initialization

You can initialize the base logger by any of the following;

  # as use parameters, least typing
  a. use Log::Log4perl::Autocategorize ( initstr  => $string );
  b. use Log::Log4perl::Autocategorize ( initfile => $filename );

  # assuming youve already used the package
  c. Log::Log4perl::AutoCategorize->init( $filename );
  d. Log::Log4perl::AutoCategorize->init( \$string );

  # you can cheat, and use the base class name
  e. Log::Log4perl->init( $filename );
  f. Log::Log4perl->init( \$string );

  # assuming youve aliased
  g. Logger->init( $filename );
  h. Logger->init( \$string );

  i. # no explicit initialization at all (theres a default config)

The default config writes to stdout, and includes a test-coverage setup. This config is used at import() time, unless a or b is provided by you. If you explicitly call init(), the base is re-initialized, and relys on base behavior to work correctly.

I hope at some point to provide init_modify(), which will overlay new configuration on existing, rather than a full reinitialization. This will be implemented using Log4perls (currently nonexistent) configuration include mechanism.

Automatic Dump of structured data

If you pass in a data-ref, its rendered using Data::Dumper, making it easy to examine the details of the runtime context. Since the Dump is embedded inside the logging-method, its only called when needed, avoiding work to produce output that will just be thrown away.

SubClassing

As sayeth "Planning for the Future: Better Constructors" in perltoot, you make you package suitable for deriving new ones. test 04_subclass.t verifies this, though it wasnt the 1st thing I got working.

Module Architecture

The previous section hopefully provides sufficient motivation for why you should use this module, this section describes how the module works.

2 phase strategy to achieve both greater filtering capability, and similar or better efficiency.

Original AUTOLOAD functionality

In v0.001, AUTOLOAD did the entire job itself, primarily cuz it seemed better to write the code once, rather than 5 times (for debug, info, warn, error, fatal). Every time a call like Logger->warn() was reached, AUTOLOAD would get control, and do the following:

    0. determine log-level which was called ($AUTOLOAD)
    1. used caller() to construct a category string, $logcat, dynamically.
    2. fetched the $logger singleton {$logger = get_logger($logcat)}
    3. tested if $logcat was a loggable event {$logger->is_$loglevel()}
    4. logged the message, or not, as appropriate.

Doing this repeatedly using caller() for every call is computationally expensive for a logging subsystem. If the module is to be usable application wide, it must be efficient.

Gaining efficiency

The canonical way to get speed from AUTOLOAD is to not call it; or more helpfully, to create the desired subroutine so that it gets called for all future calls to that method. But in this case, the decision reached in step 4 is different for every invocation, so it cannot be reduced to a single subroutine unless that routine calls caller(), which defeats the purpose.

The only way to get custom behavior for each invocation is to make them each invoke a unique method name; ie to munge the method names where they are invoked. Once this is done, AUTOLOAD can then vivify subroutines which 'remember' results of caller(), and the log/nolog decision reachedd for that invocation. Thereafter AUTOLOAD wont be called again (for that unique method), avoiding steps 0-4 completely.

Using optimizer.pm

I use Simon Cozen''s optimizer.pm to convert the source code:

    FROM: {
        Logger->debug ("msg A");
        Logger->debug ("msg B");
    }
    TO: {
        Logger->debug_001 ("msg A");
        Logger->debug_002 ("msg B");
        # etc...
    }

The function-munging produces a unique function name for each point in your code where you invoke a logging method. This allows AUTOLOAD (which handles all these calls) to distinguish each invocation, make a custom handler routine for it. Once the custom handler is added into the symbol table, it is called later without any overhead.

How optimizer munges

Your code, 'use optimizer => sub yourfunc;' operates at compile time, letting you search the opcodes for specific op-chains. Once youve got the right opchain, you can do the opcode surgery.

For example, we must recognize this opcode chain;

`perl -MO=Concise,-exec -e 'Logger->info_00010(2,"args")`; t76 <0> pushmark s t77 <$> const(PV "Logger") sM/BARE t78 <$> const(IV 2) sM t79 <$> const(PV "args") sM t7a <$> method_named(PVIV "info_00010") t7b <1> entersub[t23] vKS/TARG

Our criteria is (pushmark, const with PV="Logger", ..., method_named with PVIV=~$namePatt). We also recognize the opcodes that build arguments on the stack, so we can find the balanced method_named opcode. Finally, we insure that the method name being invoked is one of the allowed ones.

Caveat: you must use a specific coding style

To match the above criteria, your code must use *ONLY the 1st* of these 3 different coding styles;

  IE: Logger->debug("foo");
  NOT: $logger->debug("foo");
  NOR: Logger::debug("foo");

This form provides the package name as a bareword (t77), which allows a rather strict (safe?) test; we dont want to stomp in others\' namespaces (except for the explicit takeover of the $Alias package)

The 2nd form is not safely detectable at compile time, since $logger type cannot be absolutely known, and we try not to do risky surgery. For example

    # never inadvertently do this -
    $logger->warn if ref $logger eq 'FlareGun';

The 3rd form is doable, but clutters the code which does $Alias recognition, for little syntactic gain. Currently, you wont even get a warning you if you use this construct.

lightweight customized subroutines

Given that all the customized anonymous subroutines are instantiated as either sub{} or sub{ logit("$cat") }; they all invoke a common logit() routine that does the real work.

The asubs are trivial wrappers, with the category value stringified and held in code, and passed into logit() when called. This doesnt shrink the number of symbol table entries, but it minimizes the size of the CVs hanging there.

To save more space, a common do_nothing() function could be used instead of the current invocation counting do-nothing, at the cost of losing test-coverage info.

AUTOLOAD revisited

Once method munging was in place, $AUTOLOAD would contain, for example; not "debug", but "debug_00011". Now that each invocation is unique, the distinctions can be computed once and remembered; ie they can be reduced to a dynamically created sub, and added to the symbol table.

IOW, these changes were made to AUTOLOAD:

    4. computes the category using caller()
    5. determines whether the base logger would send the message.
    6. instantiates an anonymous subroutine reflecting 5.
    7. adds the sub to the symbol table, with the munged method name.

At runtime, invocations like Logger->warn_$UIID() are called, and dispatched to AUTOLOAD. Once the munged function is in the symbol table, AUTOLOAD is never called again for this invocation.

Benchmarking

Ive done one Benchmark.pm based test (tshort.pl) which doesnt seem quite right. The simpler approach of using 2 similar test-scripts shows more convincing results.

    $jimc@harpo logger]$ time t1.pl -n > /dev/null

    real        1m30.860s
    user        1m28.246s
    sys 0m2.002s

    $jimc@harpo logger]$ time t2.pl

    ... debug output snipped ...

    real        0m52.229s
    user        0m49.363s
    sys 0m1.625s

This last benchmark implements above test in pure Log::Log4perl code. As you can see, this package is competetive with Log::Log4perl itself, at least when analogous features are used (that use is a bit forced, you wouldnt NEED to do so all the time).

Nonetheless, its conceivable to get better than native performance if internal short-circuits can be taken.

Please run perftests/timeit.sh and send me email, particularly if you get wildly different results.

CAVEAT (restated)

This package expects you to use Logger->warn() style of coding, $logger->warn() will not work, unless 'my Dog $spot' syntax can make real guarantees. This is because the type of $logger cannot be known at compile time, and its too risky to munge everything that looks like $flaregun->warn().

Even if it were possible, it would probably be bad to mix lexical code customization with objects and their attributes.

BUGS

runaway process in Debugger

When I run a program using this module in debugger, and I hit control-P (Im an emacs user, so I do it habitually), the process runs amok, and I have to terminate it from another window. If youre a vi user, this is less of a problem, but it warrants keeping this module in ALPHA state. Its also the reason for the many debugging flags.

abusable category extension

The category extension can be abused; you can independently suppress warn() while elevating debug(), leading to surprises in what output goes where. But this can be used sanely, so I dont consider worth disabling. I could be convinced to optionally report it; as ever, a patch is a compelling argument.

TODO

quieter tests

For ease of immediate use, this module defines a default logging config that writes to stdout. This however makes the most basic tests also write to stdout, which is disconcerting for some would-be users.

better test coverage support

The %Seen report can never contain the categories of unexecuted functions, since the catalog entry is created by AUTOLOAD, which is never run if the function isnt executed. Ive thought of a few "solutions", but theyre all insane ..

    1. remember functions during munging, call them once in CHECK{}.
       Wont work; would get useless caller() info.

    2. Tie %Total_Invocations to a DBM file, update in the END.  This
       is seductive; but it will have trouble forgetting invocations
       that have been removed from the code, or in the case of
       line-numbered log-specs, been moved by a mere comment. Note
       though that reset control could be added to API.

Its probably occurred to you that Devel::Cover does all these things. It is however unsuitable for use in production code, and l4p's runtime reconfiguarability is quite valuable there.

better interoperation with l4p easy_init()

Test t/09_coexist_easy.t demonstrates use of both l4p :easy, and l4pa, but it writes 2 separate logfiles. Ive not tried recently to write to a single logfile.

devise good way to limit the deep logging of ref-data

AutoCategorize uses Data::Dumper to render the structure of complex data arguments. It would be nice to selectively suppress or modify the Dumper output, and/or print just the strings.

But to do this via the log-config, the config-item syntax must be extended; either by adding new attributes, or by allowing new values. Either approach compromises interoperability with log4java config files, so these possible config-schemes should therefore be considered highly speculative (mostly here to elicit better ideas);

    # using extra param
    log4perl.category.TestUtils.testcase_applies = DEBUG, 0

The 2nd parameter would control whether Data::Dumper was invoked, in this case they would not be. This scheme has advantage of having a wide API; hashrefs etc can (in principle - and subject to reality check by Gurus) be given as parameters.

Other schemes are also possible;

    # an additional '.detail' attribute
    log4perl.category.A.foo.debug.detail = 0

    # new .*.detail attribute, with (needed) wild-card
    log4perl.category.A.foo.debug.*.detail = 0
    # limit the depth of dumper (one way, others possible)
    log4perl.category.A.foo.debug.*.detail = -20
    # alternative values
    log4perl.category.A.foo.debug.*.detail = YAML
    log4perl.category.A.foo.debug.*.detail = sub { YAML->dump }

    # alternate (currently unused) directives
    log4perl.autocat.A.foo.debug.*.detail = sub { YAML->dump }

allow use of alternates to Data::Dumper

YAML comes to mind, as does Data::Dumper::Sortkeys = sub {...}; I took a quick swipe at using YAML::Dump (-dy debug option to perftests/tdebug.pl), but its not yielding expected results.

Its also possible to get various object and context sensitive serializations, esp if combined with extra params above.

devise extended method-name space

This package is hardcoded to only munge invocations to one of (debug, info, notice, warn, error), non matching are left unaltered, because some calls should delegate thru to Log::Log4perl, and we should be conservative about what gets munged.

But the point is this: A whole panoply of method suffixes are possible; (ex: /(info|debug)_.*/), a careful use of this method namespace could be used to specify many subtle (or not so subtle) variations in behavior (ex: debug_dump, warn_nodump, error_iftrue, info_iffalse).

short-circuit internal checks

Its likely that several checks and/or dereferences internal to Log::Log4perl can be eliminated once its determined (at 1st invocation) that a log event should actually do something. This is however risky; too much deep coupling.

init and watch

Log4perl has ability to watch the logfile for changes, and change its operation accordingly. This package must hook into that notification, and use it to undefine the appropriate symbol-table entry, thus allowing the AUTOLOAD mechanism to re-vivify it according to the new configuration. Obviously it would be nice to do this only to the parts of the config that are different.

This feature will use the %cat2data mapping built at 1st invocation to allow config changes to be mapped back to the munged method-name that resolved to that category in the 1st place.

context sensitive logging

NDC and MDC are tantalizing features, they appear to hold some promise for conditional logging; ie once an application error is detected, a full dump of the NDC or MDC is logged. Alternatively, once an all-ok is determined, a stack of NDC logs can be discarded, and a summary message issued.

Remembering the AutoCat-alog

See also "BUGS/Seen.2". While it borders on the baroque, its possible to tie %TotalInvocations to a DBM or Storable file, whose name is controlled by your log-config. Then you could use Log4perl''s logitem = sub{} to synchronize the filename to your VERSION; either $package::VERSION, a complete CVS Revision, or to your application Version. Interestingly, this could give some insight into longer-term code evolution. See how Devel::Cover does it.

Improve the documentation

In an effort to provide adequate motivation and explanation, Ive gotten long-winded, repeating ideas several times. Im now too close to this to judge their clarity adequately, I welcome feedback and suggestions wrt what is tedious or unclear.

To this end, Im including 2 versions of a lightning (5 min) talk I gave on this module at YAPC::NA in Boca. The longer version I tested on several willing victims (thx guys), and the short version I actually gave based upon their valuable comments. If these are helpful, please let me know how they could be folded into this document to improve clarity and/or remove redundancy.

Provide patches to optimizer, B-Generate (DONE)

To use this package on perl 5.8.1+ you need to patch both optimizer, B-Generate for updated internals. I have patches on CPAN.

More TODO

    find and kill bugs..
    write more test cases..
    optimize more
    integrate more fully with Log::Log4perl
    revisit perftests
    help/feedback is welcome.

CREDITS

Mike Schilli and Kevin Goess for Log::Log4perl, and for reviewing my docs and code, and pointing out features I had not fully understood.

Simon Cozens, Arthur Bergman for optimizer, B::Generate. The extra features this module provides would not be practical w/o the deep magic in those packages; Im just repeating incantations.

AUTHOR

    Jim Cromie <jcromie@cpan.org>

COPYRIGHT AND LICENSE

Copyright 2003 Jim Cromie

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

(I always wanted to write that ;-)