NAME

Sort::Key::Merger - Perl extension for merging sorted things

SYNOPSIS

  use Carp qw(croak);
  use Sort::Key::Merger qw(keymerger);

  sub line_key_value {

      # $_[0] is available as a scratchpad that persists
      # between calls for the same $_;
      unless (defined $_[0]) {
          # so we use it to cache the file handle when we
          # open a file on the first read
          open $_[0], "<", $_
              or croak "unable to open $_";
      }

      # don't get confused by this while loop, it's only
      # used to ignore empty lines
      my $fh = $_[0];
      local $_; # break $_ aliasing;
      while (<$fh>) {
          next if /^\s*$/;
          chomp;
          if (my ($key, $value) = /^(\S+)\s+(.*)$/) {
              return ($key, $value)
          }
          warn "bad line $_"
      }

      # signals the end of the data by returning an
      # empty list
      ()
  }

  # create a merger object; calling the sub as &line_key_value
  # passes the block's @_ through, so $_[0] is the scratchpad:
  my $merger = keymerger { &line_key_value } @ARGV;

  # write the values in sorted order:
  my $value;
  while (defined($value=$merger->())) {
      print "value: $value\n"
  }

DESCRIPTION

Sort::Key::Merger merges presorted collections of things based on some (calculated) key.

EXPORT

None by default.

The functions described below can be exported on request, e.g.:

  use Sort::Key::Merger qw(keymerger);

FUNCTIONS

keymerger { generate_key_value_pair } @sources;

merges the (presorted) generated values, ordered lexicographically by their keys.

Every item in @sources is aliased by $_ and then the user-defined subroutine generate_key_value_pair is called. The result from that subroutine call should be a (key, value) pair. Keys are used to determine the order in which the values are sorted and returned.

generate_key_value_pair can return an empty list to indicate that a source has become exhausted.

The result from keymerger is another subroutine that works as a generator. It can be called as:

  my $next = &$merger;

or

  my $next = $merger->();

In scalar context it returns the next value or undef if all the sources have been exhausted. In list context it returns all the values remaining from the sources merged in a sorted list.
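As a minimal sketch of the two calling contexts, the following merges two in-memory lists instead of files. The array-ref sources and their contents are hypothetical; the callback simply shifts the next item off the source aliased by $_ and uses it as both key and value.

```perl
use strict;
use warnings;
use Sort::Key::Merger qw(keymerger);

# Hypothetical presorted sources: each one is an array ref of strings.
my @sources = ([qw(apple cherry)], [qw(banana date)]);

my $merger = keymerger {
    my $src = $_;               # $_ is aliased to the current source
    return () unless @$src;     # empty list: this source is exhausted
    my $item = shift @$src;
    return ($item, $item);      # (key, value) pair
} @sources;

my $first = $merger->();        # scalar context: the next value only
my @rest  = $merger->();        # list context: every remaining value

print "first: $first\n";
print "rest:  @rest\n";
```

Mixing the two contexts like this is mostly useful for peeking at the first element before draining the rest.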

NOTE: an additional argument is passed to the generate_key_value_pair callback in $_[0]. It is meant to be used as a scratchpad: its value is associated with the current source and persists between calls from the same generator, e.g.:

  my $merger = keymerger {

      # use $_[0] to cache an open file handle:
      $_[0] or open $_[0], '<', $_
          or croak "unable to open $_";

      my $fh = $_[0];
      local $_;
      while (<$fh>) {
          chomp;
          return $_ => $_;
      }
      ();
  } ('/tmp/foo', '/tmp/bar');

This function honours the use locale pragma.

nkeymerger { generate_key_value_pair } @sources

is like keymerger but compares the keys numerically.

This function honours the use integer pragma.
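The difference from keymerger only shows when keys have different numbers of digits. In this sketch (the in-memory sources are hypothetical), 10 sorts after 2, whereas a lexicographic comparison would place "10" before "2".

```perl
use strict;
use warnings;
use Sort::Key::Merger qw(nkeymerger);

# Hypothetical sources, each already sorted numerically.
my @sources = ([2, 30], [10, 200]);

my $merger = nkeymerger {
    return () unless @$_;       # source exhausted
    my $n = shift @$_;
    return ($n, $n);            # the number is both key and value
} @sources;

my @merged = $merger->();       # list context: drain everything
print "@merged\n";
```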

filekeymerger { generate_key } @files;

returns a merger subroutine that returns lines read from @files sorted by the keys that generate_key generates.

@files can contain file names or handles for already open files.

generate_key is called with the line just read in $_ and has to return the sorting key for it. If its return value is undef the line is ignored.

The line can be modified inside generate_key by changing $_, e.g.:

  my $merger = filekeymerger {
      chomp($_); #             <-- here
      return undef if /^\s*$/;
      substr($_, -1, 10)
  } @ARGV;

Finally, $/ can be changed from its default value to read the files in chunks other than lines.
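A sketch of reading in chunks, assuming paragraph mode ($/ set to the empty string) and two hypothetical temporary files whose blank-line-separated records are presorted by their first word. $/ is localized around both the creation of the merger and the calls to it, so that the setting is in effect whenever the files are actually read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Sort::Key::Merger qw(filekeymerger);

# Create two hypothetical input files with blank-line-separated records.
my @files;
for my $text ("alpha\nline\n\ncharlie\nline\n", "bravo\nline\n\ndelta\nline\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "unable to close $name: $!";
    push @files, $name;
}

my @records;
{
    local $/ = "";              # paragraph mode: one record per paragraph
    my $merger = filekeymerger { (split)[0] } @files;
    @records = $merger->();     # each element is a whole paragraph
}

print scalar(@records), " records\n";
print "first word of each: ", join(' ', map { (split)[0] } @records), "\n";
```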

The return value from this function is a subroutine reference that returns the sorted elements one at a time on successive calls, or all of them in one go when called in list context, e.g.:

  my $merger = filekeymerger { (split)[0] } @ARGV;
  my @sorted = $merger->();

This function honours the use locale pragma.

nfilekeymerger { generate_key } @files;

is like filekeymerger but the keys are compared numerically.

This function honours the use integer pragma.
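As an illustration, the sketch below merges two hypothetical temporary files whose lines start with a number, passing already-open handles rather than file names. The lines are returned unmodified, newline included, because the key block does not chomp them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Sort::Key::Merger qw(nfilekeymerger);

# Write two hypothetical files, each sorted numerically by its first field,
# then reopen them for reading.
my @handles;
for my $lines (["2 two\n", "30 thirty\n"], ["10 ten\n", "200 two hundred\n"]) {
    my ($out, $name) = tempfile(UNLINK => 1);
    print {$out} @$lines;
    close $out or die "unable to close $name: $!";
    open my $in, '<', $name or die "unable to open $name: $!";
    push @handles, $in;
}

# The key is the first whitespace-separated field of each line.
my $merger = nfilekeymerger { (split)[0] } @handles;

my @merged = $merger->();
print @merged;
```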

SEE ALSO

Sort::Key, locale, integer, the Perl core sort function.

AUTHOR

Salvador Fandiño, <sfandino@yahoo.com>

COPYRIGHT AND LICENSE

Copyright (C) 2005 by Salvador Fandiño.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.