NAME

Set::Toolkit - searchable, orderable, flexible sets of (almost) anything.

VERSION

Version 0.11

SYNOPSIS

The Set Toolkit intends to provide a broad, robust interface to sets of data. Largely inspired by Set::Object, a default set from the Set Toolkit should behave similarly enough to those created by Set::Object that interchanging between the two is fairly easy and intuitive.

In addition to the set functionality already available around the CPAN, the Set Toolkit provides the ability to perform fairly complex, chained searches against the set, ordered and unordered considerations, as well as the ability to enforce or relax a uniqueness constraint (enforced by default).

  use Set::Toolkit;

  $set = Set::Toolkit->new();
  $set->insert(
    'a',
    4, 
    {a=>'abc', b=>123},
    {a=>'abc', b=>456, c=>'foo'},
    {a=>'abc', b=>456, c=>'bar'},
    '',
    {a=>'ghi', b=>789, c=>'bar'},
    {
      x => {
        y => "hello",
        z => "world",
      },
    },
  );

  die "we didn't add enough items!"
    if ($set->size < 4);

  ### Find single elements.
  $el1 = $set->find(a => 'ghi');
  $el2 = $set->find(x => { y=>'hello' });

  ### Print "Hello, world!"
  print "Hello, ", $el2->{x}->{z}, "!\n";

  ### Search for result sets.
  ### $resultset will contain:
  ###   {a=>'abc', b=>456, c=>'foo'},
  ###   {a=>'abc', b=>456, c=>'bar'},
  $resultset = $set->search(a => 'abc')
                    ->search(b => 456);

  ### $bar will be: {a=>'abc', b=>456, c=>'bar'},
  $bar = $set->search(a => 'abc')
             ->search(b => 456)
             ->find(c => 'bar');

  ### Get the elements in the order they were inserted.  These are equivalent:
  @ordered = $set->ordered_elements;

  $set->is_ordered(1);
  @ordered = $set->elements;
  
  ### Get the elements in hash-random order.  These two are equivalent:
  @unordered = $set->unordered_elements;

  $set->is_ordered(0);
  @unordered = $set->elements;

DESCRIPTION

This module implements set objects that can contain members of (almost) any type, and provides a number of attached helpers to allow set and element manipulation at a variety of levels. By "almost", I mean that it won't let you store undef as a value, but not for a good reason: that's just how Set::Object did it, and I haven't had a chance to think about the pros and cons yet. Probably in the future it'll be a settable flag.

The set toolkit is largely inspired by the work done in Set::Object, but with some notable differences: this package ...

  • ... provides for ordered sets

  • ... is pure perl.

  • ... is slower for the above reasons (and more!)

  • ... provides mechanisms for searching set elements.

  • ... does not flatten scalars to strings.

  • ... probably some other stuff.

In general, take a look at Set::Object first to see if it will suit your needs. If not, give Set::Toolkit a spin.

By default, this package's sets are intended to be functionally identical to those created by Set::Object (or close to it). That is, without specifying differently, sets created from the Set::Toolkit will be an unordered collection of things without duplication.

EXPORT

None at this time.

FUNCTIONS

Construction

new

Creates a new set toolkit object. Right now it doesn't take parameters, because I have not codified how it should work.

Set manipulation

insert

Insert new elements into the set.

  ### Create a set object.
  $set = Set::Toolkit->new();
  
  ### Insert two scalars, an array ref, and a hash ref.
  $set->insert('a', 'b', [2,4], {some=>'object'});

Duplicate entries will be silently ignored when the set's is_unique constraint is set. (This behavior is likely to change in the future. What will probably happen later is the element will be added and masked. That will probably be a setting =)

remove

Removes elements from the set.

  ### Create a set object.
  $set = Set::Toolkit->new();
  
  ### Insert two scalars, an array ref, and a hash ref; the set size will
  ### be 4.
  $set->insert('a', 'b', [2,4], {some=>'object'});

  ### Remove the scalar 'b' from the set.  The set size will be 3.
  $set->remove('b');

Note that removing a value removes all instances of it (this only really matters in non-unique sets).

Removing references might catch you off guard: though you can insert object literals, you can't remove them. That's because each time you create a new literal, you get a new reference. Consider:

  ### Create a set object.
  $set = Set::Toolkit->new();
  
  ### Insert two literal hashrefs.
  $set->insert({a => 1}, {a => 2});

  ### Remove a literal hashref.  This will have no effect, because the two
  ### objects (inserted and removed) are *different references*.
  $set->remove({a => 1});

However, the following should work instead:

  ### Create a set object.
  $set = Set::Toolkit->new();
 
  ### Create our two hashes.
  ($hash_a, $hash_b) = ({a=>1}, {a=>2});

  ### Insert the two references.
  $set->insert($hash_a, $hash_b);

  ### Remove a hash reference.  This will work; it's the same reference as
  ### what was inserted.
  $set->remove($hash_a);

Obviously the same applies for all references.
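
The reference behavior described above can be checked in plain Perl, independent of this module. The following is a minimal sketch that uses only the core Scalar::Util module; the variable names are illustrative and nothing here is taken from Set::Toolkit's internals.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Two literals with identical contents are still two distinct references.
my $first  = { a => 1 };
my $second = { a => 1 };

# A copy of a reference points at the same underlying hash.
my $alias = $first;

my $literals_match = refaddr($first) == refaddr($second) ? 1 : 0;
my $alias_matches  = refaddr($first) == refaddr($alias)  ? 1 : 0;

print "distinct literals match: $literals_match\n";    # prints 0
print "copied reference matches: $alias_matches\n";    # prints 1
```

This is why a literal passed to remove never matches: it is a brand new reference, even when its contents look the same as something already in the set.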

Set inspection

elements

Returns a list of the elements in the set. The content of the list is sensitive to the set context, defined by is_ordered, is_unique, and possibly other settings later.

ordered_elements

Returns a list of the elements in insertion order, regardless of whether the set thinks it's ordered or unordered. This can be thought of as a temporary coercion of the set to ordered for the duration of the fetch, only.

unordered_elements

Returns a list of the elements in a random order, regardless of whether the set thinks it's ordered or unordered. This can be thought of as a temporary coercion of the set to unordered for the duration of the fetch, only.

The random order of the set relies on perl's treatment of hash keys and values. We're using a hash under the hood.

is_empty

This method will simply tell you if your set is empty. Returns 0 or 1.

first and last

The twin methods first and last do not take any arguments; they simply report the first or last element of the set. Be aware that these methods imply order! Consider:

  my $set = Set::Toolkit->new();
  $set->insert(qw(a b c d e f));
  $set->is_ordered(0);

  ### prints something like "c a d e b f"
  print join(' ', $set->elements);

  ### prints "a .. f"
  print $set->first, ' .. ', $set->last;

The first element in an unordered set would be an ephemeral, ever-changing value and, therefore, useless (I think =) So first and last are always performed with the temporary constraint that $set->is_ordered(1).

search and find

Searching allows you to find subsets of your current set that match certain criteria. Some effort has been made to make the syntax as simple as possible, though some complexity is present in order to provide some power.

Searches take one argument, a constraint, that can be specified in two primary ways:

  • As a scalar value

  • As a hash reference

Scalar searches

Specifying a constraint as a scalar value makes a very simple check against any scalar values contained in your set (and only such values). Thus, if you search for "b", you will get a subset of the parent set that contains one string "b" for each such occurrence in the superset.

Consider the following:

  ### Create a new set.
  $set = Set::Toolkit->new();

  ### Insert some values.
  $set->insert(qw(a b c d e));

  ### Do a search, and then a find.

  ### $resultset is now a set object with one entry: 'b'
  $resultset = $set->search('b');
  
  ### $resultset is now an empty set object (because we didn't insert any
  ### strings "x").
  $resultset = $set->search('x');

For scalars, it probably won't generally be useful to use search. You'll probably want to use find() instead, which simply returns the value sought, rather than a set of matches:

  ### Using the set above, $match now contains 'b'.
  my $match = $set->find('b');

However, there is a case in which you might want to use scalar searches: in sets that are not enforcing uniqueness.

  ### Turn off the uniqueness constraint.
  $set->is_unique(0);

  ### Add some more letters.
  $set->insert(qw(a c e g i j));

  ### Now do some searches:

  ### $resultset will contain <'a','a'>
  $resultset = $set->search('a');

This may be useful for counting occurrences, such as:

  print "There are ", $set->search('a')->size, " occurrences of 'a'.\n";
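
For comparison, the same count can be produced over a plain Perl list with grep. This is only a sketch of what the scalar search is described as doing (matching plain scalar values and ignoring references); it is not the module's code, and the variable names are made up for the example.

```perl
use strict;
use warnings;

# The same letters the example inserts, duplicates included.
my @letters = (qw(a b c d e), qw(a c e g i j));

# Keep only defined, non-reference values equal to the target string.
my @matches = grep { defined($_) && !ref($_) && $_ eq 'a' } @letters;
my $count   = scalar @matches;

print "There are $count occurrences of 'a'.\n";    # prints 2
```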

Property searches

On the other hand, searching by property values will probably be useful more often. Consider the following set:

  ### Create our set.
  $works = Set::Toolkit->new();

  ### Insert some complex values:
  $works->insert(
    { name  => {first=>'Franz', last=>'Kafka'},
      title  => 'Metamorphosis',
      date  => '1915'},

    { name  => {first=>'Ovid', last=>'unknown'},
      title  => 'Metamorphosis',
      date  => 'AD 8'},

    { name  => {first=>'Homer', last=>undef},
      title  => 'The Iliad',
      date  => 'unknown'},

    { name  => {first=>'Homer', last=>undef},
      title  => 'The Odyssey',
      date  => 'unknown'},

    { name  => {first=>'Ted', last=>'Chiang'},
      title  => 'Understand',
      date  => '1991'},

    { name  => {first=>'John', last=>'Calvin'},
      title  => 'Institutes of the Christian Religion',
      date  => '1541'},
  );

We can perform an arbitrarily complex subsearch of these fields, as follows:

  ### $homeric_works is now a set object containing the same hash references
  ### as the superset, "works", but only those that matched the first name
  ### "Homer" and the last name *undef*.
  my $homeric_works = $works->search({
    name => {
      first => 'Homer',
      last => undef,
    },
  });

  ### We can get a specific work, "The Odyssey," for example, by a second
  ### "search" (or "find"):

  ### $odyssey_works is now a set of one.
  my $odyssey_works = $homeric_works->search(title=>'The Odyssey');

  ### We can get the instance (instead of a set) with a "find":
  my $odyssey_work = $homeric_works->find(title=>'The Odyssey');

  ### Which we could have gotten more easily by issuing a "find" on the
  ### original set:
  $odyssey_work = $works->find(title=>'The Odyssey');

Searches can also be chained, if that's desirable for any reason, and find can be included in the chain, as long as it is the last link.

Note that this is not a speed-optimized scan at this point (but it shouldn't be brutally slow in most cases).

  ### Get a resultset of one.
  my $resultset = $works->search(name=>{first=>'Homer'})
                        ->search(title=>'The Iliad');
 

And you can search against multiple values:

  ### Search against title and date to get Ovid's "Metamorphosis" (yeah, I
  ### realize his was plural, but give me a break here =)

  ### Get the set.
  my $resultset = $works->search(
    title => 'Metamorphosis',
    date  => 'AD 8'
  );

  ### Get the item.
  my $result = $works->find(
    title => 'Metamorphosis',
    date  => 'AD 8'
  );
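
To make the matching rules above concrete, here is a minimal plain-Perl sketch of a nested constraint check. The matches helper is hypothetical and written only for this illustration; it assumes that every key in the constraint must be present and equal, and that an undef constraint matches only an undef value. The module's actual implementation may well differ.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $item satisfies every key in $constraint.
sub matches {
    my ($item, $constraint) = @_;

    # Nested constraint: the item must be a hash with matching keys.
    if (ref($constraint) eq 'HASH') {
        return 0 unless ref($item) eq 'HASH';
        for my $key (keys %$constraint) {
            return 0 unless exists $item->{$key};
            return 0 unless matches($item->{$key}, $constraint->{$key});
        }
        return 1;
    }

    # Leaf constraint: undef matches only undef; otherwise compare as strings.
    return (defined($item) ? 0 : 1) unless defined $constraint;
    return (defined($item) && !ref($item) && $item eq $constraint) ? 1 : 0;
}

my @works = (
    { name => { first => 'Franz', last => 'Kafka' },   title => 'Metamorphosis', date => '1915' },
    { name => { first => 'Ovid',  last => 'unknown' }, title => 'Metamorphosis', date => 'AD 8' },
    { name => { first => 'Homer', last => undef },     title => 'The Iliad',     date => 'unknown' },
    { name => { first => 'Homer', last => undef },     title => 'The Odyssey',   date => 'unknown' },
);

# A nested constraint, then a constraint on two top-level properties.
my @homeric = grep { matches($_, { name => { first => 'Homer', last => undef } }) } @works;
my @ovid    = grep { matches($_, { title => 'Metamorphosis', date => 'AD 8' }) } @works;

print scalar(@homeric), " Homeric works\n";    # prints "2 Homeric works"
print $ovid[0]{name}{first}, "\n";             # prints "Ovid"
```

Filtering the result of one grep with another plays the role of a chained search, and taking the first element of the result plays the role of find.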

size

Returns the size of the set. This is context sensitive:

  $set = Set::Toolkit->new();
  $set->is_unique(0);
  $set->insert(qw(d e a d b e e f));

  ### Prints:
  ###   The set size is 8!
  ###   The set size is 5!
  print 'The set size is ', $set->size, "!\n";
  $set->is_unique(1);
  print 'The set size is ', $set->size, "!\n";
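
The two numbers in that example can be reproduced with a plain list. The sketch below deduplicates by keying into a hash, which is a common Perl idiom but, according to the NOTES, not how this module enforces uniqueness; it is shown only to make the arithmetic visible.

```perl
use strict;
use warnings;

my @inserted = qw(d e a d b e e f);

# Keep the first occurrence of each value, preserving insertion order.
# (Assumption for this sketch: values are plain strings, so hash keys are safe.)
my %seen;
my @unique = grep { !$seen{$_}++ } @inserted;

print "All elements: ",    scalar(@inserted), "\n";    # prints 8
print "Unique elements: ", scalar(@unique),   "\n";    # prints 5
```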

Set introspection

is_ordered

Returns a boolean value depending on whether the set is currently considering itself as ordered or unordered. Also a setter to change the set's context.

is_unique

Returns a boolean value depending on whether the set is currently considering itself as unique or duplicable (with respect to its elements). Also a setter to change the set's context.

Contextual considerations

Boolean

Sets can be used in a boolean context (as of v0.10). Empty sets are considered false, while sets with elements are considered true. Thus, in boolean contexts, the set answers the question, "Does this set have members?"

  my $set = Set::Toolkit->new();
  
  if ($set) {
    print "The set has members!";
  } else {
    print "The set is empty!";
  }

Under the hood, this just returns

  return ($self->size) ? 1 : 0;
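
For readers unfamiliar with how an object can take part in a boolean test, here is a minimal sketch using Perl's core overload pragma. The My::Bag class is hypothetical and exists only for this example; it assumes the boolean handler is wired up through overload, which the text above suggests but does not spell out.

```perl
use strict;
use warnings;

package My::Bag;

# Report truth based on whether the bag has any members.
use overload
    'bool'   => sub { my ($self) = @_; return $self->size ? 1 : 0 },
    fallback => 1;

sub new    { my ($class) = @_; return bless { items => [] }, $class }
sub insert { my ($self, @new) = @_; push @{ $self->{items} }, @new; return $self }
sub size   { my ($self) = @_; return scalar @{ $self->{items} } }

package main;

my $bag    = My::Bag->new;
my $before = $bag ? 'has members' : 'is empty';
$bag->insert('a');
my $after  = $bag ? 'has members' : 'is empty';

print "Before: $before\n";    # prints "Before: is empty"
print "After: $after\n";      # prints "After: has members"
```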

Array

as_array

Sets can be manipulated in an array context as well. An array context enforces set order, since an array without order is just ... well, a set =) That means that for all array considerations, the set is treated as though is_ordered(1). The set's own is_ordered setting applies again whenever it is used through the set toolkit methods.

The examples below use sets with simple alphanumeric scalars. You can, of course, feel free to use objects or refs of any kind.

Let's look at some code.

Create our set

  my $set = Set::Toolkit->new();
  $set->insert(qw(a b c d e f));

scan our set as an array

  ### Prints: a, b, c, d, e, f
  print join(', ', @$set);

shift and unshift the set

  ### $first is now 'a'.  This is the same as $set->first, except that
  ### shifting is destructive.
  my $first = shift @$set;

  ### $first will now be 'x'
  unshift @$set, 'x';
  $first = $set->first;

push and pop the set

  ### $last is now 'f'.  This is the same as $set->last, except that
  ### popping is destructive.
  my $last = pop @$set;

  ### $last will now be 'z'
  push @$set, 'z';
  $last = $set->last;

get and set elements directly

  my $before = $set->[3];   ### $set->[3] is 'd'.
  $set->[3]  = 8;           ### Set it to '8'.
  my $after  = $set->[3];   ### Now it's '8'.

getting the size of the set (Note that setting the size is not yet supported. You'll get a warning if you try to do it.)

  ### These are equivalent.
  my $size   = $set->size;
  my $scalar = scalar(@$set);

splicing a set

  ### Remove the letter 'c' (position 2)
  splice(@$set, 2, 1);

  ### Replace the letter 'e' (now position 3) with 'm', 'n', 'o'
  splice(@$set, 3, 1, qw(m n o));
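
One way an object can support the @$set syntax is by overloading array dereferencing. The sketch below shows that general technique with a hypothetical My::OrderedBag class backed by a plain array; it is an assumption about the approach, not a description of how Set::Toolkit does it, and it skips details such as uniqueness checks on push.

```perl
use strict;
use warnings;

package My::OrderedBag;

# Hand out the internal array whenever the object is used as @$bag.
use overload
    '@{}'    => sub { my ($self) = @_; return $self->{items} },
    fallback => 1;

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items] }, $class;
}

package main;

my $bag = My::OrderedBag->new(qw(a b c d e f));

my $first = shift @$bag;    # destructive: removes 'a'
unshift @$bag, 'x';         # 'x' is now the first element
splice(@$bag, 2, 1);        # removes 'c' (position 2)

print join(', ', @$bag), "\n";    # prints: x, b, d, e, f
```

Because the object itself is a blessed hash, reading $self->{items} inside the handler does not trigger the array overload again.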

String (as_string)

as_string

In string context, the set is printed in a manner reminiscent of how refs are printed. For example, a hash $hash = {a=>1} may print as HASH(0x9301880). Similarly, a toolkit will print Set::Toolkit(...), where the ellipsis stands for a space-delimited list of the set's contents.

For example,

  my $set = Set::Toolkit->new();
  $set->insert(qw(a b c));

  ### Prints, for example:  "Set::Toolkit(a c b)"
  print "$set";

The above example is using an unordered set, so the print order is unordered. References will be treated by Perl's native ref stringification:

  my $set = Set::Toolkit->new();
  $set->insert('a', {b=>2}, 4);

  ### Prints something like: "Set::Toolkit(HASH(0x9301880) 4 a)"
  print "$set";
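
The stringification described here can be sketched with the core overload pragma as well. The My::PrintableBag class below is hypothetical and keeps its items in insertion order, so the output is deterministic; it only mirrors the "ClassName(space-delimited contents)" format described above.

```perl
use strict;
use warnings;

package My::PrintableBag;

# Use as_string whenever the object is interpolated or printed.
use overload
    '""'     => sub { my ($self) = @_; return $self->as_string },
    fallback => 1;

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items] }, $class;
}

# Class name followed by a space-delimited list of the contents.
sub as_string {
    my ($self) = @_;
    return ref($self) . '(' . join(' ', @{ $self->{items} }) . ')';
}

package main;

my $bag  = My::PrintableBag->new(qw(a b c));
my $text = "$bag";

print "$text\n";    # prints: My::PrintableBag(a b c)
```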

When should this module be used?

You might want to use this module if the following are generally true:

  • You aren't desperate for speed.

  • You want to be able to search (and subsearch!) your sets easily.

  • You want ordered sets.

When shouldn't this module be used?

This module probably isn't right for you if you:

  • need it fast, fast, fast!

  • don't care about searching your sets.

  • don't care about ordering your sets.

If these are true, I would take a look at Set::Object instead.

NOTES

Set::Toolkit sets contain "things" or "members" or "elements". I've avoided saying "objects" because you can really store anything in these sets, from scalars, to objects, to references.

Set::Toolkit does not currently support "weak" sets as defined by Set::Object.

Because uniqueness is not enforced by keying into a hash, scalars are not flattened into strings and will not lose their magicks.

SPECIAL DISCLAIMER

This is the first module I've released. I'm open to constructive critiques, bug reports, patches, doc patches, requests for documentation clarification, and so forth. Be gentle =)

AUTHOR

Sir Robert Burbridge, <sirrobert at gmail.com>

BUGS

Please report any bugs or feature requests to bug-set-toolkit at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Set::Toolkit. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

TODO

  • There are some gaps in the tests. I've tested for common use cases, but they could certainly be more robust.

  • More inline code comments.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Set::Toolkit

ACKNOWLEDGEMENTS

Thanks to Jean-Louis Leroy and Sam Vilain, the developers/maintainers of Set::Object, for lots of concepts, etc. I'm not actually using any borrowed code under the hood, but I plan to in the future.

COPYRIGHT & LICENSE

Copyright 2010 Sir Robert Burbridge, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.