The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

TITLE

Synopsis 7: Iterators and Laziness

AUTHORS

    Tim Nelson <wayland@wayland.id.au>
    Daniel Ruoso <daniel@ruoso.com>

VERSION

    Created: 27 Nov 2008

    Last Modified: 2 Feb 2010
    Version: 10

Laziness and Eagerness

As we all know, one of the primary virtues of the Perl programmer is laziness. This is also one of the virtues of Perl itself. However, Perl 6 knows better than to succumb to false laziness, and so is eager sometimes, and lazy others.

One thing that Perl understands is the difference between Laziness and Eagerness. When something is Lazy, it says "just give me what you've got; I'll get the rest later", whereas when it's eager, it says "More! More! Give me everything you can get!".

Perl 6 defines 4 levels of laziness:

Strictly Lazy

Does not evaluate anything unless explicitly required by the user, including not traversing non-lazy objects. This behavior is generally available only by pragma or by explicit programming with non-lazy primitives.

Mostly Lazy

Try to obtain available items without causing eager evaluation of other lazy objects. However, the implementation is allowed to do batching for efficiency. The programmer must not rely on circular side effects when the implementation is working ahead like this, or the results will be indeterminate. (However, this does not rule out pure definitional circularity: if a list is bound to the name of an array, earlier array values may be used in the definition of subsequent values.)

Mostly Eager

Obtain all "easy" items, defined as all leading items that are (easily) provable to represent a finite number of values (even if the exact number is not known in advance). The remainder of the items are left for lazy evaluation. For constructs that are not known either to be finite or infinite, the implementation is allowed, but not required, to work ahead some number of elements in a batching mode. If such a batch proves to terminate, the list may continue to with eager evaluation of subsequent elements. As with the "mostly lazy" case, the programmer must not depend on circular side effects in any situation where the implementation is allowed to work ahead. Note that there are no language-defined limits on the amount of conjectural evaluation allowed, up to the size of available memory; however, an implementation may allow the arbitrary limitation of workahead by use of pragams.

I/O from a file may generally be assumed to be finite, but I/O from a socket may not. However, since the definition of "finite" is somewhat dependent on available memory, a file that is too big may be reasonably be treated lazily. In any case, there should be no profound semantic differences between "mostly lazy" and "mostly eager". They are primarily just hints to the implementation as to whether you are likely to want to use all the values in the list. Nothing transactional should be allowed to depend on the difference.

Strictly Eager

Obtain all items, failing immediately with an appropriate message if asked to evaluate any data structures known to be infinite. (Data structures that are effectively infinite but not provably so will likely exhaust memory instead.) This behavior is generally available only by pragma or by explicit programming primitives.

It's important to realize that the responsibility of determining the level of laziness/eagerness in each operation is external to each lazy object; the runtime, depending on which operation is being performed, is going to assume an appropriate level of laziness and perform the needed operations to apply that level.

The laziness level of some common operations

List Assignment: my @a = @something;

In order to provide p5-like behavior in list assignment, this operation is performed in the Mostly Eager level, meaning that if you do

  my @a = grep { ... }, @b;

the grep will be evaluated as long as @b is not infinite.

  my @a = grep { ... }, 1, 2, 3, 4..*

will give grep an infinite list (even if the first elements are known), therefore it will also be lazy. On the other hand

  my @a = grep { ... }, 1, 2, 3, 4;

will be eagerly evaluated.

Despite the fact that "mostly eager" treats some iterators as potentially infinite, a list assignment to multiple targets forces evaluation (by demand) of enough of the list to fill any scalar destinations before a final "slurpy" destination. Hence:

    ($a, $b, @c) = 1..*;

takes the lazy iterator returned by the right side and forces it to divulge a 1 and a 2 before setting up @c to mean 3..* lazily.

Feed operators: my @a <== @something;

The feed operator is strictly lazy, meaning that no operation should be performed before the user requests any element. That's how

  my @a <== grep { ... } <== map { ... } <== grep { ... } <== 1, 2, 3

is completely lazy, even if 1,2,3 is a fairly small known compact list.

But it's important to notice that eagerness takes precedence over laziness, meaning that

  my @a = grep { ... } <== map { ... } <== grep { ... } <== 1, 2, 3

Will be eagerly evaluated, but that is still different from

  my @d = 1,2,3;
  my @c = grep { ... }, @d;
  my @b = map { ... }, @c;
  my @a = grep { ... }, @b;

Because in the first, the processing would be made as a flow, avoiding the creation of the intermediary eager lists that the second example creates. On the other hand

  my @d <== 1,2,3;
  my @c <== grep { ... }, @d;
  my @b <== map { ... }, @c;
  my @a = grep { ... }, @b;

provides the same laziness level of the first example.

The List Role

The List role represents the lazy access to a list, walking through:

Data structure (list, tree, table, etc)
Feed (map, grep, etc)
Stream (mostly for IO)
In fact, any value that wants to behave like a list

It's important to realize that the iterator of a list is the list itself; Perl does not distinguish the two concepts. Any type that supports the List role is an iterator.

The methods in this role are:

method get {...}

Used to iterate over a flat list (the context supplied by a slurpy binding). Returns the next element from the iterator after flattening any leading parcel.

When the iterator runs out of elements, it will fail with the EMPTY exception. As with most failures, EMPTY is generally returned without throwing it, unless the calling lexical scope requests it to be thrown via "use fatal". No list should ever contain the EMPTY exception, since iterational control flow should always terminate when that value is returned.

method batch ($max?) {...}

Used to iterate over a flat list when you wish to return an arbitrary number of elements at a time. The returned set of elements is guaranteed not to contain any Parcel objects, since any such Parcel will be flattened before returning its elements.

Returns the batch of elements in some appropriate Positional type that numerifies to the exact number of items returned. (The type may also be Iterable in its turn.) Also returns EMPTY if no arguments are available; the EMPTY exception should return 0 when numerified to facilitate numeric end testing if that is desired. If $max is not supplied, the iterator may choose a suitable batch size.

This method provides list context to the iterator.

method getarg {...}

Returns the next parcel or other object from the iterator without any flattening. It is used both for binding to positional parameters in item context as well as variadically retrieving slice elements in slice context.

When it runs out of objects, it will fail with the EMPTY exception. As with most failures, it is generally returned without throwing it, unless the calling lexical scope requests it to be thrown via "use fatal". No list should ever contain the EMPTY exception, since iterational control flow should always terminate when that value is returned.

Note: get and getarg must be atomic for any iterator shared by more than one thread, since it is likely that message passing is implemented in terms of them.

method batcharg ($max?) {...}

Returns a batch of parcels/objects in some appropriate Positional type that numerifies to the exact number of items returned. (The type may also be Iterable in its turn.) Also returns EMPTY if no arguments are available; the EMPTY exception should return 0 when numerified to facilitate numeric end testing if that is desired. If $max is not supplied, the iterator may choose a suitable batch size.

method flat {...}

This returns an iterator that always flattens by calling .get internally (which discards any parcel structure, returning only the parcel's elements). The returned iterator will always return the same value regardless of whether you call .get or .getarg.

method slice {...}

This returns an iterator that always anti-flattens by calling .getarg internally, then hiding any resulting parcel by turning it into a Seq before returning it externally. A list of Parcel is thereby transformed into a list of Seq. The returned iterator will always return the same value regardless of whether you call .get or .getarg.

The List::PushBack Role

This role defines an iterator that knows how to receive values back to be consumed again as if they were never consumed. The iterator is free to refuse values that were not consumed first and in the correct order, since this role is not intended to modify the original data, only to modify the traversal of that data.

method pushback($value) {}

This pushes $value back to the array. If this iterator is proxying immutable data, it is free to refuse a value that is either out of the original order or if it wasn't consumed. In that case, it should fail with UnorderedPushBackException or BadPushBackException.

The List::Unshift Role

This role defines an iterator that can receive new elements in the top of the stack, it works like pushback, but it should be able to store objects that are completely new to this stream. Either by being backed by a mutable list or by providing a local stack. This role implies List::PushBack, where pushback will be handled by unshift.

The purpose of defining two different Roles is to give the user the "expect to fail" or "expect to succeed" semantics.

method unshift($value) {}

This unshifts $value either to the backed list or to a local stack of the iterator.

Auxiliary Implementations

Perl's built-ins require that a number of auxiliary types implement Lists. These are available for general use, and are instantiated by ending a feed at a scalar, array, or sliced array.

Generic Item Iterator

To create a generic item iterator, end a feed in a scalar:

  my $it <== @a;

You can later do:

  my $item = $it.get;
  my @items = @$it;

(XXX TODO: effect of type constraints of $it -- maybe to control flattening? )

(XXX TODO:

   What's this do?

   my $it <== @a; my $it2 <== $it; my $it3 <== $it; get $it2; get $it3;

   ...allow tag team suckling?

   Is it worth the dwim, or do we just forbid it?

)

Generic Lazy List

To obtain a generic lazy list, end a feed in a Positional.

    my @it <== 1..2000;

The generic lazy list consumes the feed on demand. As the elements of the list are accessed, values that were in each subsequently returned Capture are placed at the end of the Positional. If a Capture with more than one value is returned, each value is stored at its own index. Empty Captures have no effect, and so the information that they ever existed at all is lost.

You can later do things like:

    @c = @it[1..5];
    $s = @it[4];

...and this may run the iterator to find any values which are only lazily present. Values which have already been retrieved from the iterator are kept available unless they are removed, so if you are working with a large sequence you may want to:

    shift(@it);

Reading and writing indices in a generic lazy list iterator will force consumption and caching of values for all indices up to and including the accessed one.

    @it[4] = 2;  # @it[0..3] are cached now, @it[4] was consumed/destroyed

Adding to the beginning of the list just stores values without touching the iterators.

    @it.unshift(<a b c>);  # works as expected

Trying to find the number of elements in any way may force eager consumption of the remainder of the values in the iterator, and trying to access the end of the list certainly will.

    @it.pop; # Eagerness happens.  Maybe bad things, too.

...but you had better be very sure that the list is finite. Attempts to fiddle with the end of an infinite list may result in the interpreter raising quite the exception, in the luckiest cases it may just deliver an Inf, but in some cases you may find yourself spending a very, very long time doing laps around the yard. The exact behavior may be implementation specific.

So don't do that.

If the list is known to be finite, the generic array iterator will revert to a normal array once the feed is exhausted. So it is legal to:

    @it.push; # eagerly exhausts the feed, then turns into to a normal array

...with the same caveats about infinity.

(XXX TODO: effect of type constraints/shape of @it? )

(XXX TODO:

   my @a = (1,2); @a <<== ... promotes an occupied Positional to
   iterator, right? Should probably be explicitly mentioned.
)

Generic Lazy Slice

The generic lazy slice consumes the Parcels from an iterator but stores the results as a bi-dimensional list, where the first dimension corresponds to an iteration, and the second contains the values in the Parcel returned for that iteration, but turned into a Seq object. Empty Parcel objects are turned into empty Seq objects. Any other object is returned as itself.

To obtain a generic lazy slice, end a feed in a sliced binding.

    my **@it <== map { ... }, 1,2,3;

Note that in:

    my **@it <== (1,mysub(),2; 1..*);

the slice lists are independently lazy. However, the call to mysub() is not particularly lazy; it occurs when the parcel is constructed. Any laziness is in the function's return value, not in the call. A call in the top level of any parcel is called eagerly. To call a function lazily it is necessary to embed the call in some other explicitly lazy operator:

    1, (() ... { START mysub() }), 2  # harder way
    1, lazy { mysub() }, 2            # easier way

[Note: the above text really belongs somewhere else, but I'm too lazy to figure out where.]

Coroutines

Perl6 does not have a formally defined sub-style coroutine. Doubtless there will be external modules to do so in different flavors. Such a construct, where many calls made to the name of a sub with different parameters, expecting to reach the same sub every time, is not really compatible with multi-method-dispatch. While it may suit some programming styles which only use a subset of the Perl6 language, it cannot be made generally applicable across the Perl6 feature set.

This is not to say you cannot do coroutines in Perl6. In fact, the gather/take construct is a simple coroutine. But if you want to pass values back and forth Lua-style, you have to use a suplimentary object:

     sub my_coro (*@slurp) {
        gather do {
            my Int $initval;
            my Str $secondval;
            my Int $thirdval;
            $initval = shift(@slurp);
            $initval++;
            take $initval;
            $secondval = shift(@slurp);
            take "$initval $secondval";
            $thirdval = shift(@slurp);
            take $thirdval + $initval;
        }
    }
    my @a = (1);
    my $it;
    $it <== my_coro() <== while 1 { shift(@a) };
    say "First result: " ~ get $it;
    @a.push("Hello World");
    say "Second result: " ~ get $it;
    @a.push(500);
    say "Third result: " ~ get $it;

...if you want to pass multiple parameters on each call, you can use a slice slurpy instead, to pass a Capture.

The iterator and array can of course be bundled up to give a more natural feel:

    class my_sub2coro {
        has $!it = Failure;
        has @!data = ();
        has $.coro;
        submethod BUILD {
            $!it <== &($.coro)() <== while 1 { shift(@!data) };
        }
        method corocall($message) {
            @!data.push($message);
            get $!it;
        }
    }
    my $c = my_sub2coro.new(coro => &my_coro);
    say "First result" ~ $c.corocall(1);
    say "Second result" ~ $c.corocall("Hello World");
    say "Third result" ~ $c.corocall(500);

(XXX TODO: As feeds are not implemented yet, above example code not tested)

Additions

Please post errors and feedback to perl6-language. If you are making a general laundry list, please separate messages by topic.