JSON::SL::Tuba - High performance SAX-like interface for JSON
Create a very naive JSON encoder using JSON::SL::Tuba
    my $JSON = <<'EOJ';
    {
        "a" : "b",
        "c" : { "d" : "e" },
        "f" : [ "g", "h", "i", "j" ],
        "a number" : 0.4444444444,
        "a (false) boolean": false,
        "another (true) boolean" : true,
        "a null value" : null,
        "exponential" : 1.3413400E4,
        "an\tescaped key" : "a u-\u0065\u0073caped value",
        "שלום":"להראות"
    }
    EOJ

    # Split the 'stream' into multiple 8-byte chunks to demonstrate the
    # streaming feature:
    my @Chunks = unpack("(a8)*", $JSON);

    # Make a subclass and set up the methods..
    # (the TUBA_TYPE_* constants are assumed to be in scope here)
    package A::Giant::Tuba;
    use base qw(JSON::SL::Tuba);

    sub on_any {
        my ($tuba, $info, $data) = @_;

        # use constant comparisons
        if ($info->{Type} == TUBA_TYPE_JSON) {
            printf STDERR ("JSON DOCUMENT: %c\n\n", $info->{Mode});
            return;
        }

        # or use the mnemonic ones
        if ($info->{Key} && $info->{Mode} =~ m,[>\+],) {
            printf('"%s" : ', $info->{Key});
        }

        if ($info->{Type} == TUBA_TYPE_STRING) {
            printf('"%s",' . "\n", $data || "<NO DATA>");
        } elsif ($info->{Type} =~ m,[\[\{],) {
            if ($info->{Mode} eq '+') {
                print $info->{Type} . "\n";
            } else {
                print $JSON::SL::Tuba::CloseTokens{$info->{Type}} . ",\n";
            }
        } else {
            if (defined $data) {
                print $data . ",\n";
            } else {
                die("hrrm.. what have we here?")
                    unless $info->{Type} == TUBA_TYPE_NULL;
                print "null,\n";
            }
        }
    }

    # Instantiate the subclass defined above and feed it chunk by chunk
    my $o = A::Giant::Tuba->new();
    $o->parse($_) for @Chunks;
Output:
    JSON DOCUMENT: +
    {
    "a" : b,
    "c" : {
    "d" : e,
    },
    "f" : [
    g,
    h,
    i,
    j,
    ],
    "a number" : 0.4444444444,
    "a (false) boolean" : 0,
    "another (true) boolean" : 1,
    "a null value" : null
    "exponential" : 13413.4,
    "an escaped key" : a u-escaped value,
    "שלום" : להראות,
    },
    JSON DOCUMENT: -
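The chunking step in the synopsis can be tried on its own. The following is a minimal sketch that uses only core Perl and a short stand-in document (the `$json` string is made up for illustration). It shows that splitting into fixed-size pieces with `unpack` loses nothing, which is what makes chunk-by-chunk feeding possible.

```perl
use strict;
use warnings;

# A short stand-in document; any string is split the same way.
my $json = '{"a":"b","f":["g","h","i","j"]}';

# "(a8)*" repeats the group "a8" (eight bytes of arbitrary data)
# until the input is exhausted; the last chunk may be shorter.
my @chunks = unpack("(a8)*", $json);

printf "%d chunks\n", scalar @chunks;

# Concatenating the chunks yields the original string again.
print join('', @chunks) eq $json ? "round trip ok\n" : "mismatch\n";
```

Each element of `@chunks` could then be passed to `parse` in turn, as the synopsis does.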
JSON::SL::Tuba provides an event-based and high performance SAX-like interface for parsing streaming JSON.
The design of JSON::SL::Tuba emphasizes two things: reduced boilerplate (the author does not have favorable experiences with SAX APIs) and high performance.
This uses the same core JSON functionality and speed as JSON::SL.
To use JSON::SL::Tuba, simply inherit from it and define one or more methods to be called when a parse event occurs.
In normal cases (and this is the default), only a single method (see below) needs to be implemented to be able to receive events.
Of course, if your application requirements are more complex, Tuba is able to deliver events at the resolution of a single character.
This is the list of methods you may implement. All methods follow a single unified calling convention in the form of
callback($tuba, $info, $data);
where $tuba is the JSON::SL::Tuba instance, $info is a hash reference containing metadata about the item for which the event was received, and $data contains the actual 'data' (if applicable).
The $info hash contains metadata for determining relevant information about the current item.
The hash and all its contents are read-only. Their contents are not valid after the callback returns (see "CAVEATS"). This is for both performance and sanity reasons.
Its keys and values are as follows
Type
This is the type of JSON object for which an event was received.
The following table lists the type constants and their mnemonic symbols. The value of this key is itself a double-typed scalar which yields either the character or the numeric value, depending on the context.

    Constant             Mnemonic Symbol    Description

    === Scalar Types ===
    TUBA_TYPE_STRING     "                  "string" value
    TUBA_TYPE_KEY        #                  hash key
    TUBA_TYPE_BOOLEAN    ?                  JSON boolean atom ('true','false')
    TUBA_TYPE_NUMBER     =                  number
    TUBA_TYPE_NULL       ~                  JSON 'null' atom

    === Container Types ===
    TUBA_TYPE_OBJECT     {                  hash (JSON 'object')
    TUBA_TYPE_LIST       [                  array (JSON 'list')

    === Pseudo Types ===
    TUBA_TYPE_JSON       D                  the entire stream
    TUBA_TYPE_SPECIAL    ^                  non-string scalar
    TUBA_TYPE_DATA       c                  any scalar data
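To illustrate what a double-typed scalar is, here is a minimal sketch that builds one with the core `Scalar::Util::dualvar` function. The `$type` variable is a hypothetical stand-in for a value of the Type field. The sketch assumes the numeric half is the character code of the mnemonic, which the synopsis suggests by printing the Mode field with `%c`, but that is not confirmed here for every constant.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

# Hypothetical stand-in for a TUBA_TYPE_* value: the numeric half is
# assumed to be the character code, the string half is the mnemonic.
my $type = dualvar(ord('['), '[');

print "numeric match\n" if $type == ord('[');    # numeric context
print "string match\n"  if $type eq '[';         # string context
print "regex match\n"   if $type =~ m/[\[\{]/;   # regexes see the string half
```

This is why the synopsis can compare the same field with `==` against a constant in one place and match it against a character class in another.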
Mode
This is the 'mode' of the callback. The mode is also a magical mnemonic constant similar to the type.
I use the term element to mean any kind of JSON variable/object - i.e. anything listed in the above type table.
    Constant           Mnemonic Symbol    Description

    TUBA_MODE_START    +                  the start of an element
    TUBA_MODE_END      -                  completion of an element
    TUBA_MODE_ON       >                  data (contents) of an element
By default, the behavior is as follows:
Complex type events (new hash, new list) are delivered as START events. When they complete, END events are delivered.
For scalar types, the START and END callbacks are not delivered; instead, their contents are accumulated internally and delivered whole via a single ON callback.
Almost every aspect of this is configurable, and these are just (what I hope are) sane defaults.
Key
This field, if present, contains the JSON key. By default, keys are not delivered as their own events, but are instead attached to this field for the values which follow them.
Only valid if the parent object is a hash.
See the accum_kv option below for a way to make keys be delivered as their own events.
Like key, but instead of a string key, this is a numeric index. Indexes are never delivered as explicit events (since they are inherently implicit entities).
Only valid if the parent object is a list.
This is a boolean flag. Set to true if the current string needs escaping. This is never set unless string events are delivered incrementally.
Nothing much to say about $data: it is the pure 'data' associated with the callback.
For ON-style callbacks, this will contain a complete string/number/key (the default), or fragment thereof.
By default, strings are unescaped and numeric formats converted to their Perl equivalent when they cannot be easily stringified.
Complex (non-scalar) objects will never receive an ON-style callback.
START and END callbacks never have any data, either.
If you've read the above section, you will find that the names of the callbacks follow a relatively consistent pattern.
on_any
This is the default and catch-all callback for all events. The subsequent callbacks in the list do not offer any more capability than this method, but are present purely for performance and convenience (the dispatching for those methods is done in pure C, rather than several layers of Perl).
Therefore, the semantics and behavior of on_any depend on the method for which on_any is acting as a surrogate.
Determining this can be quite easy. Simply combine the Type and Mode fields to yield the equivalent function name:
    if ($info->{Type} == TUBA_TYPE_LIST and $info->{Mode} eq '+') {
        my $callback_name = "start_list";
    }
    # etc.
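The same combination can be written as a small lookup. This is only a sketch: `callback_name` and the two tables are hypothetical helpers, built from the mnemonic characters in the Type and Mode tables. It assumes the naming rule is uniform (mode prefix, underscore, type name), and it does not confirm that Tuba dispatches to every combination it can produce.

```perl
use strict;
use warnings;

# Mnemonic characters taken from the Type and Mode tables.
my %TYPE_NAME = (
    '"' => 'string', '#' => 'key',  '?' => 'boolean',
    '=' => 'number', '~' => 'null', '{' => 'object',
    '[' => 'list',   'D' => 'json', 'c' => 'data',
);
my %MODE_PREFIX = ('+' => 'start', '-' => 'end', '>' => 'on');

# Hypothetical helper: returns e.g. "start_list", or nothing when
# either mnemonic is unknown.
sub callback_name {
    my ($type, $mode) = @_;
    my $name   = $TYPE_NAME{$type}   or return;
    my $prefix = $MODE_PREFIX{$mode} or return;
    return "${prefix}_${name}";
}

print callback_name('[', '+'), "\n";
print callback_name('"', '>'), "\n";
```

Inside an on_any handler, such a lookup could serve for logging or for dispatching to your own methods in pure Perl.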
start_json, end_json
Delivered at the beginning or end of a stream.
start_object, end_object
Delivered at the beginning or end of a hash.
start_list, end_list
Delivered at the beginning or end of an array.
start_string, end_string
Delivered when a string has started or stopped. More specifically, this means when the lexer has seen an opening or closing double quote (").
on_string
This is where string-specific data gets delivered. This can be either an entire string, or a fragment thereof. In the case of the former, the string is unescaped.
on_data
This is an optional (and default) generic callback for incremental mode - fragments of numbers, booleans, strings, and keys will be delivered here, with the START and END callbacks signalling their beginning and end.
start_number, on_number, end_number
These three methods follow the same semantics as their *_string equivalents, except, of course, that there is no unescaping.
*_boolean
Same behavior as strings and numbers, except that the object (in the default accumulator mode) is converted to a JSON::SL::Boolean.
*_null
Delivered for JSON null atoms. In accumulator mode, these get converted into undef values.
By default, JSON::SL::Tuba uses internal accumulators to buffer your data. This allows high-level events to be delivered efficiently, without calling into Perl with multiple callbacks for very small units of data. It also makes things easier for you, the user, as state handling mechanisms do not need to be as complex.
In addition, Tuba has a special kv (key-value) accumulator which buffers hash keys internally and only ever delivers them as the Key field within the informational hash passed to callbacks.
Accumulator settings control whether incremental 'data' callbacks will be invoked for a specific scalar type or not.
Set accumulator parameters. Each type argument is one of the TUBA_TYPE_ constants (or a mnemonic character), and each boolean argument is whether data for that type should be accumulated.
Gets or sets the status of the key-value accumulator. Note that enabling the key-value accumulator will also enable the generic key (i.e. #) but disabling the key-value accumulator will not reverse this effect.
This enables or disables the accumulator settings for all scalar types (but not the key-value accumulator).
If only a single callback is being used, set this option to have Tuba call the on_any callback directly, instead of using it only as a fallback.
This is not enabled by default because it prevents any other methods from being called, but it should be turned on if you don't care about that.
Tell Tuba to set the SvUTF8 flag on strings.
By default, Tuba will croak if it cannot find a handler method for a given event (this effectively means the on_any method has not been implemented). This is usually what you want. To disable this behavior, set allow_unhandled to a true value.
There is one method: parse($json_chunk)
And that's all there is to it. Tuba will parse all data fed to it.
If accumulator mode is not being used, then you are guaranteed to have processed every bit of data in $json_chunk, leaving nothing buffered.
This method will croak on error (and I have not yet implemented error handling).
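Since parse croaks on error, callers who want to survive malformed input can wrap each call in `eval`. The sketch below is illustrative only: `Fake::Tuba` is a hypothetical stand-in so that the example runs without the module installed, and `feed_chunks` is a made-up helper name. Whether a real Tuba object is still usable after a failed parse is not stated in this document, so the sketch stops at the first error.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Tuba subclass; it dies on any chunk
# containing "!" to simulate a parse error.
package Fake::Tuba {
    sub new   { return bless {}, shift }
    sub parse { die "syntax error\n" if $_[1] =~ /!/; return }
}

# Feed chunks one at a time; return the error message of the first
# chunk that fails, or nothing when every chunk parsed.
sub feed_chunks {
    my ($parser, @chunks) = @_;
    for my $chunk (@chunks) {
        my $ok = eval { $parser->parse($chunk); 1 };
        return $@ || 'unknown error' unless $ok;
    }
    return;
}

my $err = feed_chunks(Fake::Tuba->new, '{"a":', '!', '1}');
print defined $err ? "parse failed: $err" : "parsed cleanly\n";
```

With a real subclass, the object returned by its constructor would take the place of `Fake::Tuba->new`.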
The tuba object is a simple hash reference. Feel free to use it and abuse it. The one exception is the _TUBA key, which contains the pointer to the internal C structure. You will probably have perl croak for trying to modify this read-only variable - but if perl doesn't croak, your program will crash - so don't modify it.
It would be nice to provide an error handler.
The info hash passed to callbacks is read only and volatile. This means the following:
Trying to access a non-existent key in the hash (i.e. any key not listed in the section describing this hash) will throw an error about accessing a disallowed key.
Trying to modify any value in the hash will throw an error.
Keeping references to values within the hash, e.g.
my $ref = \$hash->{Type};
will not work as the value will not be consistent after the callback has returned.
It is safe to take a reference to the Key field, though.
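The first two restrictions resemble what core Perl does with a locked hash, so they can be explored without Tuba. In this sketch `%info` is a hypothetical stand-in (the real hash is built in C, and its error messages may differ). The last step shows the usual way around the volatility: copy the values you need before the callback returns.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_hash);

# Hypothetical stand-in for the info hash.
my %info = (Type => '"', Mode => '>', Key => 'a');
lock_hash(%info);

# Reading a key that is not present dies on a locked hash.
my $ok = eval { my $x = $info{NoSuchKey}; 1 };
print "disallowed key: ", ($ok ? "no error" : "error"), "\n";

# So does modifying an existing value.
$ok = eval { $info{Type} = '['; 1 };
print "modification: ", ($ok ? "no error" : "error"), "\n";

# The safe pattern: copy the values out instead of keeping references.
my %copy = map { $_ => $info{$_} } keys %info;
print "copied Key: $copy{Key}\n";
```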
Considering what Tuba does and the convenience it provides, it's blazingly fast. Nevertheless, JSON::SL is still at least twice the speed.
Copyright (C) 2012 M. Nunberg
You may use and distribute this software under the same terms and conditions as Perl itself.