JSON::SL - Fast, Streaming, and Searchable JSON decoder.
    use JSON::SL;
    use Data::Dumper;

    my $txt = <<'EOT';
    {
        "some" : {
            "partial" : 42.42
        },
        "other" : {
            "partial" : "a string"
        },
        "complex" : {
            "partial": {
                "a key" : "a value"
            }
        },
        "more" : {
            "more" : "stuff"
    EOT

    my $json = JSON::SL->new();
    my $jpath = "/^/partial";
    $json->set_jsonpointer( [$jpath] );
    my @results = $json->feed($txt);

    foreach my $result (@results) {
        printf("== Got result (path %s) ==\n", $result->{Path});
        printf("Query was %s\n", $result->{JSONPointer});
        my $value = $result->{Value};
        if (!ref $value) {
            printf("Got scalar value %s\n", $value);
        } else {
            printf("Got reference:\n");
            print Dumper($value);
        }
        print "\n";
    }
Produces:
    == Got result (path /some/partial) ==
    Query was /^/partial
    Got scalar value 42.42

    == Got result (path /other/partial) ==
    Query was /^/partial
    Got scalar value a string

    == Got result (path /complex/partial) ==
    Query was /^/partial
    Got reference:
    $VAR1 = {
              'a key' => 'a value'
            };
JSON::SL was designed from the ground up to be easily accessible and searchable for partially received streaming content.
It uses an embedded C library (jsonsl) to do the streaming and most of the dirty work.
JSON::SL allows you to use the JSONPointer URI/path syntax to tell it about certain objects and elements which are of interest to you. JSON::SL will then incrementally parse the input stream, returning those selected objects to you as soon as they arrive.
In addition, the objects are returned with extra context information, which is itself another JSONPointer path specifying the path from the root of the JSON stream until the current object.
Since I hate SAX's callback interface, and since almost all the boilerplate for a SAX interface needs to be written for just about every use case, I have moved the core work of state stacking and the like into the C library itself. This means minimal boilerplate on your part and very fast performance.
Creates a new JSON::SL object
If $max_levels is provided, then it is taken as the maximum recursion depth the parser will be able to descend. This can only be set during construction time as it affects the amount of memory allocated for the internal structures.
The amount of memory allocated for each structure is around 64 bytes on 64-bit (i.e. sizeof (char*) == 8) systems and around 48 bytes on 32 bit (i.e. sizeof (char*) == 4) systems.
The default is 512 levels, or a total of about 32KB allocated on 64-bit systems.
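The memory figures above are simple multiplication, which the following sketch spells out. The helper estimated_stack_bytes is hypothetical and not part of JSON::SL, and the per-level sizes are only the approximate values quoted above; actual sizes depend on the build.

```perl
use strict;
use warnings;

# Approximate per-level sizes quoted in the text, keyed by pointer width in bits.
my %bytes_per_level = (64 => 64, 32 => 48);

# Hypothetical helper (not a JSON::SL function): estimate the bytes
# reserved for the parser's internal stack at a given depth limit.
sub estimated_stack_bytes {
    my ($max_levels, $pointer_bits) = @_;
    return $max_levels * $bytes_per_level{$pointer_bits};
}

for my $bits (64, 32) {
    printf "%d levels on %d-bit: about %d KB\n",
        512, $bits, estimated_stack_bytes(512, $bits) / 1024;
}
```

A smaller limit passed to the constructor would shrink this allocation proportionally, at the cost of rejecting more deeply nested input.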
Set the JSONPointer query paths for this object. Note this can only be done once per the object's lifetime, and only before you have started calling the "feed" method.
The JSONPointer notation is quite simple, and follows URI scheme conventions. Each / represents a level of descent into an object, and each path component represents a hash key or array index (whether something is indeed a key or an index is derived from the context of the JSON stream itself, in case you were wondering).
http://tools.ietf.org/html/draft-pbryan-zyp-json-pointer-02 Contains the draft for the JSONPointer specification.
As an extension to the specification, JSON::SL allows you to use the ^ (caret) character as a wildcard. Placing the lone ^ in any path component means to match any value in the current level, effectively providing glob-style semantics.
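To make the wildcard semantics concrete, here is a small pure-Perl sketch of per-component matching. The function pointer_matches is hypothetical and only illustrates the behaviour described above; it is not how JSON::SL implements matching internally (that work happens in the C library).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of JSON::SL: returns true when the
# concrete path $path is selected by $query. A component that is
# exactly "^" matches any key or index at that one level.
sub pointer_matches {
    my ($query, $path) = @_;
    my @q = split m{/}, $query, -1;
    my @p = split m{/}, $path,  -1;
    return 0 unless @q == @p;          # must be at the same depth
    for my $i (0 .. $#q) {
        next if $q[$i] eq '^';         # lone caret: wildcard for this level
        return 0 unless $q[$i] eq $p[$i];
    }
    return 1;
}

for my $path ('/some/partial', '/other/partial', '/some/other') {
    printf "%-16s %s\n", $path,
        pointer_matches('/^/partial', $path) ? 'match' : 'no match';
}
```

Because only a lone ^ is treated specially in this sketch, a component such as foo^ is compared literally and does not act as a prefix wildcard.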
This is the meat and potatoes of JSON::SL. Call it with $input being a JSON input stream, with likely partial data.
The module will do its magic and decode elements for you according to the queries set in "set_jsonpointer".
If called in scalar context, returns one matching item from the partial stream. If called in list context, returns all remaining matching items. If called in void context, the JSON is still decoded, but nothing is returned.
The return value is one hash reference or a list of them (depending on the context), each with the following keys:
Value - This is the actual value selected by the query. This can be a string, number, hash reference, array reference, undef, or a JSON::SL::Boolean object.
Path - This is a JSONPointer path, which can be used to get context information (and perhaps be able to locate 'neighbors' in the object graph using "root").
JSONPointer - The original matching query path used to select this object. Can be used to associate this object with some extra user-defined context.
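One way to use the JSONPointer field for user-defined context is a lookup table keyed by the query string. The sketch below uses hand-written stand-in results shaped like the hashes described above (taken from the synopsis output) so that it runs without any input stream; %label_for and its contents are hypothetical.

```perl
use strict;
use warnings;

# User-defined context, keyed by the query string (hypothetical labels).
my %label_for = (
    '/^/partial' => 'partial value',
);

# Stand-in results shaped like the hashes returned by feed();
# in real use these would come from feed() or fetch().
my @results = (
    { Value => 42.42,      Path => '/some/partial',  JSONPointer => '/^/partial' },
    { Value => 'a string', Path => '/other/partial', JSONPointer => '/^/partial' },
);

my @report;
for my $res (@results) {
    # Look up the context registered for the query that matched.
    my $label = $label_for{ $res->{JSONPointer} } // 'unknown';
    push @report, sprintf('%s at %s: %s', $label, $res->{Path}, $res->{Value});
}
print "$_\n" for @report;
```

The same table could hold code references instead of labels, giving a simple per-query dispatch.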
N.B. incr_parse is an alias to this method, for familiarity.
Returns remaining decoded JSON objects. Returns the same kinds of things that "feed" does (with the same semantics dependent on scalar and list context), except that it does not accept any arguments. This is helpful for a usage pattern as such:
    $sl->feed($large_json);
    while (my $res = $sl->fetch) {
        # do something with the result object..
    }
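The same pattern extends to input that arrives in pieces. The following is a minimal sketch, assuming the chunks come from something like a socket read; the @chunks array and its split points are made up for illustration, and the query is the one from the synopsis.

```perl
use strict;
use warnings;
use JSON::SL;

my $sl = JSON::SL->new();
$sl->set_jsonpointer(["/^/partial"]);

# Hypothetical chunks standing in for successive reads from a stream.
my @chunks = (
    '{ "some" : { "partial" : 42.42 }, "other" : ',
    '{ "partial" : "a string" }, "more" : { "more" : "stuff"',
);

my @paths;
for my $chunk (@chunks) {
    $sl->feed($chunk);                # void context: decode only
    while (my $res = $sl->fetch) {    # scalar context: one result per call
        push @paths, $res->{Path};
    }
}
print "$_\n" for @paths;
```

Draining the queue after every feed keeps memory use low, since results are handed over as soon as they are complete rather than accumulating until the end of the stream.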
Resets the state. Any cached objects, result queues, and such are deleted and freed. Note that the JSONPointer query will still remain (and is static for the duration of the JSON::SL instance).
One of JSON::SL's features is the ability to get a Perl representation of incomplete JSON data. As soon as a JSON element can be converted to some kind of shell which resembles a Perl object, it is inserted into the object graph, or object tree.
This returns the partial object graph formed from the JSON stream. In other words, this is the object tree.
Items which have been selected to be filtered via "set_jsonpointer" are not present in this object graph, and neither are incomplete strings.
It is an error to modify anything in the object returned by root, and Perl will croak with an 'attempted modification of read-only value' error if you try (but see "make_referrent_writeable" for a way to override this).
Nevertheless it is useful to get a glimpse of the 'rest' of the JSON document not returned via the feed method.
Returns true if the object pointed to by $ref has the SvREADONLY flag off. In other words, if the flag is off then it is safe to modify its contents.
Convenience methods to make the perl variable referred to by $ref read-only or writeable.
make_referrent_writeable marks the object pointed to by $ref as writeable, and make_referrent_readonly marks the object pointed to by $ref as read-only.
You may 'poll' to see when an object has become writeable by doing the following
1) Locate your initial object in the object graph using my $v = $sl->root()
2) Check its initial status by using $sl->referrent_is_writeable($v)
3) Stash the reference somewhere, and repeat step 2 as necessary.
Using make_referrent_writeable, you may modify the object graph as needed. Modification of the object graph is not always safe, and performing disallowed modifications can make your application crash (which is why incomplete objects are marked as read-only in the first place).
In the event where you need to make modifications to the object graph, following these guidelines will prevent an application crash:
These are always safe to modify (and will never be read-only) because they are only inserted into the object graph once they have completed.
Deleting hash keys which point to placeholders (represented as undef) will change the hash key for the real value, once that value is completed.
Removing an array element or hash value which is 1) a container (hash or array), and 2) was read-only will crash your application. Perl will destroy the container when it goes out of scope from your function. However, JSON::SL will continue to reference it inside its internal structures, so do not do this.
Adding a hash value/key to the hash is permitted, but the value may become clobbered when and if an actual key-value pair is detected from the JSON input stream.
Prepending (i.e. unshifting) to an array is permitted. Appending (i.e. pushing) to an array is only safe if you are sure that none of the elements of the array are potential JSONPointer query matches. JSONPointer matches for array indices will internally pop the current (i.e. last) element of the array and return it from "feed".
Get or set the current status of the SvUTF8 flag as it is applied to the strings returned by JSON::SL. If set to true, then input and output will be assumed to be encoded in UTF-8.
Get/Set whether the JSONPointer field is populated in the hash returned by "feed". Turning this on (i.e. leaving out the JSONPointer field) may gain some performance.
Get/Set whether path information (the Path field) is populated in the hash returned by "feed". Turning this on (i.e. leaving out path information) may boost performance, but will also leave you in the dark as to where/what your object is.
This functions exactly like JSON::XS's method of the same name. To quote:
Set the maximum length a JSON text may have (in bytes) where decoding is being attempted. The default is 0, meaning no limit. When decode is called on a string that is longer then this many bytes, it will not attempt to decode the string but throw an exception. ... If no argument is given, the limit check will be deactivated (same as when 0 is specified). See SECURITY CONSIDERATIONS in JSON::XS, for more info on why this is useful.
As an alternative to using JSONPointer, you can use an 'object drip'. With this setting enabled, all hashes and arrays will be returned via feed or fetch in reverse order (i.e. the deepest objects are returned first, followed by their encapsulating objects).
This allows you to inspect complete descendent objects as they arrive.
The objects returned by fetch and feed will still follow the same semantics, with context/path information stored inside the Path key. The JSONPointer field is obviously not passed since it is not being used.
Example:
    use JSON::SL;
    use Test::More;

    my $sl = JSON::SL->new();
    $sl->object_drip(1);

    # create an incomplete JSON object:
    my $json = <<'EOJ';
    [
        [
            {
                "key1":"foo",
                "key2":"bar",
                "key3":"baz"
            }
    EOJ

    my @res = $sl->feed($json);
    my $expected = [
        {
            Value => "foo",
            Path  => '/0/0/key1',
        },
        {
            Value => "bar",
            Path  => '/0/0/key2',
        },
        {
            Value => "baz",
            Path  => '/0/0/key3'
        },
        {
            Value => {},
            Path  => '/0/0'
        },
    ];
    is_deeply(\@res, $expected, "Got expected results for object drip...");

    # Test::More needs a plan; declare it at the end.
    done_testing();
Outer encapsulating objects will have their children removed (as they have already been returned in previous results).
Only complete objects (i.e. objects which can no longer contain any more data) will be returned.
These functions are not object methods but rather exported functions. You may export them on demand or use their fully-qualified names.
Decodes a JSON string and returns a Perl object. This really doesn't serve much use, and JSON::XS is faster than this. Nevertheless it eliminates the need to use two modules if all you want to do is decode JSON.
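A minimal usage sketch, assuming the function is imported on demand as described above; the JSON text is made up for illustration.

```perl
use strict;
use warnings;
use JSON::SL qw(decode_json);

# Decode a complete JSON text in one call.
my $data = decode_json('{"name":"JSON::SL","levels":[1,2,3]}');

printf "%s has %d levels\n", $data->{name}, scalar @{ $data->{levels} };
```

The fully-qualified form JSON::SL::decode_json(...) should behave the same way without the import.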
Unescapes a JSON string, translating \uXXXX and other compliant escapes to their actual character/byte representation. Returns the converted string, undef if the input was empty. Dies on invalid input.
    my $str = "\\u0041";
    my $unescaped = unescape_json_string($str);
    # => "A"
Both "decode_json" and "feed" output already-unescaped strings, so there is no need to call this function on strings returned by those methods.
This will most likely not work with threads, although one would wonder why you would want to use this module across threads.
When inspecting the object tree, you may see some undef values, and it is impossible to determine whether those values are JSON nulls, or placeholder values. It would be possible to implement a class e.g. JSON::SL::Placeholder, but doing so would either be unsafe or incur additional overhead.
The ^ caret is somewhat obscure as a wildcard character
Currently wildcard matching is all-or-nothing, meaning that constructs such as foo^ will not work.
All input to JSON::SL should be either UTF-8 or ASCII (a subset of UTF-8).
More specifically, the input stream must be any superset of ASCII which uses octet streams (so this includes Latin1).
Perl itself only natively deals with 8-bit ASCII, Latin1, or UTF8 - so if your input stream is something else (for example, UTF-16) it will need to be converted to UTF8 at some point before it is passed to JSON::SL.
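One possible way to do such a conversion is with the core Encode module. This is a sketch assuming the whole text is available as UTF-16LE octets; the sample JSON text is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: a JSON text that arrived as UTF-16LE octets.
my $utf16 = encode('UTF-16LE', qq({"name":"caf\x{e9}"}));

# Decode to Perl characters, then re-encode as UTF-8 octets.
my $utf8 = encode('UTF-8', decode('UTF-16LE', $utf16));

printf "%d UTF-16 bytes became %d UTF-8 bytes\n", length $utf16, length $utf8;

# $utf8 is now suitable input for JSON::SL, e.g. $sl->feed($utf8).
```

When converting a stream chunk by chunk rather than a whole text, take care not to split a chunk in the middle of a UTF-16 code unit or surrogate pair.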
JSON::SL aims to be the fastest JSON decoder for Perl. Currently it is only in second place - being 25% slower than JSON::XS for decode_json and about 8% slower for incremental parsing.
Additionally, if your input has lots of escapes (not very common in real-world JSON), JSON::SL will be even slower.
Nevertheless I believe that the benefits provided by JSON::SL save not only human time, but also machine time - what good is quickly decoding a large JSON stream if there are no proper facilities to inspect it?
Work is in progress for a SAX-style interface. See JSON::SL::Tuba.
JSON::XS - Still faster than this module, and is also the source of many of JSON::SL's ideas and tests.
If you wish to aid in the development of the JSON parser, do not modify the source files in the perl distribution, they are merely copied over from here:
jsonsl - C core for JSON::SL
JSON - JSON's main page
JSON Specification
JSONPointer Specification
JSON::SL::Tuba - Same core with an event-oriented interface, like SAX
Copyright (C) 2012 M. Nunberg
This module contains extracts from JSON::XS; both modules are licensed under the same terms as Perl itself.