NAME

RDF::Flow::Source - Source of RDF data

VERSION

version 0.178

SYNOPSIS

    $src = rdflow( "mydata.ttl", name => "RDF file as source" );
    $src = rdflow( "mydirectory", name => "directory with RDF files as source" );
    $src = rdflow( \&mysource, name => "code reference as source" );
    $src = rdflow( $model, name => "RDF::Trine::Model as source" );

    package MySource;
    use parent 'RDF::Flow::Source';

    sub retrieve_rdf {
        my ($self, $env) = @_;
        my $uri = $env->{'rdflow.uri'};

        # ... your logic here ...

        return $model;
    }

DESCRIPTION

Each RDF::Flow::Source provides a retrieve method, which returns RDF data on request. RDF data is always returned as an instance of RDF::Trine::Model or as an instance of RDF::Trine::Iterator with simple statements. The request format is specified below. A source can obtain RDF, for instance, by parsing a file or multiple files in a directory, via HTTP, from an RDF::Trine::Store, or from a custom method. All sources share a set of common configuration options.

METHODS

new ( $from {, %configuration } )

Create a new RDF source by wrapping a code reference or an RDF::Trine::Model, or by loading RDF data from a file or URL.

If you pass an existing RDF::Flow::Source object, it will not be wrapped.

A source returns RDF data as an instance of RDF::Trine::Model or RDF::Trine::Iterator when queried with a PSGI-like request. This is similar to PSGI applications, which return HTTP responses instead of RDF data. RDF::Light supports three types of sources: code references, instances of RDF::Flow, and instances of RDF::Trine::Model.

This constructor is exported as function rdflow by RDF::Flow:

  use RDF::Flow qw(rdflow);

  $src = rdflow( @args );                   # short form
  $src = RDF::Flow::Source->new( @args );  # explicit constructor

init

Called from the constructor. Can be overridden in your own source classes to add initialization code.

retrieve

Retrieve RDF data. Always returns an instance of RDF::Trine::Model or RDF::Trine::Iterator. You can use the function "empty_rdf" to check whether the returned RDF data contains any triples.

retrieve_rdf

Internal method to retrieve RDF data. Define this method when subclassing RDF::Flow::Source; it is called by the retrieve method.

trigger_retrieved ( $source, $result [, $message ] )

Creates a logging event at trace level to record that a result has been retrieved from a source. Returns the result. By default, the log message is constructed from the source's name and the result's size. This function is called automatically at the end of the retrieve method, so you do not have to call it if your source only implements retrieve_rdf.

name

Returns the name of the source.

about

Returns a string with short information (name and size) of the source.

size

Returns the number of inputs (for multi-part sources, such as RDF::Flow::Source::Union).

inputs

Returns a list of inputs (unstable).

id

Returns a unique id of the source, based on its memory address.

pipe_to

Pipes the source to another source (RDF::Flow::Pipeline). $a->pipe_to($b) is equivalent to RDF::Flow::Pipeline->new($a,$b).

timestamp

Returns an ISO 8601 timestamp and possibly sets it in the rdflow.timestamp environment variable.
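
As a rough illustration of this behaviour, the following plain-Perl sketch formats the current time as an ISO 8601 string and stores it in an environment hash. The function name iso_timestamp is hypothetical and not part of RDF::Flow, and the use of UTC with a trailing Z is an assumption; this document does not state which time zone the real method uses.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (not part of RDF::Flow): format the current time as
# an ISO 8601 string in UTC and, if an environment hash is given, store it
# under the key rdflow.timestamp.
sub iso_timestamp {
    my ($env) = @_;
    my $stamp = strftime('%Y-%m-%dT%H:%M:%SZ', gmtime(time));
    $env->{'rdflow.timestamp'} = $stamp if ref $env eq 'HASH';
    return $stamp;
}

my %env;
my $stamp = iso_timestamp(\%env);   # also available as $env{'rdflow.timestamp'}
```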

trigger_error

Triggers an error and possibly sets the rdflow.error environment variable.

graphviz

Purely experimental method for visualizing nets of sources.

graphviz_addnode

Purely experimental method for visualizing nets of sources.

CONFIGURATION

name

Name of the source. Defaults to "anonymous source".

from

Filename, URL, directory, RDF::Trine::Model or code reference to retrieve RDF from. This option is not supported by all source types.

match

Optional regular expression or code reference to match and/or map request URIs. For instance, you can rewrite URNs to HTTP URIs like this:

    match => sub { $_[0] =~ s{^urn:isbn:}{http://example.org/isbn/}; }

The URI in rdflow.uri is set back to its original value after retrieval.
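
To make the in-place rewriting more concrete, here is a minimal plain-Perl sketch of how a match rule could be applied to a copy of the request URI, so that the original value survives. The helper apply_match is hypothetical and not part of RDF::Flow. It assumes that a code reference modifies its first argument and returns true on a match, and that a regular expression only filters URIs without rewriting them.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RDF::Flow): apply a match rule to a
# request URI. Returns the possibly rewritten URI, or undef if the rule
# does not match. The caller's variable is never modified.
sub apply_match {
    my ($match, $uri) = @_;
    my $mapped = $uri;                          # work on a copy
    if (ref $match eq 'CODE') {
        # inside the callback, $_[0] is an alias for $mapped,
        # so a substitution there rewrites $mapped in place
        return $match->($mapped) ? $mapped : undef;
    }
    return $mapped =~ $match ? $mapped : undef; # a qr// rule only filters
}

my $isbn_rule = sub { $_[0] =~ s{^urn:isbn:}{http://example.org/isbn/} };

my $original = 'urn:isbn:9780596000271';
my $mapped   = apply_match($isbn_rule, $original);
# $mapped is 'http://example.org/isbn/9780596000271',
# while $original still holds the URN
```

Note the s{...}{...} delimiters in the rule: with the default / delimiter, every slash in the replacement URI would have to be escaped.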

REQUEST FORMAT

A valid request can be either a URI (as a byte string) or a hash reference called an environment. The environment must be a specific subset of a PSGI environment with the following variables:

rdflow.uri

A request URI as a byte string. If this variable is provided, no other variables are needed, and the following variables will not modify its value.

psgi.url_scheme

Either the string http (assumed if not set) or https.

HTTP_HOST

The base URL of the host for constructing a URI. This or SERVER_NAME is required unless rdflow.uri is set.

SERVER_NAME

Name of the host for constructing a URI. Only used if HTTP_HOST is not set.

SERVER_PORT

Port of the host for constructing a URI. Defaults to 80, which is not kept as part of an HTTP URI due to URI normalization.

SCRIPT_NAME

Path for constructing a URI. Must start with / if given.

QUERY_STRING

Portion of the request URI that follows the ?, if any.

rdflow.ignorepath

If this variable is set, no query part is used when constructing a URI.

The method reuses code from Plack::Request by Tatsuhiko Miyagawa. Note that the environment variable REQUEST_URI is not included. When this method constructs a request URI from a given environment hash, it always sets the variable rdflow.uri, so the variable is guaranteed to be set afterwards. However, it may be the empty string if the environment contains neither HTTP_HOST nor SERVER_NAME.
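
The rules above can be sketched in plain Perl as follows. This is a simplified stand-in, not the module's actual code: the function name build_request_uri is hypothetical, there is no percent-escaping, and normalization is limited to dropping the default port. It also assumes that PATH_INFO is appended after SCRIPT_NAME, as the variable list for rdflow_uri suggests.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for the URI construction described
# above (not the module's real implementation).
sub build_request_uri {
    my ($env) = @_;
    return $env unless ref $env;            # a plain URI string is used as is
    return $env->{'rdflow.uri'} if defined $env->{'rdflow.uri'};

    my $scheme = $env->{'psgi.url_scheme'} || 'http';
    my $host   = $env->{HTTP_HOST};
    if (!defined $host || $host eq '') {
        # fall back to SERVER_NAME plus SERVER_PORT (80 if not set)
        $host = $env->{SERVER_NAME};
        my $port    = $env->{SERVER_PORT} || 80;
        my $default = $scheme eq 'https' ? 443 : 80;
        $host .= ":$port" if defined $host && $host ne '' && $port != $default;
    }

    # without any host information the request URI is the empty string
    return $env->{'rdflow.uri'} = '' unless defined $host && $host ne '';

    my $path = ($env->{SCRIPT_NAME} || '') . ($env->{PATH_INFO} || '');
    $path = '/' if $path eq '';

    my $uri   = "$scheme://$host$path";
    my $query = $env->{QUERY_STRING};
    $uri .= "?$query"
        if !$env->{'rdflow.ignorepath'} && defined $query && $query ne '';

    return $env->{'rdflow.uri'} = $uri;     # always stored in the environment
}

my $env = {
    SERVER_NAME  => 'example.org',
    SERVER_PORT  => 80,
    SCRIPT_NAME  => '/data',
    QUERY_STRING => 'format=ttl',
};
my $uri = build_request_uri($env);
# $uri and $env->{'rdflow.uri'} are both 'http://example.org/data?format=ttl'
```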

FUNCTIONS

The following functions are defined to be used in custom source types.

rdflow_uri ( $env | $uri )

Prepares and returns a request URI, as given by an environment hash or by an existing URI. Sets rdflow.uri if an environment has been given. URI construction is based on code from Plack, as described in "REQUEST FORMAT". The following environment variables are used: psgi.url_scheme, HTTP_HOST or SERVER_NAME, SERVER_PORT, SCRIPT_NAME, PATH_INFO, QUERY_STRING, and rdflow.ignorepath.

sourcelist_args ( @_ )

Parses a list of inputs (code or other references) mixed with key-value pairs and returns the two separately as an array and a hash.
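
The separation can be pictured with the following plain-Perl sketch. It is a hypothetical re-implementation for illustration only: the name split_inputs_and_options is made up, returning an array reference and a hash reference is an assumption, and the real function may additionally wrap each input as a source.

```perl
use strict;
use warnings;

# Hypothetical illustration (not part of RDF::Flow): references are
# collected as inputs, and every plain string is read as a key that is
# followed by its value.
sub split_inputs_and_options {
    my @list = @_;
    my (@inputs, %options);
    while (@list) {
        my $item = shift @list;
        if (ref $item) {
            push @inputs, $item;              # code or other reference
        } elsif (defined $item) {
            $options{$item} = shift @list;    # key followed by its value
        }
    }
    return (\@inputs, \%options);
}

my $source_a = sub { };                       # placeholder code references
my $source_b = sub { };
my ($inputs, $options) =
    split_inputs_and_options($source_a, name => 'union of two', $source_b);
# $inputs holds both code references, $options is { name => 'union of two' }
```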

iterator_to_model ( [ $iterator ] [, $model ] )

Adds all statements from an RDF::Trine::Iterator to a (possibly new) RDF::Trine::Model and returns the model.

empty_rdf ( $rdf )

Returns true if the argument is an empty RDF::Trine::Model, an empty RDF::Trine::Iterator, or no RDF data at all.

AUTHOR

Jakob Voß <voss@gbv.de>

COPYRIGHT AND LICENSE

This software is copyright (c) 2011 by Jakob Voß.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.