The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Statistics::Sequences::Pot - Helmut Schmidt's test of force-like runs of a discrete state within a numerical or categorical sequence

SYNOPSIS

 use strict;
 use Statistics::Sequences::Pot 0.12; # methods/args here are not compatible with earlier versions
 my $pot = Statistics::Sequences::Pot->new();
 $pot->load([qw/2 0 8 5 3 5 2 3 1 1 9 4 4 1 5 5 6 5 8 7 5 3 8 5 6/]); # strings/numbers; or send as "data => $aref" with each stat call
 my $val = $pot->observed(state => 5); # other methods include: expected(), variance(), obsdev() and stdev()
 $val = $pot->zscore(state => 5, tails => 2, ccorr => 1); # # or want an array & get back both z- and p-value
 $val = $pot->p_value(state => 5, tails => 1); # assuming data are loaded; alias: test()
 my $href = $pot->stats_hash(values => {observed => 1, p_value => 1}, state => 5); # include any other stat-method as needed
 $pot->dump(values => {observed => 1, expected => 1, p_value => 1}, state => 5, flag => 1, precision_s => 3, precision_p => 7);
 # prints: observed = 4.310, expected = 4.529, p_value = 0.8090600

DESCRIPTION

The Pot statistic measures the bunching relative to the spacing of a single state within a series of other states, conceived by Helmut Schmidt as a targeted "potential" energy (or Pot) that dissipates exponentially between states. It's not limited to considering only clusters of consecutive states (or bunches), as is the case with the more familiar Runs test of sequences.

Say you're interested in the occurrence of the state 3 within an array of digits: note how, in the following arrays, there are increasing breaks between the 3s (separated by 0, 1 and then 2 other states):

 4, 7, 3, 3
 3, 4, 3, 7
 3, 8, 1, 3

The occurrence of 3 is, with the Pot-test, of exponentially declining interest across these sequences, given the increasing breaks by other states between the occurrences of 3. The statistic does not ignore these ever remoter occurrences of the state of interest; it accounts for increased spacing between them as if there were an exponentially declining force, a potential towards 3, within the data-stream (up to a theoretical or empirical asymptote that may be specified).

Running the Pot-test involves testing its significance as a standard "z" score; Schmidt (2000) provided data demonstrating Pot's conformance with the normal distribution. This will naturally be improved by repeated sampling, and by using block averages.

METHODS

Methods are essentially as described in Statistics::Sequences

new

 $pot = Statistics::Sequences::Pot->new();

Returns a new Pot object. Expects/accepts no arguments but the classname.

load

 $pot->load(@data); # anonymously
 $pot->load(\@data);
 $pot->load('sample1' => \@data); # labelled whatever

Loads data anonymously or by name - see load in the Statistics::Data manpage for details on the various ways data can be loaded and then retrieved (more than shown here).

Data can be categorical or numerical, and multi-valued - i.e, unlike in tests of runs and joins, they do not have to be dichotomous.

add, read, unload

See Statistics::Data for these additional operations on data that have been loaded.

observed, pot_observed, pvo

 $v = $pot->observed(state => 'x'); # use the first data loaded anonymously; specify a 'state' within it to test its pot
 $v = $pot->observed(index => 1, state => 'x'); # ... or give the required "index" for the loaded data
 $v = $pot->observed(label => 'mysequence', state => 'x'); # ... or its "label" value
 $v = $pot->observed(data => [qw/x z x x p c/], state => 'x'); # ... or just give the data now

Returns observed value of pot, a measure of the number and size of bunchings of the state that occurred within the array. The data to calculate this on can already have been loaded, or you send it here as a flat referenced array keyed as data. The observed value of pot is based on Schmidt (2000), Equations 6-7, and his Appended program, viz.:

            I,J=1..N
             Σ  r|n(I) - n(J)|
            I<J

where

            r = eN/MS

is the number of observations, and S (for scale) is a constant determining the range r of the potential energy between pairs of I and J states. These values are set as name => value pairs, as follows.

state => string

The state within the data whose bunching is to be tested. This is the only required argument; croak if no state is specified. Returns 0 if this state does not exist in the data.

scale => numeric >= 1

Optionally, the scale of the range parameter, which should be greater than or equal to 1. Default = 1; values less than 1 are effected as 1.

In most situations, should all states be equiprobable, or their probability be proportionate to their number, r would reflect the average distance, or delay, between successive states, equal to the number of all observations divided by the number of states. For example, if there were 10 possible states, and 100 observations have been made, then the probability of re-occurrence of any one of the 10 states within any slot will be equal to 100/10, with S = 1, i.e., expecting that any one of the states would mostly occur by a spacing of 10, and then by an exponentially declining tendency toward consecutive occurrence. In this way, with S = 1, Pot is a measure of "short-range bunching," as Schmidt called it. Bunching over a larger range than this minimally expected range can be measured with S > 1. This is specified, optionally, as the argument named scale to test. Hypothesis-testing might be made with respect to various values of the scale parameter.

expected, pot_expected, pve

 $v = $pot->expected(state => 'x'); # or specify loaded data by "index" or "label", or give it as "data" - see observed()
 $v = $pot->expected(data => [qw/x z x x p c/], state => 'x'); # use these data

Returns the theoretically expected value of Pot, given N states among M observations, and r range of clustering within these observations. It is calculated as follows from Schmidt (2000) Eq. 8, given the above definitions.

              Pot = ((N(N – 1))/(M(M – 1))) . (r/1 – r) . (M – (1/(1 – r)))

variance, pot_variance, pvv

 $v = $pot->variance(state => 'x'); # or specify loaded data by "index" or "label", or give it as "data" - see observed()
 $v = $pot->variance(data => [qw/x z x x p c/], state => 'x'); # use these data

Returns the variance in the theoretically expected value of pot, given by Schmidt (2000) Equation 9a:

              Variance = (r²/ (1 – r²) . (N / M) . (1 – (N / M))² . N

obsdev, observed_deviation

 $v = $pot->obsdev(state => 'x'); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
 $v = $pot->obsdev(data => [qw/x z x x p c/], state => 'x'); # use these data

Returns the deviation of (difference between) observed and expected pot for the loaded/given sequence (O - E).

stdev, standard_deviation

 $v = $pot->stdev(state => 'x'); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
 $v = $pot->stdev(data => [qw/x z x x p c/], state => 'x');

Returns square-root of the variance.

z_value, zscore, pot_zscore, pzs

 $v = $pot->z_value(ccorr => 1, state => 'x'); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
 $v = $pot->z_value(data => $aref, ccorr => 1, state => 'x');
 ($zvalue, $pvalue) = $pot->z_value(data => $aref, ccorr => 1, tails => 2, state => 'x'); # same but wanting an array, get the p-value too

Returns the zscore from a test of pot deviation, taking the pot expected away from that observed and dividing by the root expected pot variance, by default with a continuity correction to expectation. Called wanting an array, returns the z-value with its p-value for the tails (1 or 2) given.

The data to test can already have been loaded, or sent directly as an aref keyed as data.

Other options are precision_s (for the z_value) and precision_p (for the p_value).

p_value, test, pot_test, ptt

 $p = $pot->p_value(state => 'x'); # using loaded data and default args
 $p = $pot->p_value(ccorr => 0|1, tails => 1|2, state => 'x'); # normal-approximation based on loaded data
 $p = $pot->p_value(data => $aref, ccorr => 1, tails => 2, state => 'x'); #  using given data (by-passing load and read)

Test the currently loaded data for significance of the vale of pot. Returns the zscore from test of pot deviation, or, called wanting an array, the z-value with its p-value for the tails (1 or 2) given.

stats_hash

 $href = $pot->stats_hash(values => {observed => 1, expected => 1, variance => 1, z_value => 1, p_value => 1}, prob => .5, ccorr => 1);

Returns a hashref for the counts and stats as specified by hashref in its "values" argument, and with any options for calculating them. See "stats_hash" in Statistics::Sequences for details.

dump

 $pot->dump(values => { observed => 1, variance => 1, p_value => 1}, exact => 1, flag => 1,  precision_s => 3); # among other options

Print Pot-test results to STDOUT. See dump in the Statistics::Sequences manpage for details.

EXAMPLE

Using Pot as a test of bunching of a particular state within a collection of quasi-random events.

 use strict;
 use Statistics::Sequences::Pot;
 my ($i, @data) = ();
 # Init random integers ranging from 0 to 15:
 for ($i = 0; $i < 960; $i++) { $data[$i] = int(rand(16)); }
 # Assess degree of bunching within these data with respect to a randomly selected target state:
 my $state = int(rand(16));
 my $pot = Statistics::Sequences::Pot->new();
 $pot->load(\@data);
 my %args = (state => $state, values => {p_value => 1, observed => 1, expected => 1, stdev => 1});
 my $statsref = $pot->stats_hash(\%args);
 # Access the results of this analysis:
 print "The probability of obtaining as much bunching of $state as observed is $statsref->{'p_value'}.\n";
 print "The observed value of Pot was $statsref->{'observed'}, with expected value $statsref->{'expected'} ($statsref->{'stdev'}).\n";
 # or print the lot, and more, in English:
 $pot->dump(%args, precision_s => 3, precision_p => 7,);

REFERENCES

Schmidt, H. (2000). A proposed measure for psi-induced bunching of randomly spaced events. Journal of Parapsychology, 64, 301-316.

SEE ALSO

http://www.fourmilab.ch/rpkp/ for Schmidt's many papers on the physical conceptualisation and properties of psi.

Statistics::Descriptive : The present module adds data to "Full" objects of this package in order to access descriptives re bunches and spaces.

Statistics::Frequency : the proportional_frequency() method in this module could be informative when working with data of the kind used here.

BUGS/LIMITATIONS

No computational bugs as yet identfied. Hopefully this will change, given time.

Limitations of the code, perhaps, concern the non-unique storage of data arrays (compared to, say, Statistics::DependantTTest, but see Statistics::TTest). This would require a unique name for each array of data, and explicit reference to one or another array with each test (when, perhaps, you'd have only one data-set, after all). In any case, the data are accepted as array references.

REVISION HISTORY

See CHANGES in installation dist for revisions.

AUTHOR/LICENSE

rgarton AT cpan DOT org

This program is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see http://www.perl.com/perl/misc/Artistic.html).

Disclaimer

To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.

END

This ends documentation of a Perl implementation of Helmut Schmidt's test of pot (potential energy) of occurrence of a state among others in a categorical sequence.