The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

DFA

Introduction

A DFA is a collection of IDS::Algorithm::DFAStates. A DFA consists of the following:

states

A reference to a list of states. Functions that return information about this variable: print, printvcg. Functions that indirectly return information about this variable: test_instance, stats, verify. Functions that manipulate this variable: load, add_transition, collapse_states.

tokens

A reference to a hash. The index of the hash is the token, the value is the state that that token will cause a transition into.

Functions that manipulate this variable: rebuild_tokens, add_transition. Used by: verify, test_instance.

verbose

The verbosity level; the higher the value, the more messages are produced.

TODO: define the verbosity levels.

start

A reference to the start state.

accept

A reference to the accept state.

states[0] is the start state. states[1] is the accept state. However, relying on this is a bad idea; use the references to these states.

Methods

new()
new(filehandle)

Create a new DFA with two states, the start and accept states. If the filehandle is supplied, load the DFA (in the format used by print()).

load(filehandle)

Load a DFA from a file; this is the inverse of "print", and the format we expect is that used in $self->print.

stats()

Return general statistics about the DFA.

prune(threshold) =item prune(node_threshold, edge_threshold)

Delete nodes and edges used fewer than threshold times. If only one threshold is provided, it is used for both.

rebuild_tokens()

We deleted one or more states. Rebuild our list of what tokens go where.

reset_counters()

Reset all of the counters associated with node and edge use.

add(token_listref)

The collection of tokens (in the list referenced by token_listref) is a complete example of a list that should be accepted by the DFA.

WE add the transition from the last token to the '(ACCEPT)' state.

How learning occurs:

add_transition(from, token)

Add a transition from one state to another when the specified token is received. It is not an error to try to add an existing transition. In that event, this function quietly returns. If no such transition exists, we look for a transition on the token; if so, we add an edge to the destination node for the existing edge. Finally, if there is no other choice, we create a new state and add the edge.

print
print(filehandle)

Print in a form both usable by humans as well as for reading back in with the load subroutine. If the filehandle is specified, print there; otherwise, print to STDOUT.

printvcg
printvcg(filehandle)

Print in a form usable by VCG for printing the DFA.

If the filehandle is specified, print there; otherwise, print to STDOUT.

clean(type)

Clean the DFA.

Clean by collapsing states. We will do this until we get no changes in the size of the DFA. This cleaning is appropriate for any time.

If type eq "test", we collapse, then prune. Pruning only makes sense if training has already occurred and we are testing.

If type eq "train" or is undefined, we only collapse states.

generalize()

Generalization is simply cleaning as if we were training. This function exists to fit the IDS::Test framework.

collapse_states()

Look for identical transitions and collapse them into one state.

TODO: More description here

test(tokensref)

Test a list of tokens against the DFA by attempting to traverse the DFA from the start to the (single) accept node. If no transition exists for a token, this event is counted. The verifier then attempts to resynchronize by looking at the next token in the stream. If the edge corresponding to this token exists in the DFA, then the verifier will traverse this edge (even though it was not in a state with this outbound edge). If no such edge exists, another miss has occurred, and the verifier tries again with the next token. The similarity value returned is:

Note: WE put in the '(ACCEPT)' token at the end of the list we are passed.

Q: do we want to update the counters only if the test indicates normality? If so, it looks like we're commiting to two runs, one to determine normality and the second to update the counters. For now, we will update on the fly, but will hold this idea in reserve.

verbose(level)

Set the verbosity level. Higher values imply more output. Values over 2 are unlikely to be useful.

verify

Verify the consistency of the DFA; this function exists to aid in debugging, when it may be that a bug is causing it to be corrupted. Consider this a ``fsck'' for a DFA.

Functions required by IDS::Algorithm

default_parameters()

Sets all of the default values for the parameters. Normally called by new() or one of its descendents.

param_options()

Command-line option specifiers for our parameters for GetOpt::Long.

Functions for the auto id of where generalization is necessary.

generalization_needed

Looks through the DFA for high out-degrees with a low usage count. The high out degree says that we found a high variability here. The low usage count says that rarely were tokens duplicated.

This is not a DFA walk, but a scan through the nodes.

AUTHOR INFORMATION

Copyright 2005-2007, Kenneth Ingham. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Address bug reports and comments to: ids_test at i-pi.com. When sending bug reports, please provide the versions of IDS::Test.pm, IDS::Algorithm.pm, IDS::DataSource.pm, the version of Perl, and the name and version of the operating system you are using. Since Kenneth is a PhD student, the speed of the reponse depends on how the research is proceeding.

BUGS

Please report them.

SEE ALSO

IDS::Test, IDS::DataSource, IDS::Algorithm