The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

IDS::DFAState - A state in a Deterministic Finite Automata (DFA) or a Hidden Markov Model (HMM).

SYNOPSIS

A usage synopsis would go here. Since it is not here, read on.

DESCRIPTION

Introduction

This class is for people writing various forms of finite automata. It is unlikely to be useful to others.

Note that a state is rarely accessed other than through a reference. A token is always a simple string.

A state consists of the following:

inbound

A hash with key of a reference to a state and a value of a reference to a hash indexed by tokens that that that cause a transition to us (the value of this hash is just "1"; we use a hash because it is a set and not a list). If the hash is empty, we will be pruned shortly.

Class methods that provide information relating to this variable:

"in_links"

How many states have transitions to us

"in_states"

What states have transitions to us

"in_tokens"

what are the tokens causing an inbound transition

"exists_inbound_from"

If a state is in the list

Functions that change this variable:

"add_inbound"

Add an inbound state

"absorb"
"dropped_edge"
outbound

A hash, indexed by tokens causing the transition, of references to states that we can transition to. We can have only one outbound transition per token, so this is a DFA and not a NFA state. Functions that provide information relating to this variable:

"out_links"

How many outbound links do we have

"out_states"

States we can reach

"out_tokens"

Tokens causing transitions

"token_to"

What token causes a transition to a specified state

"next"

What state we transition to for a given token.

Functions that change this variable:

"add_outbound"
"absorb"
"move"
"drop_edge"
out_count

Out_count keeps track of every time an edge is traversed. When used in a DFA, the counts may be used to know which edges are not used, and thus are candidates for pruning. When used in a HMM, out_count keeps track of every time an edge is traversed for computing probabilities later. This variable is a hash, indexed by the token causing the transition, with the value being the count.

Functions that provide information relating to this variable:

"out_count"

the count for a given token

Functions that change this variable:

"followed"
"reset_counters"

As well as all of the functions that manipulate the variable outbound.

visits

When the DFA is being used, visits keep track of the number of times this node has been visited. It is used in pruning to delete un-used nodes. Functions that provide information relating to this variable:

"visits"

Functions that change this variable:

"visited"
"reset_counters"

Sanity says that the sum of the out counts should equal this count.

verbose

As the state does operations, it will print messages that might be helpful for debugging. These are controlled by the verbosity level. The higher the value, the more verbose. Values beyond 2 are unlikely to be useful.

Methods

The only callers of these methods should be methods in DFA, HMM, or related classes.

Construction involves not just attaching one state to another, but also the merging of states.

new()
new(verbosity)

Creates a new IDS::DFAState. The verbosity level defaults to 0 if it is not supplied.

add_outbound(token, to)

During construction, add an outbound transition for the specified token to the specified state.

add_inbound(inref, intoken)

Private method. Add a reference to a state that transitions to us.

absorb(otherstate)

Absorb the transitions from another (similar) state. Any tokens that our state does not have are added. The resulting counter is the sum of the current and other states.

All tokens in common must go to the same place or an error results.

move(from, to)

Move outbound transitions from the "from" state to the "to" state. This function will be called during a state merge, to reset the transitions to the new state.

The counter is not changed since all that occurs is a move; the count can still be considered valid.

I/O methods

Print the edges associated with this state. If a filehandle is supplied, print there, STDOUT otherwise.

The node_map is a mapping from node (IDS::DFAState) reference to the node number assigned to a node by the DFA or HMM.

Print the outbound transitions in VCG format. See "SEE ALSO" for a reference for the VCG format.

The node_map is a mapping from a node (IDS::DFAState) reference to the node number assigned to a node by the DFA or HMM.

Print information about this node (state) in VCG format. The node_number is our node number.

DFA adaptive learning (from the point of view of a IDS::DFAState) is keeping track of nodes and edges as they are used, and then dropping the edges which are not used enough to warrant keeping them.

The counters are also used by a HMM for calculating probabilities.

visited()

Increment the visited counter for the node.

visits()

Return the value of the visited counter.

followed(token)

Increment the counter for an edge associated with a token. Note that the edge must exist.

out_count(token)

Return the counter associated with the transition for the specified token.

set_count(token, value)

Set the counter associated with the transition for the specified token to the given value.

reset_counters()

Reset the node and edge counters to 0.

probability(token)

Return the probability of taking the edge assocaited with the specified token. The probability is calculated as out_count(token) / sum of out_count(all tokens)

drop_edges(threshold)

Drop all edges with a use count below the threshold.

drop_edge(token)

Internal use only. Drop the edge associated with the specified token.

dropped_edge(fromref)

Internal use only. An edge to us from the state referenced by fromref was dropped. Update the inbound array to remove the given node that used to come to us.

drop_all_edges()

This node is about to be dropped; drop all of our edges. This function does not work if loops exist in the automaton. A solution is to let the edges drop off from non-use and then use a function that has yet to be written to drop nodes with no in or out edges.

The state is about to be dropped. Remove all links to it.

Note that the dropping of an edge will not cause a node to be deleted. Only the pruning will so that. This means that we will keep dead-end and unreachable nodes until the pruning drops them from lack of use. The various drop functions should only ensure the consistency of both sides of the link they are operating on.

Methods for information about this state

in_states()

Return a list or reference to an array (depending on if we are called in scalar or list context) which is a list of inbound states.

in_links()

Return the number of inbound links we have.

out_tokens()

Return a list of tokens that cause a transition out of this state.

out_states()

Return a list of states to which this state has transitions.

out_links()

Return the number of outbound edges.

in_tokens()

Return a list of tokens that cause a transition into this state.

token_to(state)

Return the token that causes a transition to the specified state.

token_from(state)

Return the token that causes a transition from the specified state, or undef if the state claims to have no transition to us.

next(token)

Given a token, return the next state.

exists_inbound_from(from)

Verify that ``from'' is in the list of inbound states.

compare(otherstate)

Compare the current state with another state. The return value is 0 if they are identical, 1 otherwise. This return value may seem odd, but it was inspired by the perl cmp and <=> operators. However, the concept of greater than and less than is not well defined.

Two states are considered identical iff:

  • They have the same number of outbound states.

  • Every token for one state has a transition in the other state with the same destination.

Note that the inbound states may be different for the compared states and they will still test as identical. This is on purpose to allow the merging to occur.

AUTHOR INFORMATION

Copyright 2005-2007, Kenneth Ingham. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Address bug reports and comments to: ids_test at i-pi.com. When sending bug reports, please provide the versions of IDS::Test.pm, IDS::Algorithm.pm, IDS::DataSource.pm, the version of Perl, and the name and version of the operating system you are using. Since Kenneth is a PhD student, the speed of the response depends on how the research is proceeding.

BUGS

Please report them.

SEE ALSO

IDS::Algorithm, DFA, HMM

VCG - Visualization of Compiler Graphs, Design Report and User Documentation, Ref. Compare, USAAR-1049-visual, January 1994, updated January 1995