State transition table for aho corasick algorithm

喜你入骨 提交于 2020-01-01 06:54:34

问题


Please help me to understand the construction of state transition table for more than one patterns in the Aho-Corasick algorithm.

Please give a simple and detailed explanation so that I could understand.

I am following this paper and here is the animation for that.

Thanks.


回答1:


Phase 1

Creating the keyword tree :

Starting at the root, follow the path labeled by chars of Pi
        If the path ends before Pi, continue it by adding new edges and ...
                            nodes for the remaining characters of P
        Store identifier i of Pi at the terminal node of the path

I demonstrate it by an example :

Suppose we have a finite set of patterns P = {he, she, his, hers}.

Phase2

Next we extend the keyword tree into an automaton,to support linear-time matching as follows.

Obviously the automaton consists of two parts :

  • states
  • transitions

States : Exactly the nodes achieved by the keyword tree; and initial state = root of the tree.

Transitions : using goto(q,a) function as follows.

/*This function gives the state entered from current state q 
     by matching target char a*/
- If edge (q,v) in the tree is labeled by a, then g(q,a)=v; 
- g(0,a)=0 for each a that does not label an edge out of the root
//the automton stays at the initial state while scanning non-matching characters       
- O.W. g(q,a)={}

Using the automaton :

SearchofTarget T[1..m] //considering the target string T as an 
                     array of chars indexed from 1   to m
q:= 0; // initial state(root)
for i:= 1 to m do
    while g(q,T[i]) == {}  do 
        q:= fail(q); // follow a fail
    q:= goto(q,T[i]); // follow a goto
    if output(q) != {} then print i, out(q);
endfor;

As you see in the above pseudo-code it calls two functions : fail(q) and output(q)

fail(q) : // for q !=0 gives the state entered at a mismatch

failure(q) is the node labeled by the longest proper suffix  w of L(q) ...
    s.t. w is a prefix of some pattern
//L(q) is the label of node q as the concatenation of edge labels ...
   on the path from the root to q

output(q) :

gives the set of patterns recognized when entering state q

After computation of these two functions, the automaton looks like this :

Now you may wonder when to compute these methods, so please look at these for more official form :

Hope this help and if still something is ambiguous please don't hesitate to ask.



来源:https://stackoverflow.com/questions/22398190/state-transition-table-for-aho-corasick-algorithm

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!