Aho Corasick algorithm

后端 未结 2 743
北荒
北荒 2020-12-30 14:45

I am not able to understand the below algorithm which is used for string pattern matching using Aho-Corasick alg.

Procedure AC(y,n,q0)
INPUT: y<-array of          


        
相关标签:
2条回答
  • 2020-12-30 15:46

    You probably won't gain a good understanding of the Aho-Corasick algorithm from reading a little bit of pseudocode. Unless you understand the state transition table, the algorithm will make no sense at all.

    There's a decent explanation along with an animation at Aho-Corasick implementation and animation.

    The original paper, Efficient String Matching: An Aid to Bibliographic Search(PDF), is well written and understandable, and the pseudocode examples are pretty easy to convert to working code. It'll take a little study, but you should have a good understanding after you read the paper, think about it a bit, and then read it again.

    0 讨论(0)
  • 2020-12-30 15:46

    Aho Corasick algorithm is used to solve set matching problem. It means we have a set of strings S, and here comes a long string L to check whether L contains any one in the previous set S.

    An basic solution is using a trie tree, i.e. a prefix tree, please see Wikipedia. There are typically two steps to deal with the problem.

    1. Precompute the trie tree based on the string set S;
    2. Scan the string L to check.

    The trie tree is easy to understand. It store prefix substrings with nodes starts from the root.

    The Aho-Corasick algorithm is an extension of a trie tree, not far from the basic idea. Aho-Corasick algorithm adds a failed pointer to each node on a trie tree.

    When fails, a trie tree will restart from the root (add the start index on L by 1), but Aho-Corasick algorithm will restart from the node D pointed by the failed pointer (add the start index on L by the depth of the node D).

    Below is a C++ implementation of the Aho-Corasick algorithm. It contains some bugs. http://www.komodia.com/aho-corasick

    I fixed the bugs I found. And you can access my version here: https://github.com/elfinhe/KomodiaAhoCorasick

    0 讨论(0)
提交回复
热议问题