Algorithm to find multiple string matches

佛祖请我去吃肉 2020-11-27 17:03

I'm looking for suggestions for an efficient algorithm for finding all matches in a large body of text. Terms to search for will be contained in a list and can have 1000+ p

6 answers

- 上瘾入骨i (OP)

2020-11-27 17:30
An optimal solution for this problem is to use a suffix tree (or a suffix array). A suffix tree is essentially a compressed trie of all the suffixes of a string. For a text of length N, it can be built in O(N) time.
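
A minimal sketch, assuming Python (the answer itself contains no code): a suffix array is the list of suffix start positions sorted in lexicographic order. The hypothetical `build_suffix_array` below sorts the suffixes directly. That is simple, but it takes O(N^2 log N) time and O(N^2) memory in the worst case. It does not reach the O(N) bound above, which requires a linear-time algorithm such as SA-IS (for suffix arrays) or Ukkonen's algorithm (for suffix trees).

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start indices of all suffixes of `text`, sorted lexicographically.

    Naive construction: sorted() materialises every suffix as a sort key,
    so this uses O(N^2) memory and O(N^2 log N) time in the worst case.
    It is for illustration only, not for large texts.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])
```

For example, `build_suffix_array("banana")` returns `[5, 3, 1, 0, 4, 2]`, which corresponds to the suffixes "a", "ana", "anana", "banana", "na", "nana".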

Then all k occurrences of a pattern of length m can be found in O(m + k) time, which is optimal.
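
A suffix array can answer the same query with binary search, because all suffixes that begin with the pattern form one contiguous block of the sorted array. The sketch below is self-contained and repeats a naive construction. `find_occurrences` is a hypothetical helper name. It uses O(m log N) character comparisons to locate the block plus O(k) to report it. That is slightly worse than the O(m + k) of a suffix tree; an added LCP array can bring the search down to O(m + log N).

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive O(N^2 log N) construction; fine for a demo, not for large texts.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return every start index of `pattern` in `text`, in increasing order."""
    m = len(pattern)
    # Truncating sorted suffixes to m characters keeps them sorted, so we can
    # binary-search on the m-character prefixes.
    # Left edge: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Right edge: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # sa[start:lo] holds exactly the suffixes starting with `pattern`;
    # sorting them into text order adds O(k log k).
    return sorted(sa[start:lo])
```

For the question's list of terms, the index would be built once and each term queried separately. For example, `find_occurrences("banana", build_suffix_array("banana"), "ana")` returns `[1, 3]`.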

Suffix trees can also be used to efficiently solve related problems, such as finding the longest palindromic substring, the longest common substring, and the longest repeated substring.
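
One of these uses can be illustrated with a suffix array instead of a suffix tree. The longest repeated substring is the longest common prefix of two suffixes that are adjacent in sorted order. The sketch below uses the naive construction again. `longest_repeated_substring` and `common_prefix_length` are hypothetical helper names. It compares neighbours character by character, which is quadratic in the worst case; Kasai's algorithm computes all adjacent LCP values in O(N).

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive O(N^2 log N) construction, for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_length(a: str, b: str) -> int:
    # Number of leading characters that a and b share.
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def longest_repeated_substring(text: str) -> str:
    """Longest substring occurring at least twice (occurrences may overlap)."""
    sa = build_suffix_array(text)
    best_len, best_start = 0, 0
    # Every repeated substring is a common prefix of two suffixes, and the
    # longest such prefix is always shared by two suffixes adjacent in `sa`.
    for prev, cur in zip(sa, sa[1:]):
        length = common_prefix_length(text[prev:], text[cur:])
        if length > best_len:
            best_len, best_start = length, cur
    return text[best_start:best_start + best_len]
```

For example, `longest_repeated_substring("banana")` returns `"ana"`, whose two occurrences (at positions 1 and 3) overlap.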

This is the typical data structure used to analyze DNA sequences, which can be millions or billions of bases long.

See also
- Wikipedia: Suffix tree
- Dan Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology.