Counting the number of substrings

吃可爱长大的小学妹 提交于 2020-01-02 04:00:29

问题


I would like to ask if there is an algorithm for counting the number of discrete occurencies of a substring in a string in O(n) time.


回答1:


[EDIT 17/11/2013: Count the leaf nodes. Thanks Vinicius Pinto!]

You can build a suffix tree on the text in linear time. Then, search for your substring in the suffix tree; when you find it, count the number of leaf nodes beneath the matching node. This is O(m + k) for a substring of length m that appears k times (add an n term for building the suffix tree). Or, you can precalculate the number of descendants for each node in the tree using a depth-first traversal -- this will give O(m) queries.

For large texts, suffix arrays are often faster than suffix trees in practice, despite searching for a single length-m string dropping from O(m) to O(m log m). In this case, all occurrences of a particular substring will appear as a contiguous block in the suffix array, so the width of this block is the number of occurrences.




回答2:


You can use the KMP Algorithm and modify it to increment a counter instead of returning.

Another possibility is the Rabin-Karp algorithm, however this relies on hashing, so you either have to accept the possibility of false positives while keeping the complexity linear, or accept the possibility of quadratic complexity while keeping the results 100% correct.



来源:https://stackoverflow.com/questions/6402109/counting-the-number-of-substrings

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!