Using Aho-Corasick, can strings be added after the initial tree is built?

Posted by 佐手、 on 2019-12-22 01:29:52

Question


I want to search for strings inside a large number of documents. I have a predefined list of strings that I want to find in each document. Each document begins with a header followed by the text, and the header contains additional strings that I want to search for in the text below it.

For each document, is it possible to add the header strings to the tree after it has been built from the main list? Or can the original data structure be modified to include the new strings?

If this is not practical to do, is there an alternative search method that would be more appropriate?


Answer 1:


If each document has its own set of strings to search for, it seems like you could just build one global Aho-Corasick matcher and then a second, per-document matcher. Then, as you process the characters in the document, feed each into both of the matching automata and report all matches found this way. That eliminates the need to add new strings to the master automaton and to remove them when you're done. Plus, the slowdown should be pretty minimal.
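
Here is a minimal sketch of that two-automaton idea in Python. It assumes a small hand-rolled matcher rather than any particular library, so the names `AhoCorasick`, `feed`, `reset`, and `search_document` are purely illustrative. The global matcher is built once from the main list; a second, throwaway matcher is built from each document's header strings, and every character of the text is fed to both.

```python
from collections import deque


class AhoCorasick:
    """Illustrative streaming Aho-Corasick matcher (not a library API)."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state
        self.fail = [0]    # failure link for each state
        self.out = [[]]    # patterns that end at each state
        for pattern in patterns:
            if pattern:    # skip empty strings, which would match everywhere
                self._insert(pattern)
        self._build_failure_links()
        self.state = 0

    def _insert(self, pattern):
        node = 0
        for ch in pattern:
            if ch not in self.goto[node]:
                self.goto[node][ch] = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            node = self.goto[node][ch]
        if pattern not in self.out[node]:
            self.out[node].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so shallower states are finished before deeper ones.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def reset(self):
        self.state = 0

    def feed(self, ch):
        """Advance by one character; return the patterns ending here."""
        while self.state and ch not in self.goto[self.state]:
            self.state = self.fail[self.state]
        self.state = self.goto[self.state].get(ch, 0)
        return self.out[self.state]


def search_document(text, global_matcher, header_terms):
    """Feed each character to the shared matcher and a per-document one."""
    doc_matcher = AhoCorasick(header_terms)  # rebuilt for every document
    global_matcher.reset()                   # reused across documents
    matches = []
    for i, ch in enumerate(text):
        for matcher in (global_matcher, doc_matcher):
            for pattern in matcher.feed(ch):
                matches.append((i - len(pattern) + 1, pattern))
    return matches


# Built once from the main list, then shared by all documents.
global_matcher = AhoCorasick(["he", "she", "hers"])
print(search_document("ushers", global_matcher, ["us", "her"]))
```

Each match is reported as a `(start_index, pattern)` pair. Because the per-document matcher is discarded after each call, nothing has to be removed from the global automaton between documents. How the header is split from the body and parsed into `header_terms` is left out here, since it depends on the document format.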

Hope this helps!



Source: https://stackoverflow.com/questions/28858872/using-aho-corasick-can-strings-be-added-after-the-initial-tree-is-built
