High performance mass short string search in Python

前端 未结 5 1637
[愿得一人]
[愿得一人] 2021-02-05 13:55

The Problem: A large static list of strings is provided as A, A long string is provided as B, strings in A are all very short (a keywords

5条回答
  •  刺人心
    刺人心 (楼主)
    2021-02-05 14:39

    Your problem is large enough that you probably need to hit it with the algorithm bat.

    Take a look into the Aho-Corasick Algorithm. Your problem statement is a paraphrase of the problem that this algorithm tackles.

    Also, look into the work by Nicholas Lehuen with his PyTST package.

    There are also references in a related Stack Overflow message that mention other algorithms such as Rabin-Karp: Algorithm for linear pattern matching?

提交回复
热议问题