Speed up millions of regex replacements in Python 3

后端 未结 9 1300
醉酒成梦
醉酒成梦 2020-11-22 05:44

I\'m using Python 3.5.2

I have two lists

  • a list of about 750,000 \"sentences\" (long strings)
  • a list of about 20,000 \"words\" that I would l
9条回答
  •  孤街浪徒
    2020-11-22 06:28

    Concatenate all your sentences into one document. Use any implementation of the Aho-Corasick algorithm (here's one) to locate all your "bad" words. Traverse the file, replacing each bad word, updating the offsets of found words that follow etc.

提交回复
热议问题