Searching for a string in a large text file - profiling various methods in Python

傲寒 2020-12-02 05:54

This question has been asked many times. After spending some time reading the answers, I did some quick profiling to try out the various methods mentioned previously...

6 answers
  •  甜味超标
    2020-12-02 06:38

    I would guess many of the paths start out the same on DMOZ. You should use a trie data structure and store the individual characters on nodes.

    Tries have O(m) lookup time (where m is the key length) and also save a lot of space when storing large dictionaries or tree-like data, because keys with a common prefix share the nodes for that prefix.
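
    A minimal sketch of such a character-level trie, to make the O(m) lookup concrete. The class names `TrieNode` and `Trie` and the sample paths are made up for illustration; they are not taken from any particular library or from the DMOZ data itself.

```python
class TrieNode:
    """One node per character; children are keyed by the next character."""
    __slots__ = ("children", "is_end")

    def __init__(self):
        self.children = {}    # maps a single character to the child node
        self.is_end = False   # True if a stored key ends exactly here


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key):
        node = self.root
        for ch in key:
            child = node.children.get(ch)
            if child is None:
                child = TrieNode()
                node.children[ch] = child
            node = child
        node.is_end = True

    def __contains__(self, key):
        # One dictionary lookup per character, so O(m) for a key of length m.
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end


# Hypothetical sample paths that share the prefix "Top/".
trie = Trie()
for path in ("Top/Arts/Music", "Top/Arts/Movies", "Top/Science"):
    trie.insert(path)
```

    With this sketch, `"Top/Arts/Music" in trie` is true, while a bare prefix such as `"Top/Arts"` is not reported as a stored key. Lookup cost depends only on the length of the key, not on how many keys are stored.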

    You could also store path parts on nodes to reduce the node count; this is called a Patricia trie. But that makes lookups slower, because each step then compares a whole path part (on average, a string of the mean part length) instead of a single character. See the SO question "Trie (Prefix Tree) in Python" for more info about implementations.
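
    A rough sketch of the path-parts variant, assuming the paths are "/"-separated. It uses plain nested dicts and is a simplification rather than a full Patricia trie (it does not merge single-child chains). The helper names `insert_path`, `contains_path`, and `count_nodes` and the sample paths are hypothetical.

```python
_END = object()  # sentinel key marking "a stored path ends at this node"


def insert_path(root, path, sep="/"):
    """Insert a path, creating one nested dict per path part."""
    node = root
    for part in path.split(sep):
        node = node.setdefault(part, {})
    node[_END] = True


def contains_path(root, path, sep="/"):
    """Return True only if exactly this path was inserted."""
    node = root
    for part in path.split(sep):
        node = node.get(part)
        if node is None:
            return False
    return _END in node


def count_nodes(root):
    """Count the nodes below root, ignoring the end-of-path marker."""
    return sum(
        1 + count_nodes(child)
        for key, child in root.items()
        if key is not _END
    )


# Hypothetical sample paths, one node per path part rather than per character.
root = {}
for path in ("Top/Arts/Music", "Top/Arts/Movies", "Top/Science"):
    insert_path(root, path)
```

    Here the three sample paths need only one node per distinct path part, compared with one node per character in a character-level trie. The price is that every step hashes and compares a whole part string.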

    There are a couple of trie implementations on the Python Package Index, but they are not very good. I have written one in Ruby and one in Common Lisp, which is especially well suited for this task. If you ask nicely, I could maybe publish it as open source... :-)
