Searching for a string in a large text file - profiling various methods in Python

傲寒 2020-12-02 05:54

This question has been asked many times. After spending some time reading the answers, I did some quick profiling to try out the various methods mentioned previously...

6 Answers
  •  死守一世寂寞
    2020-12-02 06:27

    Variant 1 is great if you need to launch many sequential searches. Since a set is internally a hash table, lookups are fast. It takes time to build, though, and it only works well if your data fit into RAM.
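
    The question body above is truncated, so the exact code of Variant 1 isn't shown here; based on this answer's description it presumably loads the file's lines into a set and then tests membership. A minimal sketch under that assumption (the function names are made up for illustration):

    ```python
    def build_line_set(path):
        # Read the file once and keep every line in a set (a hash table),
        # so later lookups are O(1) on average.
        with open(path, encoding="utf-8") as f:
            return {line.rstrip("\n") for line in f}

    def found_in_set(line_set, needle):
        # Fast membership test, but the whole set must fit in RAM.
        return needle in line_set
    ```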

    Variant 3 is good for very big files, because you have plenty of address space to map them and the OS caches enough data. You still do a full scan, though, and it can become rather slow once your data no longer fit into RAM.
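
    Again hedging on what Variant 3 actually was: the description here (full scan, address space, OS caching) suggests an mmap-based search. A rough sketch under that assumption, with hypothetical names:

    ```python
    import mmap

    def found_in_mmap(path, needle):
        # Memory-map the file; the OS pages data in lazily and caches it,
        # but every query is still a linear scan of the file contents.
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                return mm.find(needle.encode("utf-8")) != -1
    ```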

    SQLite is definitely a nice idea if you need several searches in a row and you can't fit the data into RAM. Load your strings into a table, build an index, and SQLite builds a nice B-tree for you. The tree can fit into RAM even if the data don't (it's a bit like what @alienhard proposed), and even if it doesn't, the amount of I/O needed is dramatically lower. Of course, you need to create a disk-based SQLite database; I doubt that an in-memory SQLite database will beat Variant 1 significantly.
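
    A minimal sketch of the SQLite approach described above, using only the standard-library sqlite3 module (the table and index names are made up):

    ```python
    import sqlite3

    def build_db(db_path, lines):
        # Load the strings into a disk-based table and index the column,
        # so each lookup walks a B-tree instead of scanning the whole file.
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS lines (value TEXT)")
        con.executemany("INSERT INTO lines (value) VALUES (?)",
                        ((line,) for line in lines))
        con.execute("CREATE INDEX IF NOT EXISTS idx_lines_value ON lines (value)")
        con.commit()
        return con

    def found_in_db(con, needle):
        # Equality lookup served by the index: only a few pages of I/O per query.
        row = con.execute("SELECT 1 FROM lines WHERE value = ? LIMIT 1",
                          (needle,)).fetchone()
        return row is not None
    ```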
