Processing Large Files in Python [1000 GB or More]

佛祖请我去吃肉 2020-12-15 06:27

Let's say I have a text file of 1000 GB. I need to find how many times a phrase occurs in the text.

Is there any faster way to do this than the one I am using below?
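(For reference, a typical baseline looks like this minimal line-by-line sketch, assuming a plain str.count per line; the file name and phrase are placeholders:)

    count = 0
    with open("bigfile.txt", "r", encoding="utf-8", errors="ignore") as f:
        for line in f:
            # str.count finds non-overlapping occurrences within the line
            count += line.count("my phrase")
    print(count)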

8 Answers
  •  半阙折子戏
    2020-12-15 07:02

    Going to suggest doing this with grep instead of Python. It will be faster, and generally if you're dealing with 1000 GB of text on your local machine you've done something wrong, but all judgements aside, grep comes with a couple of options that will make your life easier.

    grep -o 'your phrase' bigfile.txt | wc -l
    

    Specifically, this counts every occurrence of your phrase: with -o each match is printed on its own line, so wc -l gives the total number of matches, including multiple occurrences on the same line.

    If you only need the number of lines that contain the phrase (each line counted once, no matter how many matches it holds), you could instead do:

    grep -c 'your phrase' bigfile.txt
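
    If you do want to stay in Python, reading in large binary chunks is usually much faster than iterating line by line. Below is a minimal sketch (the function name, file name, phrase, and chunk size are placeholders, not from the question); it carries a short unscanned tail between chunks so a phrase straddling a chunk boundary is still counted exactly once, with the same non-overlapping semantics as str.count and grep -o:

    def count_phrase(path, phrase, chunk_size=1 << 24):  # 16 MiB per read
        needle = phrase.encode("utf-8")
        n = len(needle)
        count = 0
        buf = b""
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                buf += chunk
                # Greedy left-to-right, non-overlapping scan.
                pos = 0
                while True:
                    idx = buf.find(needle, pos)
                    if idx == -1:
                        break
                    count += 1
                    pos = idx + n
                # Keep only the unscanned tail that could begin a match
                # spanning into the next chunk (at most n - 1 bytes).
                buf = buf[max(pos, len(buf) - n + 1):]
        return count

    print(count_phrase("bigfile.txt", "my phrase"))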
    
