Process very large (>20GB) text file line by line

慢半拍i · 2020-11-29 17:54

I have a number of very large text files which I need to process, the largest being about 60GB.

Each line has 54 characters in seven fields and I want to remove the

11 answers
  •  暖寄归人
    2020-11-29 18:10

    Measure! You've already gotten quite a few useful hints on how to improve your Python code, and I agree with them. But first you should figure out what your real problem is. My first steps to find your bottleneck would be:

    • Remove any processing from your code. Just read and write the data and measure the speed. If merely reading and writing the files is already too slow, the bottleneck is not your code.
    • If plain reading and writing is slow, try using multiple disks. You are reading and writing at the same time. On the same disk? If so, try different disks and measure again.
    • An async I/O library (Twisted?) might help too.

    Once you've figured out the exact problem, ask again for optimizations targeting that problem.
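    A minimal sketch of the first step above, a pure read/write baseline with no per-line processing, might look like this (the paths and chunk size are placeholders, not anything from the original post):

    ```python
    import time

    def measure_copy(src_path, dst_path, chunk_size=1024 * 1024):
        """Copy src to dst with no processing and report raw I/O throughput.

        src_path and dst_path are hypothetical names -- point them at your
        real input file and, ideally, an output file on a different disk.
        """
        start = time.perf_counter()
        total = 0
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:  # EOF reached
                    break
                dst.write(chunk)
                total += len(chunk)
        elapsed = time.perf_counter() - start
        print(f"copied {total / 1e6:.0f} MB in {elapsed:.2f} s "
              f"({total / 1e6 / max(elapsed, 1e-9):.0f} MB/s)")
        return total
    ```

    If the MB/s figure here is already close to what your full script achieves, the disk, not your Python code, is the limit, and the multi-disk suggestion above is the next thing to try.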
