Process very large (>20GB) text file line by line

慢半拍i 2020-11-29 17:54

I have a number of very large text files which I need to process, the largest being about 60GB.

Each line has 54 characters in seven fields and I want to remove the

11 answers
  •  野趣味 (OP)
     2020-11-29 18:16

    As you don't seem to be limited by CPU, but rather by I/O, have you tried with some variations on the third parameter of open?

    Indeed, this third parameter can be used to give the buffer size to be used for file operations!

    Simply writing open( "filepath", "r", 16777216 ) will use 16 MB buffers when reading from the file. It should help.

    Use the same buffer size for the output file, and measure/compare while keeping everything else identical.

    Note: This is the same kind of optimization suggested by others, but you get it here for free, without changing your code and without having to do the buffering yourself.
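    A minimal sketch of the suggestion above. The question does not show the exact field transformation, so this example simply copies lines through unchanged; the file paths and the `process` function name are hypothetical. The third positional argument to Python's built-in `open()` is `buffering`, the size in bytes of the I/O buffer used for that file object:

    ```python
    # 16 MB buffer, as suggested in the answer (16777216 bytes).
    BUF = 16 * 1024 * 1024

    def process(in_path, out_path, buffering=BUF):
        """Copy a text file line by line using large I/O buffers.

        The per-line transformation from the question is omitted here;
        replace the write() call with whatever field edits you need.
        """
        # The third argument to open() is `buffering`: reads and writes
        # still happen line by line in Python, but the underlying OS
        # calls move data in `buffering`-sized chunks.
        with open(in_path, "r", buffering) as src, \
             open(out_path, "w", buffering) as dst:
            for line in src:
                dst.write(line)
    ```

    Iterating over the file object keeps memory use constant regardless of file size, while the large buffer reduces the number of system calls, which is what matters for an I/O-bound job.
    
    
    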
