I have a number of very large text files which I need to process, the largest being about 60GB.
Each line has 54 characters in seven fields and I want to remove the
As you don't seem to be limited by CPU but rather by I/O, have you tried varying the third parameter of open?
Indeed, this third parameter sets the buffer size used for file operations.
Simply writing open("filepath", "r", 16777216) will use 16 MB buffers when reading from the file, which should help.
Use the same buffer size for the output file, and measure/compare while keeping everything else identical.
Note: This is the same kind of optimization suggested by others, but here you get it for free, without restructuring your code and without having to implement the buffering yourself.
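For example, here is a minimal sketch of what that could look like; the file names are placeholders, and since the exact per-line edit isn't shown, the transformation step is left as a stub:

    # Hypothetical sketch: copy input to output line by line, using
    # 16 MB buffers on both files (the third argument to open).
    BUF = 16 * 1024 * 1024  # 16777216 bytes

    with open("input.txt", "r", BUF) as src, open("output.txt", "w", BUF) as dst:
        for line in src:
            # Placeholder: replace with whatever you actually need to
            # remove from each 54-character, seven-field line.
            processed = line
            dst.write(processed)

The only change from a naive version is the extra argument to open; the reading loop itself stays the same.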