Process very large (>20GB) text file line by line

Backend · open · 11 answers · 1746 views

Asked by 慢半拍i on 2020-11-29 17:54

I have a number of very large text files which I need to process, the largest being about 60GB.

Each line has 54 characters in seven fields, and I want to remove the last three characters from each of the first three fields.

11 Answers
  •  伪装坚强ぢ · 2020-11-29 18:15

    Your code is rather un-idiomatic and makes far more function calls than needed. A simpler version is:

    def ProcessLargeTextFile():
        # Stream the input one line at a time so the whole file never sits in memory.
        with open("filepath") as r, open("output", "w") as w:
            for line in r:
                fields = line.split(' ')
                # Drop the last three characters from each of the first three fields.
                fields[0:3] = [fields[0][:-3],
                               fields[1][:-3],
                               fields[2][:-3]]
                w.write(' '.join(fields))
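
    To see what the slice assignment does, here is a quick check on a single made-up line (the field values are hypothetical, not taken from the question):

        # Hypothetical seven-field, space-separated line:
        line = "70.120 -38.460 2.250 -13.790 -13.790 -13.790 900\n"
        fields = line.split(' ')
        fields[0:3] = [f[:-3] for f in fields[0:3]]
        print(' '.join(fields))  # -> 70. -38. 2. -13.790 -13.790 -13.790 900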
    

    Also, I don't know of a modern filesystem that is slower than Windows'. Since it appears you are using these huge data files as a database, have you considered using a real database?
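
    If you do go the database route, a minimal sketch with Python's built-in sqlite3 module might look like the following; the database file name, table name, and column names are all made up for illustration:

        import sqlite3

        conn = sqlite3.connect("data.db")
        conn.execute("CREATE TABLE IF NOT EXISTS rows (f1, f2, f3, f4, f5, f6, f7)")
        with open("filepath") as r:
            # The generator keeps the file streaming; nothing is loaded whole.
            conn.executemany(
                "INSERT INTO rows VALUES (?, ?, ?, ?, ?, ?, ?)",
                (line.rstrip('\n').split(' ') for line in r),
            )
        conn.commit()
        conn.close()

    Once the data is in SQLite, the trimming and any later queries become one-line SQL statements instead of file rewrites.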

    Finally, if you are just interested in reducing file size, have you considered compressing / zipping the files?
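
    Compression pairs naturally with line-by-line processing: gzip.open supports text-mode streaming, so a sketch like this (file names are placeholders) keeps memory use flat while shrinking the output:

        import gzip

        # Stream the plain-text input into a gzip-compressed copy, line by line.
        with open("filepath") as r, gzip.open("output.gz", "wt") as w:
            for line in r:
                w.write(line)

        # The compressed file can later be re-read the same way:
        # with gzip.open("output.gz", "rt") as r: ...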
