Split large files using Python

有刺的猬 2020-12-28 21:25

I am having trouble splitting large files (say, around 10 GB). The basic idea is simply to read the lines and write every, say, 40000 lines to a separate file. But there are tw

5 Answers
  •  不思量自难忘°
    2020-12-28 21:55

    Since you are processing the file, you have to iterate over its contents one way or another. Whether you do that manually or let part of the Python API do it for you (e.g. the readlines() method) does not matter for running time: either way you spend O(n) time, where n is the size of the file.

    But reading the whole file into memory, as readlines() does, also requires O(n) space. Sometimes we really do need a 10 GB file in memory, but your particular problem does not require it. We can iterate over the file object directly, which yields one line at a time. The file object does use some space for its read buffer, but there is no reason to also hold the entire contents of the file as a list of lines.

    Therefore, I would go with your second solution.
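
    As a rough illustration of iterating over the file object directly, here is a minimal sketch. The function name split_by_lines, the part_00000.txt naming scheme, and the choice of binary mode are my own assumptions, not something taken from the question. It holds at most one group of lines (40000 by default) in memory at a time.

```python
from itertools import islice
from pathlib import Path


def split_by_lines(src, out_dir, lines_per_file=40000):
    """Split src into numbered part files of at most lines_per_file lines each.

    Hypothetical helper: the name and the output naming scheme are illustrative.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    # Binary mode avoids decoding, so the bytes are copied unchanged.
    with open(src, "rb") as infile:
        while True:
            # islice pulls at most lines_per_file lines from the file iterator,
            # so only one group of lines is held in memory at a time.
            chunk = list(islice(infile, lines_per_file))
            if not chunk:
                break  # end of file reached
            part = out_dir / f"part_{len(parts):05d}.txt"
            with open(part, "wb") as outfile:
                outfile.writelines(chunk)
            parts.append(part)
    return parts
```

    Calling split_by_lines("big.txt", "chunks") would write chunks/part_00000.txt, chunks/part_00001.txt, and so on, where "big.txt" and "chunks" are placeholder paths. If even one group of lines is too large for memory, the same loop can write each line as it is read instead of building a list first.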
