Memory-efficient way to iterate over part of a large file

Submitted by 雨燕双飞 on 2019-11-30 21:17:23

If I understand your question correctly, the problem you're encountering is that storing all the lines of text in a list and then taking a slice uses too much memory. What you want is to read the file line by line, ignoring all but a certain range of lines (say, the half-open range [17, 34)).

Try using enumerate to keep track of the line number as you iterate through the file. Here is a generator-based approach that uses yield to produce the lines of interest one at a time:

def read_only_lines(f, start, finish):
    # Yield only lines whose index falls in the half-open range [start, finish).
    for ii, line in enumerate(f):
        if start <= ii < finish:
            yield line
        elif ii >= finish:
            # Past the window: stop without reading the rest of the file.
            return

with open("big text file.txt", "r") as f:
    for line in read_only_lines(f, 17, 34):
        print(line)
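As a quick sanity check (a sketch using an in-memory io.StringIO standing in for the file, with hypothetical 100-line input), the generator yields exactly the 17 lines with indices 17 through 33 and stops pulling lines once it passes the window:

```python
import io

def read_only_lines(f, start, finish):
    # Yield only lines whose index falls in the half-open range [start, finish).
    for ii, line in enumerate(f):
        if start <= ii < finish:
            yield line
        elif ii >= finish:
            return

# Hypothetical data: 100 numbered lines in an in-memory buffer.
data = io.StringIO("".join(f"row {n}\n" for n in range(100)))

window = list(read_only_lines(data, 17, 34))
print(len(window))        # 17 lines: indices 17 .. 33
print(window[0].strip())  # row 17
print(window[-1].strip()) # row 33
```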

This read_only_lines function essentially reimplements itertools.islice from the standard library, so you could use islice directly for an even more compact version:

from itertools import islice

with open("big text file.txt", "r") as f:
    for line in islice(f, 17, 34):
        print(line)

If you want to capture the lines of interest in a list rather than consuming them from a generator, just pass the iterator to list():

from itertools import islice

with open("big text file.txt", "r") as f:
    lines_of_interest = list(islice(f, 17, 34))

do_something_awesome(lines_of_interest)
do_something_else(lines_of_interest)
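One caveat worth noting: islice counts from the iterator's current position, not from the start of the file, because it consumes lines as it goes. A second islice on the same file object therefore picks up where the first left off. A small sketch with an in-memory buffer (hypothetical data):

```python
import io
from itertools import islice

# Hypothetical data: 10 numbered lines in an in-memory buffer.
f = io.StringIO("".join(f"line {n}\n" for n in range(10)))

first = list(islice(f, 0, 3))   # consumes lines 0, 1, 2
second = list(islice(f, 0, 3))  # continues from the current position: lines 3, 4, 5

print(first)   # ['line 0\n', 'line 1\n', 'line 2\n']
print(second)  # ['line 3\n', 'line 4\n', 'line 5\n']
```

If you need a window relative to the start of the file on an already-advanced iterator, reopen the file (or seek back to the beginning) first.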