Question
I need to read a big file by reading at most N lines at a time, until EOF. What is the most effective way of doing it in Python? Something like:
with open(filename, 'r') as infile:
    while not EOF:
        lines = [get next N lines]
        process(lines)
Answer 1:
One solution would be a list comprehension and the slice operator:
with open(filename, 'r') as infile:
    lines = [line for line in infile][:N]
After this, lines is a list holding the first N lines. However, this loads the complete file into memory. If you don't want that (i.e. if the file could be really large), there is another solution using islice from the itertools package:
from itertools import islice

with open(filename, 'r') as infile:
    lines_gen = islice(infile, N)
lines_gen is an iterator that yields up to the first N lines of the file and can be used in a loop like this:
for line in lines_gen:
    print(line)
Both solutions give you up to N lines (or fewer, if the file doesn't have that many).
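Since the question asks for repeated reads until EOF, here is a minimal sketch of calling islice again on each pass; filename, N, and process() are placeholders taken from the question (the values below are hypothetical):
from itertools import islice

filename = 'large.txt'   # hypothetical path
N = 1000                 # hypothetical chunk size

def process(chunk):
    # hypothetical placeholder: replace with real per-chunk work
    print(len(chunk), 'lines')

with open(filename, 'r') as infile:
    while True:
        lines = list(islice(infile, N))   # next N lines, fewer near EOF
        if not lines:
            break
        process(lines)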
Answer 2:
A file object is an iterator over lines in Python. To iterate over the file N lines at a time, you could use the grouper() recipe from itertools (see What is the most “pythonic” way to iterate over a list in chunks?):
#!/usr/bin/env python2
from itertools import izip_longest

def grouper(iterable, n, fillvalue=None):
    # collect data into fixed-length chunks, padding the last chunk with fillvalue
    args = [iter(iterable)] * n
    return izip_longest(*args, fillvalue=fillvalue)
Example
with open(filename) as f:
    for lines in grouper(f, N, ''):
        assert len(lines) == N
        # process N lines here
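On Python 3, izip_longest was renamed zip_longest; a sketch of the same recipe under that assumption (the filename and chunk size below are hypothetical):
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # collect the iterable into fixed-length chunks, padding the last one with fillvalue
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

with open('large.txt') as f:          # 'large.txt' is a hypothetical filename
    for lines in grouper(f, 1000, ''):
        pass                          # process up to 1000 lines; trailing entries may be ''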
Answer 3:
This code works with any number of lines in the file and any N. If the file has 1100 lines and N = 200, you will process five chunks of 200 lines and one final chunk of 100 lines.
with open(filename, 'r') as infile:
    lines = []
    for line in infile:
        lines.append(line)
        if len(lines) >= N:
            process(lines)
            lines = []
    if len(lines) > 0:
        process(lines)
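For the snippet above to run as-is, process, N, and filename need to be defined; a minimal sketch with hypothetical values:
N = 200
filename = 'large.txt'   # hypothetical path

def process(chunk):
    # hypothetical placeholder: replace with real per-chunk work
    print('got a chunk of', len(chunk), 'lines')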
Answer 4:
maybe:
lines = []
for _ in range(N):
    lines.append(f.readline())  # readline() returns '' once EOF is reached
Answer 5:
I think you should be reading in chunks of a fixed size instead of specifying a number of lines to read. It makes your code more robust and generic. Even if the lines are long, reading a chunk loads only the specified amount of data into memory.
Refer to this link
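The linked approach isn't reproduced here; a minimal sketch of chunked reading, assuming a hypothetical filename and chunk size (read(n) returns at most n characters in text mode, or n bytes in binary mode):
def read_in_chunks(file_obj, chunk_size=1024 * 1024):
    # yield successive chunks of at most chunk_size characters until EOF
    while True:
        data = file_obj.read(chunk_size)
        if not data:
            break
        yield data

with open('large.txt', 'r') as f:   # 'large.txt' is a hypothetical filename
    for chunk in read_in_chunks(f):
        pass                        # process the chunk here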
Answer 6:
How about a for loop?
with open(filename, 'r') as infile:
    while True:
        lines = []
        for _ in range(N):
            line = infile.readline()
            if not line:        # EOF reached
                break
            lines.append(line)
        if not lines:
            break
        process(lines)
Answer 7:
You may have to do something as simple as:
lines = [infile.readline() for _ in range(N)]
Update after comments:
lines = [line for line in [infile.readline() for _ in range(N)] if line]
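A sketch of driving this until EOF; the filename and N below are hypothetical, and the processing step is left as a placeholder:
N = 1000                                  # hypothetical chunk size
with open('large.txt', 'r') as infile:    # hypothetical filename
    while True:
        lines = [line for line in [infile.readline() for _ in range(N)] if line]
        if not lines:
            break
        # process the (up to N) lines here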
Answer 8:
If you can read the full file into memory ahead of time:
infile = open(filename, 'r').readlines()
my_block = [line.strip() for line in infile[:N]]
cur_pos = 0
while my_block:
    print(my_block)
    cur_pos += 1
    my_block = [line.strip() for line in infile[cur_pos*N:(cur_pos + 1)*N]]
Answer 9:
I needed to read n lines at a time from extremely large files (~1TB), so I wrote a simple package to do this. If you pip install bigread, you can do:
from bigread import Reader

stream = Reader(file='large.txt', block_size=10)
for i in stream:
    print(i)
block_size is the number of lines to read at a time.
Answer 10:
I was looking for an answer to the same question, but did not really like any of the earlier proposals, so I ended up writing this slightly ugly thing that does exactly what I wanted without using any extra libraries.
def test(filename, N):
    with open(filename, 'r') as infile:
        lines = []
        for line in infile:
            line = line.strip()
            if len(lines) < N - 1:
                lines.append(line)
            else:
                # chunk is full: yield it and start collecting the next one
                lines.append(line)
                res = lines
                lines = []
                yield res
        else:
            # for/else runs once the loop finishes; yield any leftover partial chunk
            if len(lines) != 0:
                yield lines
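A minimal usage sketch of the generator above; the filename and chunk size are hypothetical:
for chunk in test('large.txt', 200):
    print(len(chunk), 'lines in this chunk')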
Source: https://stackoverflow.com/questions/5832856/how-to-read-file-n-lines-at-a-time-in-python