I noticed that if I iterate over a file that I opened, it is much faster to iterate over it without "read"-ing it.
i.e.
l = open('file', 'r')
The short answer to your question is that each of these three methods of reading from a file has different use cases. As noted above, f.read() reads the file as a single string, and so allows relatively easy file-wide manipulations, such as a file-wide regex search or substitution.
f.readline() reads a single line of the file, allowing the user to parse a single line without necessarily reading the entire file. Using f.readline() also allows easier application of logic in reading the file than a complete line by line iteration, such as when a file changes format partway through.
Using the syntax for line in f: allows the user to iterate over the file line by line, as noted in the question.
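As a minimal sketch of the three use cases described above (written for Python 3; the sample file contents and names such as sample_path are made up for illustration):

```python
import os
import re
import tempfile

# Hypothetical sample file: one header line followed by data lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("name,score\nalice,10\nbob,20\n")
    sample_path = tmp.name

# f.read(): the whole file as one string, handy for a file-wide regex search.
with open(sample_path) as f:
    text = f.read()
scores = re.findall(r",(\d+)$", text, flags=re.MULTILINE)

# f.readline(): consume only the header, then treat the remaining lines
# differently by iterating over the file object.
with open(sample_path) as f:
    header = f.readline().rstrip("\n").split(",")
    rows = [line.rstrip("\n").split(",") for line in f]

os.remove(sample_path)

print(scores)  # ['10', '20']
print(header)  # ['name', 'score']
print(rows)    # [['alice', '10'], ['bob', '20']]
```

Note that read() holds the entire file in memory at once, while the readline() and for-loop versions only ever hold one line at a time.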
(As noted in the other answer, this documentation is a very good read):
https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
EDIT: It was previously claimed that readline() could be used to skip a line during a for loop iteration. However, this doesn't work in Python 2.7, and is perhaps a questionable practice, so this claim has been removed.
EDIT: Added an example of a use case of f.readline() and f.read()
Note that readline() is not comparable to reading all lines in a for loop, since it reads one line per call and incurs per-call overhead, as others have already pointed out.
I ran timeit on two otherwise identical snippets, one using a for loop over the file object and the other using readlines(). You can see my snippet below:
def test_read_file_1():
    f = open('ml/README.md', 'r')
    for line in f.readlines():
        print(line)

def test_read_file_2():
    f = open('ml/README.md', 'r')
    for line in f:
        print(line)

def test_time_read_file():
    from timeit import timeit
    duration_1 = timeit(lambda: test_read_file_1(), number=1000000)
    duration_2 = timeit(lambda: test_read_file_2(), number=1000000)
    print('duration using readlines():', duration_1)
    print('duration using for-loop:', duration_2)
And the results:
duration using readlines(): 78.826229238
duration using for-loop: 69.487692794
The bottom line, I would say, is that the for loop is faster, but where both are possible I'd rather use readlines().
That was a brilliant answer. Something good to know is that whenever you call readline(), it reads a line and advances the file position, so the next call will not return that line again. You can return to an earlier position by using the seek() function; to go back to the start of the file, simply call f.seek(0).
Similarly, the function f.tell() will tell you your current position in the file.
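Here is a small sketch of seek() and tell() used together (Python 3; the temporary file and the variable names are made up for illustration). In text mode the value returned by tell() should be treated as an opaque bookmark, so this only passes seek() either 0 or a value that tell() returned earlier:

```python
import os
import tempfile

# Hypothetical two-line file to read from.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")
    demo_path = tmp.name

with open(demo_path) as f:
    start = f.tell()             # position at the beginning of the file
    first = f.readline()         # reads 'first\n' and moves the position forward
    after_first = f.tell()       # bookmark just past the first line
    second = f.readline()        # reads 'second\n'

    f.seek(0)                    # rewind to the beginning
    first_again = f.readline()   # the first line can now be read again

    f.seek(after_first)          # jump back to the saved bookmark
    second_again = f.readline()

os.remove(demo_path)

print(repr(first_again))   # 'first\n'
print(repr(second_again))  # 'second\n'
```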
Hope this helps!
https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory
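If the file may not fit in memory, one option is to pass a size to read() and process the file piece by piece. A minimal sketch, assuming Python 3; iter_chunks is a hypothetical helper name and io.StringIO stands in for a real file object:

```python
import io

def iter_chunks(f, chunk_size=4096):
    """Yield successive pieces of at most chunk_size characters (or bytes)."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # an empty result means end of file
            break
        yield chunk

# Stand-in for an open file; any object with a read(size) method works.
stream = io.StringIO("abcdefghij")
chunks = list(iter_chunks(stream, chunk_size=4))
print(chunks)  # ['abcd', 'efgh', 'ij']
```

Only one chunk is held in memory at a time, at the cost of chunks not lining up with line boundaries.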
Sorry for all the edits!
For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:
for line in f:
    print line,
This is the first line of the file.
Second line of the file