I have a text file whose first line contains Unicode characters, while all other lines are ASCII. I am trying to read the first line into one variable and all the other lines into another. However,
Because you used .readline() first, the codecs.open() file has filled a line buffer; the subsequent call to .readlines() returns only the buffered lines.
If you call .readlines() again, the rest of the lines are returned:
>>> import codecs
>>> f = codecs.open(filename, 'r', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71
The work-around is to not mix .readline() and .readlines():
import codecs

f = codecs.open(filename, 'r', encoding='utf-8')
data_f = f.readlines()  # read every line in a single call
names_f = data_f.pop(0).split(' ')  # remove the first line and split it on spaces
This behaviour is really a bug; the Python devs are aware of it, see issue 8260.
The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function and is a lot more robust and versatile than the codecs module.
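A minimal sketch of the io.open() approach, assuming the first line holds space-separated names as in the work-around above. The function name read_header_and_rows and its parameters are hypothetical, chosen only for illustration. With io.open(), mixing .readline() and .readlines() should behave as expected:

```python
import io


def read_header_and_rows(path, encoding="utf-8"):
    """Return (names, lines) for the file at `path`.

    `names` is the first line split on single spaces (an assumption about
    the file layout); `lines` holds every remaining line, newline included.
    """
    with io.open(path, "r", encoding=encoding) as f:
        # Strip the trailing newline so the last name stays clean.
        names = f.readline().rstrip("\n").split(" ")
        # io.open() returns a text wrapper that tracks its own position,
        # so this picks up right after the first line.
        lines = f.readlines()
    return names, lines
```

Calling names_f, data_f = read_header_and_rows(filename) would then give the same two variables as the work-around, except that the last name no longer carries a trailing newline.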