Reading lines beyond SUB in Python [duplicate]

Question

Newbie question. In Python 2.7.2, I have a problem reading text files that apparently contain some stray control characters. Specifically, the loop

for line in f:

will stop without any warning or error as soon as it reaches a line containing the SUB character (ASCII hex code 1a). Using f.readlines() gives the same result. Essentially, as far as Python is concerned, the file ends at the first SUB character, and the last value assigned to line is the text up to that character.

Is there a way to read beyond such a character and/or to issue a warning when encountering one?

Answer 1:

On Windows systems, 0x1a is treated as the end-of-file character when a file is opened in text mode. You'll need to open the file in binary mode in order to get past it:

f = open(filename, 'rb')

The downside is that binary mode does no newline translation, so you have to deal with the line endings yourself, for example by splitting the lines by hand:

lines = f.read().split('\r\n')  # assuming Windows line endings
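To also cover the "issue a warning" part of the question, something along these lines might work. It is only a sketch: the function name read_lines_past_sub is made up for illustration, it reads the whole file into memory, and it returns byte strings rather than decoded text. It uses splitlines() instead of split('\r\n') so that both '\r\n' and '\n' endings are handled.

```python
import warnings

SUB = b'\x1a'  # the SUB control character (Ctrl-Z)


def read_lines_past_sub(path):
    """Return every line of `path` as a byte string, warning about SUB bytes.

    Hypothetical helper, not part of any library.
    """
    # Binary mode keeps the platform from treating 0x1a as end-of-file.
    with open(path, 'rb') as f:
        data = f.read()

    # splitlines() drops the line endings ('\r\n', '\n' or '\r').
    lines = data.splitlines()

    for number, line in enumerate(lines, 1):
        if SUB in line:
            warnings.warn('SUB (0x1a) found on line %d of %s' % (number, path))
    return lines
```

Calling read_lines_past_sub(filename) then gives the full list of lines, including those after the first SUB character, and the warnings show which lines to clean up.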

Answer 2:

Try opening the file in binary mode:

f = open(filename, 'rb')

Source: https://stackoverflow.com/questions/9520592/reading-lines-beyond-sub-in-python

Tags

python

ascii
