utf-16 file seeking in python. how?

匿名 (未验证) 提交于 2019-12-03 02:02:01

问题:

For some reason i can not seek my utf16 file. It produces 'UnicodeException: UTF-16 stream does not start with BOM'. My code:

f = codecs.open(ai_file, 'r', 'utf-16') seek = self.ai_map[self._cbClass.Text]  #seek is valid int f.seek(seek) while True:     ln = f.readline().strip() 

I tried random stuff like first reading something from stream, didnt help. I checked offset that is seeked to using hex editor - string starts at character, not null byte (i guess its good sign, right?) So how to seek utf-16 in python?

回答1:

Well, the error message is telling you why: it's not reading a byte order mark. The byte order mark is at the beginning of the file. Without having read the byte order mark, the UTF-16 decoder can't know what order the bytes are in. Apparently it does this lazily, the first time you read, instead of when you open the file -- or else it is assuming that the seek() is starting a new UTF-16 stream.

If your file doesn't have a BOM, that's definitely the problem and you should specify the byte order when opening the file (see #2 below). Otherwise, I see two potential solutions:

  1. Read the first two bytes of the file to get the BOM before you seek. You seem to say this didn't work, indicating that perhaps it's expecting a fresh UTF-16 stream after the seek, so:

  2. Specify the byte order explicitly by using utf-16-le or utf-16-be as the encoding when you open the file.



标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!