Forcing UTF-8 over cp1252 (Python3)

Posted by 馋奶兔 on 2019-12-10 16:31:48

Question


I've written some code that makes use of the Biopython Entrez wrapper. The code was working fine on my previous Win10 laptop (Python 3.5.1), but I've just ported it to a new Win10 laptop with the same versions of Python and every package installed, and I'm now getting a decode error.

The traceback leads to a function that fetches text: it attempts to decode the text using cp1252 when it should be using UTF-8. I know that similar questions have been asked, but none have dealt with this problem happening inside a package (Biopython in my case). Copying the UTF-8 encoding file in Python/lib and renaming it to cp1252.py solves the problem, but this is obviously not a long-term solution.

File "C:\Users\arjun\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 21715: character maps to <undefined>
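
The error in the traceback can be reproduced without Biopython. The following is a minimal sketch, not the code from the question: the sample character and the helper name try_decode are assumptions chosen for illustration. It relies on the fact that byte 0x81 is valid inside a UTF-8 sequence but has no mapping in cp1252.

```python
# "Á" (U+00C1) encodes to the two bytes 0xC3 0x81 in UTF-8.
raw = "Á".encode("utf-8")


def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or the failure reason."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: " + exc.reason


# cp1252 has no character assigned to 0x81, so this decode fails.
cp1252_result = try_decode(raw, "cp1252")

# The same bytes decode cleanly as UTF-8.
utf8_result = try_decode(raw, "utf-8")

print(cp1252_result)
print(utf8_result)
```

Any UTF-8 text containing one of the bytes that cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) will fail in the same way when decoded as cp1252.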

Answer 1:


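In Python 3.x the built-in open() is the same function as io.open (https://docs.python.org/2/library/io.html#io.open), so the io module's behaviour applies whenever you read a file in text mode. By default, it uses the preferred encoding of the platform it is running on, which is cp1252 on many Windows installations and would explain the cp1252 decode in the traceback. You can also specify your own encoding through the encoding argument, for example encoding="utf-8", as explained in the docs.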
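
As a rough illustration of the advice above, the sketch below passes the encoding explicitly instead of relying on the platform default. The file name, the sample text, and the in-memory stream are assumptions made for the example; the BytesIO object only stands in for a binary stream such as a network response, and this is not how Biopython itself reads data.

```python
import io
import locale
import os
import tempfile

# The encoding open() falls back to when none is given
# (cp1252 on many Windows installations).
default_encoding = locale.getpreferredencoding(False)
print("platform default:", default_encoding)

text = "Á and other non-ASCII text"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # hypothetical file

# Write and read with an explicit encoding so the result does not
# depend on the machine's locale settings.
with open(path, "w", encoding="utf-8") as handle:
    handle.write(text)

with open(path, "r", encoding="utf-8") as handle:
    read_back = handle.read()

# A binary stream can be wrapped the same way; BytesIO is a stand-in here.
binary_stream = io.BytesIO(text.encode("utf-8"))
wrapped = io.TextIOWrapper(binary_stream, encoding="utf-8")
wrapped_text = wrapped.read()
```

When the decoding happens inside a third-party package, this only helps if the package lets you obtain the raw bytes or pass an encoding yourself; whether that is possible depends on the package and version in use.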



Source: https://stackoverflow.com/questions/38833533/forcing-utf-8-over-cp1252-python3
