UnicodeDecodeError: 'charmap' codec can't encode character X at position Y: character maps to <undefined>

Submitted by 主宰稳场 on 2019-12-25 06:43:36

Question


To clarify: this question is not a duplicate of this one. I have already tried all the suggestions there, and none of them solved the problem.

I have a .txt file containing Unicode data, and I want to read the file into a string.

I tried

with open('myfile.txt', 'r', encoding='utf-8') as a:
    print(a.read())

but I get this error:

UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' at position Y: character maps to <undefined>
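The wording "can't encode character" indicates that the failure happens when a str is encoded to bytes (a UnicodeEncodeError), not when the file is decoded. A minimal sketch that reproduces the message, assuming the console encodes output as Windows-1252 (cp1252), as the answer below describes; U+FEFF is the byte order mark that some editors write at the start of UTF-8 files:

```python
# Assumption: the console encodes output as Windows-1252 ("cp1252").
text = "\ufeffhello"  # U+FEFF (byte order mark) followed by ASCII text

caught = None
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    caught = exc  # keep a reference; `exc` is unbound after the except block
    print(exc)    # the message is ASCII-only, so this print is safe
```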

My question is this: I don't care about these characters at all, so is there any way to make Python simply remove or skip every character it cannot handle? Also, to clarify, I have tried the encodings utf-8, utf-8-sig, utf-16, etc.
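One way to drop every character the output encoding cannot represent is to round-trip the text through that encoding with the 'ignore' error handler before printing. A sketch under that assumption; drop_unencodable is a hypothetical helper name made up for this example, and the sample string is invented:

```python
import sys

def drop_unencodable(text, encoding):
    """Hypothetical helper: remove characters that `encoding` cannot represent."""
    # Encode with 'ignore' so unencodable characters are silently dropped,
    # then decode back to str so the result can be printed normally.
    return text.encode(encoding, errors="ignore").decode(encoding)

sample = "\ufeffcaf\u00e9 \u2603"  # BOM, "café", a space and a snowman

# Use the encoding standard output reports, falling back to UTF-8.
out_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
print(drop_unencodable(sample, out_encoding))
```

Passing errors="replace" instead keeps a ? in place of each dropped character. On Python 3.7 and later, when sys.stdout is a regular text stream, sys.stdout.reconfigure(errors="ignore") applies the same handler to all subsequent output instead of to each string.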

I also tried this, with no luck:

with open('myfile.txt', 'r', encoding='utf-8') as a:
    try:
        print(a.read())
    except UnicodeEncodeError:
        pass

I also tried importing codecs and using this code:

import codecs
with codecs.open('myfile.txt', 'r', encoding='utf-8') as a:
    print(a.read())

but the same error still appears.
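The encodings tried in the question only control how the file's bytes are decoded, and decoding appears to succeed: with encoding='utf-8' the BOM simply becomes the character '\ufeff' at the start of the string, while 'utf-8-sig' strips it. A sketch using a temporary file whose contents are made up for the example:

```python
import os
import tempfile

# Hypothetical sample content: a BOM, "café", a space and a snowman, saved as UTF-8.
raw = "\ufeffcaf\u00e9 \u2603".encode("utf-8")  # starts with EF BB BF

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

try:
    with open(path, "r", encoding="utf-8") as f:
        with_bom = f.read()  # decodes fine; BOM kept as '\ufeff'
    with open(path, "r", encoding="utf-8-sig") as f:
        without_bom = f.read()  # 'utf-8-sig' strips a leading BOM
finally:
    os.remove(path)

# ascii() escapes non-ASCII characters, so these lines print on any console.
print(ascii(with_bom))     # '\ufeffcaf\xe9 \u2603'
print(ascii(without_bom))  # 'caf\xe9 \u2603'
```

Either string can still fail later when print() encodes it for a Windows-1252 console, because the snowman (and, in the first case, the BOM) has no Windows-1252 equivalent.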


Answer 1:


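Correction regarding the encoding used by print: avoid printing to stdout on Windows. Before Python 3.6 (see PEP 528), Python encodes console output with a legacy 8-bit code page such as Windows-1252 (Microsoft's superset of ISO 8859-1, Latin-1), and raises this error for any character that code page cannot represent. This is easily sidestepped by printing to stderr instead: in Python 3, sys.stderr uses the 'backslashreplace' error handler, so such characters are written as escape sequences like \ufeff rather than raising an exception: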

import sys
print('your text', file=sys.stderr)

On Linux, where terminals normally use UTF-8, printing Unicode should work correctly without this workaround.
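Why stderr avoids the exception can be seen by simulating a Windows-1252 console with an in-memory text stream and changing only the error handler. The in-memory stream is an assumption for illustration, not what Python does internally on Windows, and encode_like_stream is a hypothetical helper name:

```python
import io

def encode_like_stream(text, errors):
    """Hypothetical helper: write `text` through a cp1252 text stream, return the bytes.

    The in-memory cp1252 stream stands in for a legacy Windows console (assumption).
    """
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="cp1252", errors=errors)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()

text = "\ufeffcaf\u00e9"

# Like sys.stderr: unencodable characters become backslash escapes.
print(encode_like_stream(text, "backslashreplace"))  # b'\\ufeffcaf\xe9'

# Like sys.stdout with the default 'strict' handler: an exception instead.
strict_error = None
try:
    encode_like_stream(text, "strict")
except UnicodeEncodeError as exc:
    strict_error = exc
    print("strict:", exc.reason)  # strict: character maps to <undefined>
```

The backslash escapes mean the text is readable but not rendered as the original characters, so this avoids the crash rather than displaying the Unicode properly.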

P.S.: for Python 2.x:

from __future__ import print_function
import sys
print('your text', file=sys.stderr)

P.P.S.: Original answer, for Python 3.x:

a = open('myfile.txt', 'r', encoding='utf-8', errors='ignore')  # skips bytes that are not valid UTF-8 while reading

See https://docs.python.org/3/library/codecs.html#error-handlers for the full list of error handlers.
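To compare the options, here is a short sketch (the sample string is made up) that applies several of the handlers from that page while encoding to Windows-1252, plus one decoding case. It illustrates that errors= passed to open() for reading only governs how invalid bytes in the file are decoded, so on its own it does not prevent an error raised later by print():

```python
sample = "\ufeffcaf\u00e9 \u2603"  # BOM, "café", a space and a snowman

# Encoding side (what print() does): apply each handler while encoding to cp1252.
handlers = ("ignore", "replace", "backslashreplace", "xmlcharrefreplace")
encoded = {h: sample.encode("cp1252", errors=h) for h in handlers}
for h, data in encoded.items():
    print(f"{h:<18} {data!r}")

# Decoding side (what open(..., errors='ignore') does): invalid input bytes are skipped.
decoded = b"caf\xe9".decode("utf-8", errors="ignore")  # a lone 0xE9 is not valid UTF-8
print(ascii(decoded))  # 'caf'
```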



Source: https://stackoverflow.com/questions/33444740/unicodedecodeerror-charmap-codec-cant-encode-character-x-at-position-y-char
