UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to <undefined>

女生的网名这么多〃 posted on 2021-01-20 11:56:38

Question


I'm working on an application that uses UTF-8 encoding. For debugging purposes I need to print the text. If I call print() directly on the variable containing my Unicode string, e.g. print(pred_str),

I get this error:

UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to <undefined>

So I tried print(pred_str.encode('utf-8')) and my output looks like this:

b'\xef\xbb\xbfpudgala-dharma-nair\xc4\x81tmyayo\xe1\xb8\xa5 apratipanna-vipratipann\xc4\x81n\xc4\x81m' b'avipar\xc4\xabta-pudgala-dharma-nair\xc4\x81tmya-pratip\xc4\x81dana-artham' b'tri\xe1\xb9\x83\xc5\x9bik\xc4\x81-vij\xc3\xb1apti-prakara\xe1\xb9\x87a-\xc4\x81rambha\xe1\xb8\xa5' b'pudgala-dharma-nair\xc4\x81tmya-pratip\xc4\x81danam punar kle\xc5\x9ba-j\xc3\xb1eya-\xc4\x81vara\xe1\xb9\x87a-prah\xc4\x81\xe1\xb9\x87a-artham'

But, I want my output to look like this:

pudgala-dharma-nairātmyayoḥ apratipanna-vipratipannānām aviparīta-pudgala-dharma-nairātmya-pratipādana-artham triṃśikā-vijñapti-prakaraṇa-ārambhaḥ pudgala-dharma-nairātmya-pratipādanam punar kleśa-jñeya-āvaraṇa-prahāṇa-artham

If I save my string to a file using:

with codecs.open('out.txt', 'w', 'UTF-8') as f:
    f.write(pred_str)

it saves the string as expected.


Answer 1:


Your data is encoded with the "UTF-8-SIG" codec, which is sometimes used in Microsoft environments.

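This variant of UTF-8 prefixes encoded text with a byte order mark (BOM), the bytes b'\xef\xbb\xbf', to make it easier for applications to distinguish UTF-8 encoded text from other encodings. Those three bytes are the character U+FEFF encoded as UTF-8, which is why the error message mentions '\ufeff'.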
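As a quick check of how the BOM behaves, the following sketch uses only the standard library. The sample text is a fragment of the string from the question. It shows that the plain 'utf-8' codec keeps U+FEFF as the first character of the decoded string, while 'utf-8-sig' strips it.

```python
import codecs

# U+FEFF encoded as UTF-8 yields the three BOM bytes.
bom_bytes = '\ufeff'.encode('utf-8')
assert bom_bytes == codecs.BOM_UTF8 == b'\xef\xbb\xbf'

# Sample data: BOM followed by UTF-8 encoded text.
data = codecs.BOM_UTF8 + 'nairātmya'.encode('utf-8')

plain = data.decode('utf-8')         # BOM survives as the first character
stripped = data.decode('utf-8-sig')  # BOM is removed during decoding

print(repr(plain[0]))
print(stripped)
```

If the string was decoded with plain 'utf-8', the leading '\ufeff' stays in it, and that is the character a legacy console code page cannot encode.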

You can decode such bytestrings like this:

>>> bs = b'\xef\xbb\xbfpudgala-dharma-nair\xc4\x81tmyayo\xe1\xb8\xa5 apratipanna-vipratipann\xc4\x81n\xc4\x81m'
>>> text = bs.decode('utf-8-sig')
>>> print(text)
pudgala-dharma-nairātmyayoḥ apratipanna-vipratipannānām

To read such data from a file:

with open('myfile.txt', 'r', encoding='utf-8-sig') as f:
    text = f.read()

Note that even after decoding from UTF-8-SIG, you may still be unable to print your data, because your console's default code page may not be able to encode other non-ASCII characters in the data (for example 'ā' or 'ḥ'). In that case you will need to adjust your console settings to support UTF-8.
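If changing the console settings is not convenient, one possible workaround on Python 3.7 or later is to switch the output stream to UTF-8 from inside the script with the stream's reconfigure() method. The sketch below is illustrative rather than definitive: the helper name make_utf8 is hypothetical, and a TextIOWrapper over a BytesIO stands in for a console that uses code page 1252.

```python
import io

def make_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (3.7+)."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
        return True
    return False

# Stand-in for a console whose code page (cp1252) cannot encode 'ā' or 'ḥ'.
fake_console = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")

make_utf8(fake_console)
fake_console.write("nairātmyayoḥ")  # would raise UnicodeEncodeError under cp1252
fake_console.flush()
written = fake_console.buffer.getvalue()

# In a real script the call would be: import sys; make_utf8(sys.stdout)
```

Whether the characters then display correctly still depends on the terminal and its font. Setting the environment variable PYTHONIOENCODING=utf-8 before starting Python is another option that needs no code change.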




Answer 2:


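Try this code, which drops the leading BOM character if it is present:

if pred_str.startswith('\ufeff'):
    pred_str = pred_str[1:]  # remove only the leading BOM; split() would cut the text at any later '\ufeff'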


Source: https://stackoverflow.com/questions/54664815/unicodeencodeerror-charmap-codec-cant-encode-character-ufeff-in-position
