Python - how to open a text file that has emojis in it

我只是一个虾纸丫 submitted on 2019-12-25 14:40:58

Question


I'm trying to do the simplest thing: open a file, read it, and close it in Python. This is the code:

name_file = open("Forever.txt", encoding='UTF-8')
data = name_file.read()
name_file.close()

print(data)

I know that this text has emojis in it, like hearts. The thing is that these emojis are not written in their Unicode notation, like U+2600; they appear as little images. I think the following error is caused by these little images:

return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f681' in position 2333: character maps to <undefined>

I tried the following, without specifying the encoding:

name_file = open("Forever.txt")

And the error changed to this:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2303: character maps to <undefined>

I have no idea why this is happening.

Maybe one solution would be to save everything that is text in a variable and delete the rest.

Any help will be really appreciated.


Answer 1:


You are getting a UnicodeEncodeError, most likely from the print call. The file is being read and decoded correctly, but you can only print characters that your console encoding and font actually support. The error indicates that the character cannot be represented in the console's current encoding.

For example:

Python 3.3.5 (v3.3.5:62cf4e77f785, Mar  9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\U0001F681')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f681' in position 0: character maps to <undefined>

But print a character the terminal encoding supports, and it works:

>>> print('\U000000E0')
à

My console encoding was cp437, but if I use a Python IDE that supports UTF-8 encoding, then it works:

>>> print('\U0001f681')
🚁

You may or may not see the character correctly. You need to be using a font that supports the character; otherwise, you get some default replacement character.
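Both errors from the question can be reproduced without the file, which may make the cause easier to see. The sketch below is only an illustration under two assumptions that the question does not confirm: that Python's default file encoding on the asker's machine was cp1252, and that the console encoding was cp437, as in the session above. The variable names are made up for this example.

```python
# The helicopter character named in the traceback.
emoji = "\U0001F681"

# Its UTF-8 form is four bytes; the last one is 0x81.
utf8_bytes = emoji.encode("utf-8")

# Reading UTF-8 bytes with the wrong codec: cp1252 (assumed default) has no
# character assigned to byte 0x81, so decoding fails there.
try:
    utf8_bytes.decode("cp1252")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Printing to a cp437 console amounts to encoding with cp437 (assumed console
# encoding), which has no mapping for the emoji.
try:
    emoji.encode("cp437")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

print(decode_error)
print(encode_error)
```

Under these assumptions, the UnicodeDecodeError comes from reading the file with the wrong encoding, while the UnicodeEncodeError comes from printing, not from reading.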




Answer 2:


Without seeing your input file, it's hard to guess what encoding it's actually in. A text file containing "little images" isn't a meaningful description of the file format, though my guess is that your file actually is UTF-8 encoded, since opening it with that encoding works. Printing the data fails because the codec of your stdout (likely the codec of your terminal) isn't able to encode the emoji. You could try explicitly encoding in UTF-8, if your terminal supports that encoding:

import sys
sys.stdout.buffer.write(data.encode('utf-8'))
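A possible alternative on Python 3.7 or later is to switch the text stream itself to UTF-8 with reconfigure(), so that ordinary print calls emit UTF-8. This is a minimal sketch, not part of the original answer: the helper name use_utf8_output is hypothetical, and the in-memory stream only stands in for a cp437 console so the effect can be checked.

```python
import io
import sys

def use_utf8_output(stream=None):
    """Switch a text stream to UTF-8 output; return True if it was changed."""
    stream = sys.stdout if stream is None else stream
    # reconfigure() is available on io.TextIOWrapper since Python 3.7; other
    # stream types (for example a captured stdout) may not provide it.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
        return True
    return False

# Demonstration with an in-memory stream that starts out as cp437.
raw = io.BytesIO()
demo = io.TextIOWrapper(raw, encoding="cp437")
changed = use_utf8_output(demo)
demo.write("\U0001F681")
demo.flush()
```

Calling use_utf8_output() with no argument would apply the same change to sys.stdout. Whether the emoji then shows up correctly still depends on the terminal.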

If your terminal doesn't support a codec that is able to display the emoji, then this is an inherent limitation of your terminal, and there is nothing you can do about it in the Python code.
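Even when the emoji cannot be displayed, the script does not have to crash. One hedged option is to escape whatever the console encoding cannot represent before printing. The helper name escape_unencodable is hypothetical, and the fallback to "ascii" when no encoding is reported is an assumption made for this sketch.

```python
import sys

def escape_unencodable(text, encoding):
    """Return text with characters the encoding cannot represent escaped."""
    # backslashreplace turns unsupported characters into \UXXXXXXXX style
    # escapes instead of raising UnicodeEncodeError; decoding with the same
    # codec turns the result back into a str.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

# sys.stdout.encoding can be None when stdout is not a regular text stream.
console_encoding = sys.stdout.encoding or "ascii"
print(escape_unencodable("helicopter: \U0001F681", console_encoding))
```

On a UTF-8 terminal the text passes through unchanged; on a cp437 console the emoji would appear as an escape sequence rather than raising an error.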



Source: https://stackoverflow.com/questions/32232556/python-how-to-open-a-text-file-that-has-emojis-in-it
