Python imaplib: Display non-ASCII characters correctly

点点圈 提交于 2021-01-27 18:26:16

问题


I am using Python 3.5 and imaplib to fetch an e-mail from GMail and print its body. The body contains non-ASCII characters. These are 'encoded' in a strange way and I cannot find out how to fix this.

import email
import imaplib

c = imaplib.IMAP4_SSL('imap.gmail.com')
c.login('example@gmail.com', 'password')

c.select('Inbox')
_, data = c.fetch(b'12345', '(RFC822)')

mail = data[0][1]
message = email.message_from_bytes(mail)
payload = message.get_payload()

body = mail[0].as_string()
print(body)

Gives

>> ... Mit freundlichen Gr=C3=BC=C3=9Fen ...

instead of the desired

>> ... Mit freundlichen Grüßen ...

It looks to me like this is not an issue of encoding but one of conversion. But how do I tell Python to convert the characters correctly? Is there a more convenient library?


回答1:


The text is encoded with quoted-printable encoding, which is a way to encode non-ascii characters in ascii text. You can decode it using python's quopri module.

>>> import quopri
>>> bs = b'Gr=C3=BC=C3=9Fen'

>>> # Decode quoted-printable to raw bytes.
>>> utf8 = quopri.decodestring(bs)

>>> # Decode bytes to text.
>>> s = utf8.decode('utf-8')
>>> print(s)
Grüßen

You may find that quoted-printable is the value of the email's content-transfer-encoding header.



来源:https://stackoverflow.com/questions/53871898/python-imaplib-display-non-ascii-characters-correctly

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!