Python decoding from iso-8859-5

Submitted by 荒凉一梦 on 2021-02-08 08:34:13

Question


When I parse my email messages with Python's email.parser.Parser, I get a lot of strings like this:

=?ISO-8859-5?Q?=C0=D5=D5=E1=E2=E0_=BF=DB=D0=E2=D5=D6=D5=D9_?=

How can I decode this to UTF-8 using Python?


Answer 1:


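Your input is an email header value in MIME encoded-word form (RFC 2047), whose payload uses the "Q" encoding, a variant of quoted-printable. You can use the quopri module to decode the payload: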

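import quopri

incode = '=?ISO-8859-5?Q?=C0=D5=D5=E1=E2=E0_=BF=DB=D0=E2=D5=D6=D5=D9_?='
inencoding = incode[2:12]  # 'ISO-8859-5'
intext = incode[15:-2]     # payload between '?Q?' and the closing '?='
# header=True turns '_' into a space, as the header "Q" encoding requires;
# decode() then converts the ISO-8859-5 bytes into a str
result = quopri.decodestring(intext, header=True).decode(inencoding)

The result will then be

Реестр Платежей 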

The quoted-printable payload is wrapped in email-header formatting, =?charset?Q?payload?=, which names the character encoding to apply to the bytes after quoted-printable decoding. The example code above slices out those portions manually, but you can also do all of that in one step:

from email.header import decode_header

# decode_header returns a list of (bytes, charset) pairs; this header has exactly one
[(text, encoding)] = decode_header(incode)
result = text.decode(encoding)

result now holds the decoded string Реестр Платежей (followed by a trailing space); the underscore in the encoded form stands for a space.
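
Real-world headers often mix plain text with one or more encoded words, so unpacking a single (text, encoding) pair can fail. The following is a minimal sketch of a more tolerant helper built on decode_header. The function name decode_mime_header and its fallback parameter are hypothetical and not part of the standard library, and Python 3 is assumed.

```python
from email.header import decode_header


def decode_mime_header(raw, fallback="ascii"):
    """Decode a header value that may contain RFC 2047 encoded words.

    `decode_mime_header` and `fallback` are illustrative names only.
    """
    parts = []
    for text, charset in decode_header(raw):
        if isinstance(text, bytes):
            # Encoded chunks arrive as bytes with their declared charset;
            # chunks without a declared charset use the fallback instead.
            text = text.decode(charset or fallback, errors="replace")
        parts.append(text)
    return "".join(parts)


subject = decode_mime_header(
    "=?ISO-8859-5?Q?=C0=D5=D5=E1=E2=E0_=BF=DB=D0=E2=D5=D6=D5=D9_?="
)
print(subject)
```

The helper returns a str; call .encode('utf-8') on it only if UTF-8 bytes are actually needed. When parsing whole messages, passing policy=email.policy.default to the parser should also make header lookups return already-decoded strings, which may remove the need for a helper like this.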



Source: https://stackoverflow.com/questions/24080233/python-decoding-from-iso-8859-5
