URLDecoding requests

Submitted by 随声附和 on 2019-12-01 14:15:52
UnicodeEncodeError: 'ascii' codec can't encode characters

You are trying to decode a string that is already Unicode. On Python 3 this raises AttributeError (a Unicode str has no .decode() method there). On Python 2, the string is first implicitly encoded to bytes using sys.getdefaultencoding() ('ascii') before .decode('utf8') is applied, and that implicit ASCII encoding step is what raises UnicodeEncodeError for any non-ASCII character.
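Both failure modes can be reproduced from a Python 3 interpreter. This is a sketch under stated assumptions: py2_style_decode() and error_name() are hypothetical helpers written only for this illustration, and 'ascii' is hardcoded because that is what Python 2's sys.getdefaultencoding() returns (Python 3 returns 'utf-8').

```python
def py2_style_decode(text, encoding):
    """Hypothetical emulation of Python 2's unicode.decode(encoding).

    Python 2 first encodes the Unicode string with the default codec
    ('ascii' there), then decodes the resulting bytes with `encoding`.
    """
    implicit_bytes = text.encode("ascii")  # fails on any non-ASCII character
    return implicit_bytes.decode(encoding)


def error_name(func, *args):
    """Return the name of the exception raised by func(*args), or None."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc).__name__
    return None


text = "\u00e4"  # "ä", already a Unicode string

# Python 3: str simply has no .decode() method.
print(error_name(lambda s: s.decode("utf-8"), text))  # AttributeError

# Emulated Python 2: the hidden ASCII encode fails before decoding starts.
print(error_name(py2_style_decode, text, "utf-8"))  # UnicodeEncodeError

# Pure-ASCII input round-trips fine, so the problem only shows up with non-ASCII data.
print(py2_style_decode("abc", "utf-8"))  # abc
```

This is also why the error can go unnoticed for a long time: it only appears once a non-ASCII character reaches the call.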

In short, do not call .decode() on Unicode strings; use this instead:

print urllib.unquote(res.url.encode('ascii')).decode('utf-8')

Without the .decode() call, the code prints bytes (assuming a bytestring is passed to unquote()), which may produce mojibake if your environment's character encoding is not UTF-8. To avoid mojibake, always print Unicode rather than bytes, and do not hardcode your environment's character encoding in the script; that is why .decode() is necessary here.
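For comparison, a Python 3 sketch of the same two steps (percent-decode to bytes, then decode those bytes as UTF-8) might look like this. The URL is a made-up example standing in for res.url, and UTF-8 is assumed to be the encoding the percent-escapes were produced from.

```python
from urllib.parse import unquote, unquote_to_bytes

# Hypothetical example URL; in the question this would be res.url.
url = "http://example.com/%C3%A4"

# Step 1: percent-decoding yields raw bytes, like unquote() on a bytestring in Python 2.
raw = unquote_to_bytes(url)
print(raw)  # b'http://example.com/\xc3\xa4'

# Step 2: decode those bytes to Unicode explicitly, as the Python 2 line does.
text = raw.decode("utf-8")
print(text)  # http://example.com/ä on a UTF-8 terminal

# unquote() performs both steps in one call; its default encoding is 'utf-8'.
assert unquote(url) == text
```

The explicit two-step form keeps the same discipline as the Python 2 advice: decode bytes once, then work with and print Unicode.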


There is a bug in urllib.unquote() if you pass it a Unicode string:

>>> print urllib.unquote(u'%C3%A4')
Ã¤
>>> print urllib.unquote('%C3%A4') # utf-8 output
ä

Pass bytestrings to unquote() on Python 2.
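Python 3's urllib.parse.unquote() takes an explicit encoding argument (defaulting to 'utf-8'), so the mojibake can be reproduced on purpose by forcing 'latin-1', which reads each escaped byte as a separate character. This is an analogy to the Python 2 behavior described above, not the Python 2 code path itself.

```python
from urllib.parse import unquote

escaped = "%C3%A4"  # UTF-8 bytes of "ä", percent-encoded

# Right: interpret the escaped bytes as UTF-8 (the default).
good = unquote(escaped)
print(good == "\u00e4")  # True

# Wrong: interpret each escaped byte as one Latin-1 character.
bad = unquote(escaped, encoding="latin-1")
print(ascii(bad))  # '\xc3\xa4', which displays as "Ã¤"

# Undo the wrong decode: back to the original bytes, then decode as UTF-8.
repaired = bad.encode("latin-1").decode("utf-8")
print(repaired == good)  # True
```

The repair in the last step only works because Latin-1 maps every byte to exactly one character, so the wrong decode loses nothing and can be reversed.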
