How can I encode and decode percent-encoded (URL encoded) strings in Python?

Question

I wrote a simple application that downloads articles from wiki pages. When I search, for example, for the first name Lech, my code returns strings like Lech_Kaczy%C5%84ski or Lech_Pozna%C5%84 instead of Lech_Kaczyński and Lech_Poznań.

How can I decode those characters to ordinary Polish letters? I tried urllib.unquote(text), but then I got Lech_Kaczy\xc5\x84ski and Lech_Pozna\xc5\x84 instead of Lech_Kaczyński and Lech_Poznań.

I have in my code:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding("utf-8")

But the result is the same (it simply does not work).

Answer 1:

In Python 2, urllib.unquote returns a byte string, so decode the result as UTF-8:

import urllib
urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8')

This will return a unicode string:

u'Lech_Kaczy\u0144ski'

which you can then print and process as usual. For example:

print(urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8'))

will result in

Lech_Kaczyński
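The two steps in this answer (percent-decoding, then UTF-8 decoding) can be made explicit. The following is only a sketch, and it assumes Python 3 rather than the Python 2 used above; it relies on urllib.parse.unquote_to_bytes to reproduce the intermediate byte string that the question ran into.

```python
from urllib.parse import unquote_to_bytes

encoded = "Lech_Kaczy%C5%84ski"

# Step 1: percent-decoding alone yields raw bytes
# (comparable to what Python 2's urllib.unquote returned for a str input).
raw = unquote_to_bytes(encoded)

# Step 2: interpreting those bytes as UTF-8 produces the readable text.
text = raw.decode("utf-8")

print(repr(raw))
print(text)
```

The \xc5\x84 pair seen in the question is simply the UTF-8 encoding of ń before step 2 has been applied.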

Answer 2:

This worked for me (Python 2; the output displays correctly only when the terminal uses UTF-8):

import urllib

print urllib.unquote('Lech_Kaczy%C5%84ski')

Prints out

Lech_Kaczyński

Answer 3:

For Python 3, unquote is now within urllib.parse:

import urllib.parse  # "import urllib" alone does not guarantee the parse submodule is loaded

print(urllib.parse.unquote("Lech_Kaczy%C5%84ski"))
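The question title also asks about encoding, which none of the answers show. As a minimal sketch, again assuming Python 3, urllib.parse.quote performs the reverse operation; the variable names here are illustrative only.

```python
from urllib.parse import quote, unquote

title = "Lech_Poznań"

# quote() encodes non-ASCII characters as UTF-8 and percent-escapes each byte.
encoded = quote(title)

# unquote() reverses this, decoding the escaped bytes as UTF-8 by default.
decoded = unquote(encoded)

print(encoded)
print(decoded)
```

By default quote leaves "/" unescaped; passing safe="" escapes it as well, which may be preferable when encoding a single path segment or query value.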

Source: https://stackoverflow.com/questions/33143504/how-can-i-encode-and-decode-percent-encoded-url-encoded-strings-in-python

Tags

python

encoding
