How can I encode and decode percent-encoded (URL encoded) strings in Python?

Submitted by 我怕爱的太早我们不能终老 on 2020-01-14 18:43:18

Question


I wrote a simple application that downloads articles from wiki pages. When I search for, say, the first name Lech, my code returns strings like Lech_Kaczy%C5%84ski or Lech_Pozna%C5%84 instead of Lech_Kaczyński and Lech_Poznań.

How can I decode those characters to ordinary Polish letters? I tried urllib.unquote(text), but got Lech_Kaczy\xc5\x84ski and Lech_Pozna\xc5\x84 instead of Lech_Kaczyński and Lech_Poznań.

I have in my code:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding("utf-8")

But the result is the same (it simply does not work).


Answer 1:


Try this (Python 2):

import urllib
urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8')

This will return a unicode string:

u'Lech_Kaczy\u0144ski'

which you can then print and process as usual. For example:

print(urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8'))

will result in

Lech_Kaczyński
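The two steps in this answer (percent-decode to raw bytes, then decode those bytes as UTF-8) can also be sketched in Python 3, where urllib.parse.unquote_to_bytes makes the intermediate byte string explicit. This is only an illustration under that assumption; the variable names are made up for the example.

```python
from urllib.parse import unquote_to_bytes

encoded = "Lech_Kaczy%C5%84ski"

# Step 1: percent-decoding yields raw bytes (here, the UTF-8 encoding of the text).
raw = unquote_to_bytes(encoded)

# Step 2: interpret those bytes as UTF-8 to obtain a text string.
decoded = raw.decode("utf-8")

print(raw)      # b'Lech_Kaczy\xc5\x84ski'
print(decoded)  # Lech_Kaczyński
```

The byte string printed in step 1 is what the question shows as Lech_Kaczy\xc5\x84ski: the percent-decoding already worked, and only the UTF-8 decoding step was missing.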



Answer 2:


This worked for me (Python 2):

import urllib

print urllib.unquote('Lech_Kaczy%C5%84ski')

Prints out

Lech_Kaczyński



Answer 3:


For Python 3, unquote is now within urllib.parse:

import urllib.parse

print(urllib.parse.unquote("Lech_Kaczy%C5%84ski"))
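Since the question title also asks about encoding, here is a minimal Python 3 round-trip sketch using urllib.parse.quote alongside unquote. Both functions use UTF-8 by default; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

original = "Lech_Kaczyński"

# quote() encodes non-ASCII characters as UTF-8 and percent-escapes the bytes.
# Letters, digits and the characters _ . - ~ are left unchanged.
encoded = quote(original)

# unquote() reverses this, decoding the escaped bytes as UTF-8 by default.
decoded = unquote(encoded)

print(encoded)              # Lech_Kaczy%C5%84ski
print(decoded == original)  # True
```

If the data uses a different encoding, both functions accept an encoding argument, for example unquote(text, encoding="latin-1").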


Source: https://stackoverflow.com/questions/33143504/how-can-i-encode-and-decode-percent-encoded-url-encoded-strings-in-python
