Question
I wrote a simple application that downloads articles from wiki pages. When I search for, say, the first name Lech, my code returns strings like Lech_Kaczy%C5%84ski or Lech_Pozna%C5%84 instead of Lech_Kaczyński and Lech_Poznań.
How can I decode those characters to ordinary Polish letters? I tried:
urllib.unquote(text)
but got Lech_Kaczy\xc5\x84ski and Lech_Pozna\xc5\x84 instead of Lech_Kaczyński and Lech_Poznań.
I have in my code:
# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
But the result is the same (it simply does not work).
Answer 1:
Try this (Python 2):
import urllib
urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8')
This will return a unicode string:
u'Lech_Kaczy\u0144ski'
which you can then print and process as usual. For example:
print(urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8'))
will result in
Lech_Kaczyński
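For readers on Python 3, the same two steps (percent-decode to raw bytes, then decode those bytes as UTF-8) can be sketched roughly as below. This is only an illustration, not part of the original answer; the variable names raw and text are made up for the example.

```python
from urllib.parse import unquote_to_bytes

# Step 1: undo the percent-encoding; the result is still raw bytes.
raw = unquote_to_bytes("Lech_Kaczy%C5%84ski")   # b'Lech_Kaczy\xc5\x84ski'

# Step 2: interpret those bytes as UTF-8 to get real text.
text = raw.decode("utf-8")                       # 'Lech_Kaczyński'
```

The intermediate value is the same \xc5\x84 byte pair seen in the question: those two bytes are the UTF-8 encoding of ń, so the missing piece was the decode step, not the unquoting.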
Answer 2:
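This worked for me in Python 2 (the result is a UTF-8 byte string, so it only displays correctly on a terminal that uses UTF-8):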
import urllib
print urllib.unquote('Lech_Kaczy%C5%84ski')
Prints out
Lech_Kaczyński
Answer 3:
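For Python 3, unquote now lives in urllib.parse, which has to be imported explicitly (a plain import urllib is not guaranteed to load the submodule):
import urllib.parse
print(urllib.parse.unquote("Lech_Kaczy%C5%84ski"))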
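Since the question title asks about encoding as well as decoding, here is a small Python 3 round-trip sketch using quote and unquote from urllib.parse. It assumes the text should be percent-encoded as UTF-8, which is the default for both functions; the variable names are just for illustration.

```python
from urllib.parse import quote, unquote

title = "Lech_Kaczyński"

# quote() turns non-ASCII characters into UTF-8 percent-escapes;
# letters, digits and "_" are left untouched.
encoded = quote(title)

# unquote() reverses this and decodes the escapes as UTF-8 by default.
decoded = unquote(encoded)

print(encoded)   # Lech_Kaczy%C5%84ski
```

Here decoded compares equal to title, so the round trip is lossless for this input.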
Source: https://stackoverflow.com/questions/33143504/how-can-i-encode-and-decode-percent-encoded-url-encoded-strings-in-python