Question
To encode the URI, I used urllib.quote("schönefeld"), but when the string contains non-ASCII characters, it throws
KeyError: u'\xe9'
at this line: return ''.join(map(quoter, s))
My input strings are köln, brønshøj, schönefeld, etc.
When I tried it with just print statements on Windows (Python 2.7, PyScripter IDE), it worked, but on Linux it raises the exception (I assume the platform should not matter).
This is what I am trying:
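# -*- coding: utf-8 -*-   (needed in Python 2 for the non-ASCII literal below)
from commands import getstatusoutput
from urllib import quote

queryParams = "schönefeld"
cmdString = "http://baseurl" + quote(queryParams)
print getstatusoutput(cmdString)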
Investigating the cause, I found that inside urllib.quote() the exception is actually thrown at return ''.join(map(quoter, s)).
The code in urllib is:
def quote(s, safe='/'):
    if not s:
        if s is None:
            raise TypeError('None object cannot be quoted')
        return s
    cachekey = (safe, always_safe)
    try:
        (quoter, safe) = _safe_quoters[cachekey]
    except KeyError:
        safe_map = _safe_map.copy()
        safe_map.update([(c, c) for c in safe])
        quoter = safe_map.__getitem__
        safe = always_safe + safe
        _safe_quoters[cachekey] = (quoter, safe)
    if not s.rstrip(safe):
        return s
    return ''.join(map(quoter, s))
The exception comes from ''.join(map(quoter, s)): the quoter function is called for every element of s, and the resulting list is joined with '' and returned.
For a non-ASCII character such as è, _safe_map has an entry keyed by the byte '\xe8', which maps to %E8. But when I call quote(u'è'), the element being looked up is the Unicode character u'\xe8', which is not a key in _safe_map, so the KeyError is raised.
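To make the byte-keyed lookup concrete, here is a small sketch. It is a hypothetical Python 3 model of the table described above, not the real urllib code: the names SAFE_MAP and model_quote are made up for illustration, and the extra `safe` characters that quote() adds (such as '/') are left out. Each of the 256 byte values is a key, so quoting the UTF-8 bytes of a string always finds an entry, whereas a text character like 'è' is not a key at all.

```python
import string

# Hypothetical Python 3 model of Python 2.7's urllib._safe_map (not the real
# module): one key per byte value 0..255. Always-safe bytes map to themselves,
# every other byte maps to its %XX escape. The extra `safe` characters that
# quote() merges in (such as '/') are omitted for brevity.
ALWAYS_SAFE = (string.ascii_letters + string.digits + '_.-').encode('ascii')
SAFE_MAP = {
    bytes([i]): chr(i) if i in ALWAYS_SAFE else '%{:02X}'.format(i)
    for i in range(256)
}


def model_quote(data):
    """Look up every byte of `data` in SAFE_MAP, mirroring ''.join(map(quoter, s))."""
    return ''.join(SAFE_MAP[bytes([b])] for b in data)
```

For example, model_quote('sch\xe8'.encode('utf-8')) returns 'sch%C3%A8', because 'è' becomes the two bytes C3 A8 and both are keys, while '\xe8' in SAFE_MAP is False because a text character is not one of the byte keys.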
So I added s = [el.upper().replace("\\X","%") for el in s] inside a try-except block just before ''.join(map(quoter, s)), and now it works.
But I am worried: is what I have done the correct approach, or will it create other issues? Also, I have 200+ Linux instances, and it would be very hard to deploy this fix to all of them.
Answer 1:
You are trying to quote Unicode data, so you need to decide how to turn that into URL-safe bytes.
Encode the string to bytes first. UTF-8 is often used:
>>> import urllib
>>> urllib.quote(u'sch\xe9nefeld')
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py:1268: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
return ''.join(map(quoter, s))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1268, in quote
return ''.join(map(quoter, s))
KeyError: u'\xe9'
>>> urllib.quote(u'sch\xe9nefeld'.encode('utf8'))
'sch%C3%A9nefeld'
However, the encoding depends on what the server will accept. It's best to stick to the encoding the original form was sent with.
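The advice above can be wrapped in a small helper that encodes text to bytes before quoting. This is a sketch, assuming a hypothetical helper named quote_text; it picks the right import for Python 2.7 or Python 3, and the encoding parameter is there because, as noted, the correct encoding is whatever the server expects.

```python
try:
    from urllib import quote          # Python 2.7
except ImportError:
    from urllib.parse import quote    # Python 3


def quote_text(text, encoding='utf-8', safe='/'):
    """Encode text to bytes with `encoding`, then percent-quote the bytes.

    quote_text is a hypothetical helper; choose `encoding` to match what
    the server accepts.
    """
    if not isinstance(text, bytes):
        # unicode (Python 2) / str (Python 3): turn it into bytes first
        text = text.encode(encoding)
    return quote(text, safe)
```

With the default, quote_text(u'sch\xe9nefeld') gives 'sch%C3%A9nefeld', matching the session above; quote_text(u'sch\xe9nefeld', 'latin-1') gives 'sch%E9nefeld' for a server that expects Latin-1. Byte strings are passed through unchanged, so already-encoded input is not encoded twice.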
Answer 2:
I resolved the issue just by converting the string to unicode.
Here is the snippet:
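try:
    # Plain ASCII input: decoding succeeds and mystring is left as it is
    unicode(mystring, "ascii")
except UnicodeError:
    # Non-ASCII input: assume the bytes are UTF-8 and decode them
    mystring = unicode(mystring, "utf-8")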
A detailed description of the solution can be found at http://effbot.org/pyfaq/what-does-unicodeerror-ascii-decoding-encoding-error-ordinal-not-in-range-128-mean.htm
Answer 3:
I had exactly the same error as @underscore, but in my case the problem was that map(quoter, s) tried to look up the key u'\xe9', which was not in _safe_map. However, '\xe9' was, so I solved the issue by replacing u'\xe9' with '\xe9' in s.
Moreover, shouldn't the return statement be inside the try/except? I also had to change this to solve the problem completely.
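Generalized to every character, the substitution described above maps each code point to the single byte with the same value. The sketch below is a hypothetical generalization (the name replace_with_bytes is made up) that runs on Python 2.7 or 3. It shows that this substitution amounts to Latin-1 encoding, so the URL ends up with %E9 rather than the UTF-8 %C3%A9, and characters above U+00FF, such as u'\u0151' (ő), cannot be converted at all.

```python
try:
    from urllib import quote          # Python 2.7
except ImportError:
    from urllib.parse import quote    # Python 3


def replace_with_bytes(text):
    # Hypothetical generalization of Answer 3: replace every character with
    # the single byte that has the same code point (u'\xe9' -> '\xe9').
    # bytearray() raises ValueError for any code point above 255.
    return bytes(bytearray(ord(c) for c in text))
```

For example, quote(replace_with_bytes(u'sch\xe9nefeld')) returns 'sch%E9nefeld', the same as quoting u'sch\xe9nefeld'.encode('latin-1').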
Source: https://stackoverflow.com/questions/15115588/urllib-quote-throws-keyerror