How to URL-safe encode a string with Python? And urllib.quote is wrong

Posted by 二次信任 on 2019-12-10 18:49:35

Question


Hello, I was wondering if you know any other way to encode a string to a URL-safe form, because urllib.quote is doing it wrong; the output is different than expected:

If I try

urllib.quote('á')

I get

'%C3%A1'

But that's not the correct output; it should be %E1

As demonstrated by the tool provided on this site.

And this is not me being difficult: the incorrect output of quote is preventing the browser from finding resources. If I try

urllib.quote('\images\á\some file.jpg')

and then try the JavaScript tool I mentioned, I get these strings, respectively:

%5Cimages%5C%C3%A1%5Csome%20file.jpg

%5Cimages%5C%E1%5Csome%20file.jpg

Note how they are almost the same, but the URL provided by quote doesn't work and the other one does. I tried messing with encode('utf-8') on the string provided to quote, but it does not make a difference. I tried other Spanish words with accents and the ñ; they are all represented differently.

Is this a Python bug? Do you know of a module that gets this right?


Answer 1:


According to RFC 3986, %C3%A1 is correct. Characters are supposed to be converted to an octet stream using UTF-8 before the octet stream is percent-encoded. The site you link is out of date.
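The difference between the two outputs can be reproduced directly. The following is a minimal sketch, assuming Python 3, where the function lives in urllib.parse rather than urllib and accepts an encoding argument; the variable names are only illustrative.

```python
from urllib.parse import quote

# Default behaviour: the text is encoded to UTF-8 first,
# then each resulting byte is percent-encoded.
utf8_form = quote('á')

# Legacy behaviour: encode to Latin-1 (ISO-8859-1) first instead.
latin1_form = quote('á', encoding='latin-1')

print(utf8_form)    # %C3%A1
print(latin1_form)  # %E1
```

Both results are percent-encodings of the same character; they differ only in the byte encoding applied before the percent-encoding step.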

See Why does the encoding's of a URL and the query string part differ? for more detail on the history of handling non-ASCII characters in URLs.




Answer 2:


OK, got it: I have to encode to ISO-8859-1, like this:

word = u'á'
word = word.encode('iso-8859-1')
print word
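To connect this back to the path from the question, here is a rough sketch, assuming Python 3 (urllib.parse.quote accepts bytes as well as text). The variable names are hypothetical, and whether the ISO-8859-1 form is the right one depends on what the server serving the files expects.

```python
from urllib.parse import quote

# The path from the question, written with escaped backslashes.
path = '\\images\\á\\some file.jpg'

# Encode to ISO-8859-1 bytes first, then percent-encode those bytes.
latin1_url = quote(path.encode('iso-8859-1'))

# For comparison: the default UTF-8 based result.
utf8_url = quote(path)

print(latin1_url)  # %5Cimages%5C%E1%5Csome%20file.jpg
print(utf8_url)    # %5Cimages%5C%C3%A1%5Csome%20file.jpg
```

The two printed strings match the two strings shown in the question, which suggests the server there was decoding request paths as ISO-8859-1.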



Answer 3:


Python source is interpreted as ASCII by default, so even though your file may be encoded differently, your UTF-8 character is interpreted as two separate bytes.

Try putting a comment on the first or second line of your code, like this, to match the file encoding. You might also need to use u'á'.

# coding: utf-8



Answer 4:


What about using unicode strings and the numeric representation (ord) of the char?

>>> print '%{0:X}'.format(ord(u'á'))
%E1
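Note that ord only gives the expected result for a single character in the Latin-1 range, and the format above does not zero-pad small values. A slightly more general sketch, assuming Python 3 and using a hypothetical helper name, encodes the text to bytes first and formats each byte:

```python
def percent_encode_all(text, encoding='iso-8859-1'):
    """Hypothetical helper: percent-encode every byte of the encoded text."""
    # Encode to bytes first, then format each byte as a zero-padded %XX escape.
    return ''.join('%{0:02X}'.format(byte) for byte in text.encode(encoding))

print(percent_encode_all('á'))           # %E1
print(percent_encode_all('á', 'utf-8'))  # %C3%A1
```

This escapes every byte, including plain ASCII letters, so the quote function from the standard library is usually the better choice when only unsafe characters should be escaped.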



Answer 5:


In this question, it seems someone wrote a pretty large function to convert to ASCII URLs, and that's what I need. But I was hoping there was some encoding tool in the standard library for the job.



Source: https://stackoverflow.com/questions/6338469/how-to-url-safe-encode-a-string-with-python-and-urllib-quote-is-wrong
