URL component % and \x

孤者浪人 submitted on 2021-01-28 11:52:08

Question


I have a question.

import urllib2

st = "b%C3%BCrokommunikation"
urllib2.unquote(st)

OUTPUT: 'b\xc3\xbcrokommunikation'

But if I print it:

print urllib2.unquote(st)

OUTPUT: bürokommunikation

Why the difference? I need to write bürokommunikation, not 'b\xc3\xbcrokommunikation', to a file.

My problem is: I have a lot of data with values like this extracted from URLs, and I have to store them in readable form (e.g. bürokommunikation) in a text file.


Answer 1:


When you print the string, the raw bytes \xc3\xbc are sent to your terminal emulator, which interprets them as the UTF-8 encoding of ü and displays the character correctly.

However, as @MarkDickinson says in the comments, ü doesn't exist in ASCII, so you need to decode the bytes into a unicode string and tell Python which encoding to use when writing it to a file, for instance UTF-8.

This is very easy using the codecs module:

import codecs
import urllib2

# unquote() returns a UTF-8 encoded byte string; decode it into a unicode object
st = "b%C3%BCrokommunikation"
decoded_string = urllib2.unquote(st).decode('utf-8')

# Write it to the file, encoding the text as UTF-8 again
with codecs.open('my_file.txt', 'w', 'utf-8') as f:
    f.write(decoded_string)
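
For readers on Python 3, where urllib2 no longer exists, the same idea might look roughly like the sketch below. It assumes the percent-escapes are UTF-8 (the default for urllib.parse.unquote), and the output path in the temporary directory is only a placeholder chosen for illustration.

```python
import os
import tempfile
from urllib.parse import unquote

st = "b%C3%BCrokommunikation"

# In Python 3, unquote() decodes the percent-escapes as UTF-8 by default
# and returns a str (text), so no separate .decode() step is needed.
text = unquote(st)

# Placeholder path for illustration; replace with the real destination.
path = os.path.join(tempfile.gettempdir(), "my_file.txt")

# The built-in open() takes the encoding directly; codecs is not required.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back to check that the text survived the round trip.
with open(path, encoding="utf-8") as f:
    read_back = f.read()
```

Passing encoding="utf-8" explicitly avoids depending on the platform's default encoding, which may not be UTF-8 everywhere.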



Answer 2:


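You are looking at the same result. When you evaluate the expression in the interactive interpreter without print, it shows the __repr__() result, which escapes non-ASCII bytes as \x sequences. When you use print, the bytes themselves are written to the terminal, which displays them as the character ü instead of the escapes.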
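
The difference between the two views can be illustrated with a small sketch. It is written for Python 3 as an analogue of the Python 2 session in the question, not a reproduction of it: unquote_to_bytes stands in for urllib2.unquote because it also returns raw bytes, and the variable names are made up for the example.

```python
from urllib.parse import unquote_to_bytes

# Raw bytes, comparable to what urllib2.unquote() returns in Python 2.
raw = unquote_to_bytes("b%C3%BCrokommunikation")

# What the interactive interpreter shows: the repr, with non-ASCII
# bytes escaped as \x sequences.
shown_by_repl = repr(raw)

# What a UTF-8 terminal effectively displays once the bytes are
# interpreted as UTF-8: the two bytes c3 bc become the single character ü.
shown_as_text = raw.decode("utf-8")

# Same data in both cases; the byte string is one byte longer than the
# text because ü takes two bytes in UTF-8.
byte_count = len(raw)
char_count = len(shown_as_text)
```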



Source: https://stackoverflow.com/questions/34379432/url-component-and-x
