How to handle Unicode (non-ASCII) characters in Python?

I'm programming in Python and fetching information from a web page with the urllib2 library. The problem is that the page can return non-ASCII characters.

3 Answers

栀梦

2020-12-11 15:54

You might want to use an actual parsing library to extract this information. lxml, for instance, already handles Unicode encoding and decoding based on the document's declared character set.
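
To illustrate what "using the declared character set" involves, here is a rough standard-library sketch of the idea, not lxml's actual implementation. It looks for a `<meta>` charset declaration in the raw bytes and decodes with it. The names `_CharsetSniffer` and `decode_html` and the 2048-byte sniffing window are assumptions made for this example, and the code is written for Python 3.

```python
from html.parser import HTMLParser


class _CharsetSniffer(HTMLParser):
    """Hypothetical helper: records the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if attrs.get("charset"):
            # HTML5 style: <meta charset="...">
            self.charset = attrs["charset"].strip()
        elif (attrs.get("http-equiv") or "").lower() == "content-type":
            # Older style: <meta http-equiv="Content-Type" content="...; charset=...">
            for part in (attrs.get("content") or "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset" and value:
                    self.charset = value.strip()


def decode_html(raw, default="utf-8"):
    """Decode raw HTML bytes using the charset the page declares, if any."""
    sniffer = _CharsetSniffer()
    # The declaration itself is ASCII, so a lossy ASCII pass is enough to find it.
    sniffer.feed(raw[:2048].decode("ascii", errors="replace"))
    charset = sniffer.charset or default
    # Note: an unknown charset name would raise LookupError here.
    return raw.decode(charset, errors="replace")
```

A real parser also considers the HTTP headers and byte-order marks, which is one reason to prefer a library over hand-rolled detection.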

礼貌的吻别

2020-12-11 15:55

Use Unicode strings for all of your internal work if you can.

You will probably find this question and its answers useful:

urllib2 read to Unicode
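
A minimal sketch of the "Unicode everywhere" approach, assuming the server announces its encoding in the `Content-Type` header: decode once at the boundary, then work only with text. The function names `charset_from_content_type` and `decode_body` are made up for this example, and the UTF-8 fallback is an assumption. The code targets Python 3, where urllib2's functionality lives in `urllib.request`.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset() or default


def decode_body(raw, content_type, default="utf-8"):
    """Turn the bytes read from a response into a text string."""
    charset = charset_from_content_type(content_type, default)
    return raw.decode(charset, errors="replace")


# Stand-ins for what response.read() and the Content-Type header would give you.
body = "mañana".encode("iso-8859-1")
header = "text/html; charset=ISO-8859-1"
text = decode_body(body, header)
```

From this point on, `text` is an ordinary string and the rest of the program does not need to know which encoding the page used.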

暖寄归人

2020-12-11 16:05

You just read a set of bytes from the socket. If you want a string you have to decode it:
```
yourstring = receivedbytes.decode("utf-8") 
```
(replace utf-8 with whatever encoding the data actually uses)

Then you have to do the reverse to send it back out:
```
outbytes = yourstring.encode("utf-8")
```
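
Putting both directions together, here is a small round-trip sketch in Python 3. The helper names `to_text` and `to_bytes` are invented for this example, and using `errors="replace"` is one possible policy for bad input, not the only reasonable one.

```python
def to_text(raw, encoding="utf-8"):
    """Decode bytes to text, substituting U+FFFD for undecodable bytes."""
    return raw.decode(encoding, errors="replace")


def to_bytes(text, encoding="utf-8"):
    """Encode text back to bytes before writing it to a socket or file."""
    return text.encode(encoding)


original = "señor"
wire = to_bytes(original)           # bytes suitable for sending
assert to_text(wire) == original    # same codec both ways: lossless round trip

# Decoding with the wrong codec does not fail here, it silently garbles the text.
garbled = wire.decode("latin-1")
assert garbled != original

# A strict decode of invalid input raises instead of guessing.
try:
    b"\xff".decode("utf-8")
except UnicodeDecodeError:
    pass
```

The point of the example is that the codec used to decode has to match the one the sender used to encode. A mismatch can produce wrong characters without raising any error.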