How to handle Unicode (non-ASCII) characters in Python?

爱一瞬间的悲伤 2020-12-11 15:30

I'm programming in Python and fetching information from a web page with the urllib2 library. The problem is that the page can return non-ASCII characters.

3 Answers
  • 2020-12-11 15:54

    You might want to look into using an actual parsing library to find this information. lxml, for instance, already handles Unicode encoding and decoding based on the document's declared character set.

  • 2020-12-11 15:55

    You want to use Unicode for all your work if you can.

    You probably will find this question/answer useful:

    urllib2 read to Unicode
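
    As a rough illustration of reading a response into Unicode, the sketch below decodes a raw body using the charset declared in the Content-Type header. It assumes Python 3 (where urllib2 became urllib.request) rather than the Python 2 setup in the question. The function names charset_from_content_type and decode_body, and the UTF-8 fallback, are hypothetical choices for this example, not part of any library.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset declared in a Content-Type header value.

    Falls back to `default` (an assumption of this sketch) when the
    header declares no charset.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(raw, content_type, default="utf-8"):
    """Decode raw response bytes into a str using the declared charset."""
    charset = charset_from_content_type(content_type, default)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The server declared an encoding name Python does not know.
        return raw.decode(default, errors="replace")


# Stand-ins for what a real HTTP response would provide.
latin1_body = "café".encode("latin-1")
page_text = decode_body(latin1_body, "text/html; charset=ISO-8859-1")
```

    With a real request in Python 3, the bytes would come from urllib.request.urlopen(url).read(), and the response's headers object offers get_content_charset() directly.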

  • 2020-12-11 16:05

    You just read a set of bytes from the socket. If you want a string you have to decode it:

    yourstring = receivedbytes.decode("utf-8") 
    

    (substituting whatever encoding you're using for utf-8)

    Then you have to do the reverse to send it back out:

    outbytes = yourstring.encode("utf-8")
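
    The two steps above can be put together as a small round trip. This is a minimal sketch assuming Python 3 and UTF-8 data; the sample bytes stand in for whatever was read from the socket, and the to_text helper is a hypothetical name introduced here to show one way of coping with bytes that are not valid in the expected encoding.

```python
# Stand-in for bytes read from a socket or HTTP response.
received_bytes = "naïve café".encode("utf-8")

# Decode once at the input boundary, work with text in between...
text = received_bytes.decode("utf-8")

# ...and encode once at the output boundary.
out_bytes = text.encode("utf-8")


def to_text(data, encoding="utf-8"):
    """Decode bytes, substituting U+FFFD for any undecodable bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return data.decode(encoding, errors="replace")
```

    Note that the text has fewer characters than the byte string has bytes, because each accented letter takes two bytes in UTF-8.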
    