I'm programming in Python and I'm obtaining information from a web page through the urllib2 library. The problem is that that page can provide me with non-ASCII characters.
You might want to look into using an actual parsing library to find this information. lxml, for instance, already handles Unicode encoding and decoding based on the document's declared character set.
You want to use Unicode for all your work if you can: decode the bytes as soon as you read them, and encode again only when you write output.
You probably will find this question/answer useful:
urllib2 read to Unicode
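As a small illustration of why working in Unicode helps, here is a sketch comparing the same text as raw bytes and as a decoded string. The sample value is made up for the example and assumes the bytes are UTF-8.

```python
# Hypothetical sample: the UTF-8 bytes for the word "naive" spelled with a diaeresis.
raw = b"na\xc3\xafve"

# Decode once, right after reading, and work with the result from then on.
text = raw.decode("utf-8")

byte_count = len(raw)    # counts bytes; the accented letter takes two
char_count = len(text)   # counts characters
third_char = text[2]     # indexing the decoded text yields a whole character
```

Indexing or slicing the raw bytes instead could cut the two-byte character in half, which is one reason to decode early.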
You just read a set of bytes from the socket. If you want a string you have to decode it:
yourstring = receivedbytes.decode("utf-8")
(substituting whatever encoding you're using for utf-8)
Then you have to do the reverse to send it back out:
outbytes = yourstring.encode("utf-8")
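Putting the two steps together, a minimal sketch might look like the following. The helper names charset_from_content_type and decode_body are hypothetical, not part of urllib2, and the Content-Type value is passed in as a plain string; how you obtain it from the response is left out. It assumes UTF-8 when the header names no charset, which may not be right for every page.

```python
import re


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or `default` if none is given."""
    match = re.search(r'charset="?([\w.-]+)', content_type or "", re.IGNORECASE)
    return match.group(1) if match else default


def decode_body(received_bytes, content_type=None):
    """Decode raw response bytes using the declared charset (assumed UTF-8 if absent)."""
    encoding = charset_from_content_type(content_type)
    # "replace" swaps undecodable bytes for U+FFFD instead of raising an error.
    return received_bytes.decode(encoding, "replace")


# Stand-in for the bytes you would get from reading the response.
receivedbytes = b"caf\xc3\xa9"

yourstring = decode_body(receivedbytes, "text/html; charset=utf-8")
outbytes = yourstring.encode("utf-8")
```

If the header names an encoding that Python does not know, decode raises LookupError, so you may want to catch that and fall back to a default.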