UnicodeDecodeError problem with mechanize [duplicate]

I receive the following string from one website via mechanize:

'We\x92ve'

I know that \x92 stands for the ’ character. I'm trying to convert that string to Unicode:

>>> unicode('We\x92ve','utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 2: unexpected code byte

What am I doing wrong?

Edit: The reason I tried 'utf-8' was this:

>>> response = browser.response()
>>> response.info()['content-type']
'text/html; charset=utf-8'

Now I see I can't always trust the Content-Type header.

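\x92 does stand for ’, but only in the Windows-1252 encoding, not in UTF-8. In UTF-8 the byte 0x92 is valid only as a continuation byte inside a multi-byte sequence, so it cannot follow 'We' on its own, and that is why the decode fails: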

>>> print unicode('We\x92ve','1252')
We’ve
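
The same contrast can be shown with an explicit byte string. This is a minimal sketch that should also run on Python 3, where `unicode()` no longer exists and `bytes.decode` is used instead. The names `raw`, `utf8_ok` and `text` are illustrative and not part of the original question.

```python
raw = b'We\x92ve'  # the bytes received from the site

# Strict UTF-8 decoding rejects the lone 0x92 byte.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In Windows-1252, 0x92 maps to U+2019 (RIGHT SINGLE QUOTATION MARK).
text = raw.decode('cp1252')
```

Here `'cp1252'` is the canonical Python codec name; `'1252'` as used above is an alias for it.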

If you don't know what encoding your source data is in, you can try to detect it with chardet (extremely easy to use). Keep in mind that its result is a guess reported with a confidence value, not a guarantee.
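
When only a few encodings are plausible, a simpler alternative to statistical detection is to try them in order. The sketch below is not chardet and not anything mechanize provides; the helper name `decode_with_fallback` and the default candidate list are assumptions chosen for illustration. UTF-8 goes first because it is strict, so text in another encoding rarely decodes as UTF-8 by accident.

```python
def decode_with_fallback(raw, encodings=('utf-8', 'cp1252')):
    """Return (text, encoding) for the first candidate that decodes raw strictly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 maps every byte to a code point, so it never fails,
    # although the resulting text may not be what the author intended.
    return raw.decode('latin-1'), 'latin-1'


text, used = decode_with_fallback(b'We\x92ve')
```

For the string from the question, UTF-8 is rejected and the Windows-1252 decoding is returned, regardless of what the Content-Type header claimed.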

Source: https://stackoverflow.com/questions/2305997/unicodedecodeerror-problem-with-mechanize
