python urllib2 utf-8 encoding

巧了我就是萌 posted on 2019-12-03 21:31:30
Raymond Hettinger

Try decoding the data using 'latin-1' to see what it looks like. That decode cannot fail, because every byte maps to a character. What you're seeing indicates a UTF-8 decode error (see "UnicodeDecodeError, invalid continuation byte").

It would be helpful if you posted the result of list(f.read())[:100] so we can see the data.
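A minimal sketch of that inspection, written for Python 3 rather than urllib2. The sample bytes are made up to stand in for whatever f.read() returned; they are Latin-1 encoded, so the UTF-8 decode fails the same way:

```python
# Made-up sample standing in for f.read(): Latin-1 bytes, not valid UTF-8.
raw = b"caf\xe9 au lait"

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # 0xE9 announces a multi-byte sequence, but a plain space follows it.
    utf8_error = exc.reason

# Latin-1 maps each byte to one character, so this decode always succeeds.
as_latin1 = raw.decode("latin-1")

# The first 100 raw byte values, which is the data worth posting.
first_bytes = list(raw[:100])

print(utf8_error)
print(as_latin1)
print(first_bytes)
```

Any byte value above 127 in the printed list is a place where the UTF-8 and Latin-1 interpretations differ.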

FYI, putting # -*- coding: utf-8 -*- at the top of the file is unrelated to your issue. That declaration refers to the encoding of your Python script itself, not the data it is handling :-)

Rob Cowie

That particular error is commonly caused by trying to decode as UTF-8 a string that was actually encoded as Latin-1. See "UnicodeDecodeError, invalid continuation byte" for more information.

I suspect that, despite the header, the server is not returning UTF-8-encoded content.
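One way to test that suspicion is to compare the charset the header declares with what the body bytes will actually decode as. The sketch below assumes Python 3 and works on a header string and a bytes body that you have already fetched; declared_charset and header_matches_body are hypothetical helper names, and the sample values are invented:

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def header_matches_body(raw, content_type):
    """Return True if the body decodes cleanly with the declared charset."""
    charset = declared_charset(content_type)
    if charset is None:
        return False
    try:
        raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # Undecodable bytes, or a charset name Python does not know.
        return False
    return True


header = "text/html; charset=UTF-8"  # invented header value
print(declared_charset(header))
print(header_matches_body(b"caf\xc3\xa9", header))      # real UTF-8 bytes
print(header_matches_body(b"caf\xe9 au lait", header))  # Latin-1 bytes
```

A False result only shows that the declared charset is wrong for these bytes. It does not tell you which encoding is the right one.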

A solution that might be worth pursuing is to use chardet to 'guess' which encoding is used. Useful as chardet is, consider it a last resort.
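A rough sketch of that last-resort approach, assuming Python 3 and that the third-party chardet package is installed. decode_with_guess is a hypothetical helper name, and the 'latin-1' fallback is an assumption chosen because it never fails:

```python
def decode_with_guess(raw, fallback="latin-1"):
    """Decode bytes as UTF-8 if possible, otherwise fall back to a guess."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    try:
        import chardet  # third-party; not part of the standard library
    except ImportError:
        return raw.decode(fallback)

    # detect() returns a dict that includes 'encoding' and 'confidence'.
    guess = chardet.detect(raw)["encoding"]
    try:
        return raw.decode(guess or fallback)
    except (UnicodeDecodeError, LookupError):
        return raw.decode(fallback)


print(decode_with_guess(b"caf\xc3\xa9"))
print(decode_with_guess(b"caf\xe9 au lait"))
```

Trying UTF-8 first keeps the guess out of the common case. On short inputs the guess can easily be wrong, which is why it works best as a fallback.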
