urllib2 opener providing wrong charset

既然无缘 2020-12-11 07:21

When I open the URL and read the response, the content is unreadable. But when I check the content header, it says the body is encoded as utf-8. So I tried to convert it to unicode and it compla

2 answers
  •  -上瘾入骨i
    2020-12-11 07:47

    The header is probably wrong. Check out chardet.

    EDIT: Thinking more about it -- my money is on the contents being gzipped. I believe some of Python's various URL-opening modules/classes/etc will ungzip, while others won't.
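
To check the gzip theory, a minimal sketch along these lines may help. It is written for Python 3 rather than urllib2, and the helper name `decode_body` is made up for illustration. It assumes the raw response bytes are already in hand, looks for the two-byte gzip magic number, decompresses if it is present, and only then decodes with the charset the header declared.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(raw, charset="utf-8"):
    """Return text from raw response bytes, gunzipping first if needed.

    `decode_body` is a hypothetical helper, not part of any library.
    """
    if raw[:2] == GZIP_MAGIC:
        # The body is still compressed; the opener did not ungzip it.
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# Simulate a server that gzips a UTF-8 page (no network needed).
page = "naïve café".encode("utf-8")
body = gzip.compress(page)

text_from_gzip = decode_body(body)   # compressed input
text_from_plain = decode_body(page)  # already-plain input passes through
```

With a real response, the bytes would come from `response.read()` and the response's `Content-Encoding` header would normally say `gzip` when this applies. If the decoded text is still garbage after decompression, the header charset itself may be wrong, which is where chardet comes in.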
