When I open the URL and read it, I can't recognize it. But when I check the content header it says it is encoded as utf-8. So I tried to convert it to unicode and it compla
This is a common mistake. The server is sending a gzip-compressed stream, so the raw bytes are not valid UTF-8 until they are decompressed.
You should unpack it first:
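import gzip
import io

response = opener.open(self.__url, data)
if response.info().get('Content-Encoding') == 'gzip':
    # Body is gzip-compressed: wrap the raw bytes in a file-like object and decompress.
    buf = io.BytesIO(response.read())
    gzip_f = gzip.GzipFile(fileobj=buf)
    content = gzip_f.read()
else:
    # Body was sent uncompressed.
    content = response.read()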
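To try the same idea without a network connection, here is a minimal sketch, assuming Python 3. It builds a gzipped UTF-8 payload in memory and unpacks it according to the header value. The helper name `decode_body` and the sample text are hypothetical and are not part of the code above.

```python
import gzip
import io


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return text from raw response bytes, gunzipping first if needed.

    Hypothetical helper, for illustration only.
    """
    if content_encoding == "gzip":
        # Wrap the bytes in a file-like object so GzipFile can read them.
        with gzip.GzipFile(fileobj=io.BytesIO(raw)) as gzip_f:
            raw = gzip_f.read()
    return raw.decode(charset)


# Simulate a server that gzips a UTF-8 body.
original = "h\u00e9llo w\u00f6rld"
compressed = gzip.compress(original.encode("utf-8"))

text = decode_body(compressed, "gzip")         # compressed path
plain = decode_body(original.encode("utf-8"))  # uncompressed path
```

Calling `compressed.decode("utf-8")` directly raises `UnicodeDecodeError`, which matches the symptom in the question: the header's charset describes the body after decompression, not the bytes on the wire.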
The header is probably wrong. Check out chardet, a library that guesses the encoding of a byte string.
EDIT: Thinking more about it -- my money is on the contents being gzipped. I believe some of Python's various URL-opening modules/classes/etc will ungzip, while others won't.
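If it is unclear whether the opener in use has already decompressed the body, one option is to inspect the bytes themselves instead of trusting the header. The sketch below assumes Python 3 and relies on the fact that every gzip stream starts with the two bytes `0x1f 0x8b`. The name `maybe_gunzip` is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def maybe_gunzip(raw):
    """Decompress raw bytes only if they start with the gzip magic number."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


body = "plain or gzipped?".encode("utf-8")

from_gzipped = maybe_gunzip(gzip.compress(body))  # decompressed
from_plain = maybe_gunzip(body)                   # returned unchanged
```

Both calls return the same bytes, so the result can be decoded as UTF-8 either way. This is a heuristic: a non-gzip body that happens to begin with those two bytes would be misdetected.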