>>> import urllib2
>>> good_article = 'http://en.wikipedia.org/wiki/Wikipedia'
>>> bad_article = 'http://en.wikipedia.org/wiki/India'
It's not an environment, locale, or encoding problem. The offending stream of bytes is gzip-compressed. The \x1f\x8b
at the start is the two-byte magic number that begins every gzip stream.
It looks as though the server is sending gzip anyway, even though you didn't do
req2.add_header('Accept-encoding', 'gzip')
You should look at result.headers.getheader('Content-Encoding')
and, if it says gzip, decompress the body yourself.
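
As a rough sketch of that last step, something like the following could work. The helper name decode_body is made up for illustration and is not part of urllib2. It assumes you have already read the whole response body into a byte string and pulled out the Content-Encoding header value yourself.

```python
import gzip
import io


def decode_body(raw, content_encoding=None):
    """Return the response body, gunzipped if the header says it is gzip.

    decode_body is a hypothetical helper, not a urllib2 function.
    raw is the byte string from result.read(); content_encoding is the
    value of the Content-Encoding header, or None if it was absent.
    """
    encoding = (content_encoding or '').strip().lower()
    if encoding == 'gzip':
        # Wrap the bytes in a file-like object so GzipFile can read them.
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    # Anything else is passed through untouched.
    return raw


# Usage with the names from the question (Python 2, untested sketch):
#   result = urllib2.urlopen(req2)
#   html = decode_body(result.read(),
#                      result.headers.getheader('Content-Encoding'))
```

This only handles gzip. If the server ever sent a different Content-Encoding (deflate, say), the body would come back unchanged and you would need to handle that case separately.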