How to download any(!) webpage with the correct charset in Python?
Problem

When screen-scraping a webpage using Python, one has to know the character encoding of the page. If you get the character encoding wrong, then your output will be messed up. People usually use some rudimentary technique to detect the encoding. They either use the charset from the HTTP header, the charset defined in the meta tag, or an encoding detector (which does not care about meta tags or headers). By using only one of these techniques, you will sometimes not get the same result as you would in a browser.

Browsers do it this way:

Meta tags always take precedence (or the XML declaration)