When screen-scraping a webpage using python one has to know the character encoding of the page. If you get the character encoding wrong th
When you download a file with urllib or urllib2, you can find out whether a charset header was transmitted:
fp = urllib2.urlopen(request)
charset = fp.headers.getparam('charset')
You can use BeautifulSoup to locate a meta element in the HTML:
soup = BeatifulSoup.BeautifulSoup(data)
meta = soup.findAll('meta', {'http-equiv':lambda v:v.lower()=='content-type'})
If neither is available, browsers typically fall back to user configuration, combined with auto-detection. As rajax proposes, you could use the chardet module. If you have user configuration available telling you that the page should be Chinese (say), you may be able to do better.