How to download any(!) webpage with the correct charset in Python?

醉酒成梦 2020-11-30 20:16

Problem

When screen-scraping a webpage using Python, one has to know the character encoding of the page. If you get the character encoding wrong th

7 answers

抹茶落季 (original poster)

2020-11-30 20:50

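Instead of fetching the page and then trying to figure out which charset the browser would use, why not just use a browser to fetch the page and check which charset it reports? This drives Internet Explorer over COM, so it needs pywin32 and only works on Windows: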

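from win32com.client import DispatchWithEvents
import pythoncom  # provides PumpWaitingMessages(); the original forgot this import
import threading


stopEvent = threading.Event()

class EventHandler(object):
    # DispatchWithEvents needs a handler class; no event is actually acted on.
    def OnDownloadBegin(self):
        pass

def waitUntilReady(ie):
    """
    copypasted from
    http://mail.python.org/pipermail/python-win32/2004-June/002040.html
    """
    # ReadyState 4 is READYSTATE_COMPLETE: the document has finished loading.
    while ie.ReadyState != 4:
        print("waiting")
        pythoncom.PumpWaitingMessages()  # let pending COM messages through
        stopEvent.wait(.2)
        if stopEvent.is_set():
            stopEvent.clear()
            break

ie = DispatchWithEvents("InternetExplorer.Application", EventHandler)
ie.Visible = 0  # keep the browser window hidden
ie.Navigate('http://kskky.info')
waitUntilReady(ie)
d = ie.Document
print(d.CharSet)  # the charset IE settled on for this page
ie.Quit()  # close the hidden IE instance so it does not linger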
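
If driving a real browser is not an option (no Windows, no IE), the first two things a browser looks at can be roughly approximated in plain Python: the charset parameter of the HTTP Content-Type header, then a meta tag near the top of the HTML. This is only a minimal sketch, not what IE or any other browser actually does internally. The function names, the 4096-byte scan window and the utf-8 fallback are all assumptions made for illustration, and real browsers also do BOM sniffing and other heuristics that are left out here.

```python
import re
from email.message import Message

# Hypothetical helper pattern: finds charset=... inside a <meta> tag.
# Covers both <meta charset="x"> and the older http-equiv form.
META_CHARSET_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)

def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()  # lowercased, None if absent

def guess_charset(content_type, body, default="utf-8"):
    """Guess a page's charset from its header and raw bytes.

    content_type: Content-Type header value, or None if there was none
    body: raw response bytes (not decoded)
    default: fallback when nothing is declared (an assumption, not a standard)
    """
    if content_type:
        charset = charset_from_content_type(content_type)
        if charset:
            return charset
    # Only scan the start of the document, where <meta> tags normally live.
    match = META_CHARSET_RE.search(body[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

You would pass in the header and the raw bytes from whatever HTTP client you use, e.g. `guess_charset(resp.headers.get("Content-Type"), raw_bytes)`, and then decode the bytes with the result. Pages that declare the wrong charset will still come out garbled, which is exactly the case where asking a real browser, as above, can do better.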
