HTML page vastly different when using a headless webkit implementation using PyQT

自古美人都是妖i 提交于 2019-12-22 07:38:27

问题


I was under the impression that using a headless browser implementation of webkit using PyQT will automatically get me the html code for each URL even with heavy JS code in it. But I am only seeing it partially. I am comparing with the page I get when I save the page from the firefox window.

I am using the following code -

class JabbaWebkit(QWebPage):
    # 'html' is a class variable

    def __init__(self, url, wait, app, parent=None):
        super(JabbaWebkit, self).__init__(parent)
        JabbaWebkit.html = ''

        if wait:
            QTimer.singleShot(wait * SEC, app.quit)
        else:
            self.loadFinished.connect(app.quit)

        self.mainFrame().load(QUrl(url))

    def save(self):
        JabbaWebkit.html = self.mainFrame().toHtml()

    def userAgentForUrl(self, url):
        return USER_AGENT


    def get_page(url, wait=None):
        # here is the trick how to call it several times
        app = QApplication.instance() # checks if QApplication already exists

        if not app: # create QApplication if it doesnt exist
            app = QApplication(sys.argv)
        #
        form = JabbaWebkit(url, wait, app)
        app.aboutToQuit.connect(form.save)
        app.exec_()
        return JabbaWebkit.html

Can some one see anything obviously wrong with the code?

After running the code through a few URLs, here is one I found that shows the problems I am running into quite clearly - http://www.chilis.com/EN/Pages/menu.aspx

Thanks for any pointers.


回答1:


The page have ajax code, when it finish load, it still need some time to update the page with ajax. But you code will quit when it finish load.

You should add some code like this to wait some time and process events in webkit:

for i in range(200): #wait 2 seconds
    app.processEvents()
    time.sleep(0.01)


来源:https://stackoverflow.com/questions/19214939/html-page-vastly-different-when-using-a-headless-webkit-implementation-using-pyq

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!