How to “render” HTML with PyQt5's QWebEngineView

野性不改 2020-12-14 21:13

How can I "render" HTML with PyQt5 v5.6 QWebEngineView?

I have previously performed the task with PyQt5 v5.4.1 QWebPage, but it was suggested to try the newe

3 Answers
  • 2020-12-14 21:42

    This topic was discussed at length in the following thread: https://riverbankcomputing.com/pipermail/pyqt/2015-January/035324.html

    The new QWebEngine interface reflects the fact that the underlying Chromium engine is asynchronous: calls such as toHtml() return immediately and deliver their result to a callback later. To get the rendered HTML back from an ordinary function call, we have to turn that asynchronous API into a synchronous one.

    Here's how that looks:

    def render(source_html):
        """Fully render HTML, JavaScript and all."""
    
        import sys
        from PyQt5.QtCore import QEventLoop
        from PyQt5.QtWidgets import QApplication
        from PyQt5.QtWebEngineWidgets import QWebEngineView
    
        class Render(QWebEngineView):
            def __init__(self, html):
                self.html = None
                # A QApplication must exist before any widget is constructed.
                self.app = QApplication(sys.argv)
                QWebEngineView.__init__(self)
                self.loadFinished.connect(self._loadFinished)
                self.setHtml(html)
                # Pump the event loop until the toHtml callback stores the result.
                while self.html is None:
                    self.app.processEvents(
                        QEventLoop.ExcludeUserInputEvents
                        | QEventLoop.ExcludeSocketNotifiers
                        | QEventLoop.WaitForMoreEvents)
                self.app.quit()
    
            def _callable(self, data):
                self.html = data
    
            def _loadFinished(self, result):
                self.page().toHtml(self._callable)
    
        return Render(source_html).html
    
    import requests
    # dummy_url is a placeholder; set it to the address of the page to fetch.
    sample_html = requests.get(dummy_url).text
    print(render(sample_html))
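
    The waiting loop in `Render.__init__` can be hard to follow with Qt in the way, so here is a minimal sketch of the same idea in plain Python. `TinyLoop` and `render_sync` are hypothetical stand-ins invented for illustration, not Qt classes: `post()` plays the role of an asynchronous completion, and `process_events()` plays the role of `QApplication.processEvents()`.

```python
from collections import deque


class TinyLoop:
    """Hypothetical stand-in for an event loop; not a Qt class."""

    def __init__(self):
        self._pending = deque()

    def post(self, fn, *args):
        # Queue a callback to run on a later pass, like an async completion.
        self._pending.append((fn, args))

    def process_events(self):
        # Run one queued callback, if there is one.
        if self._pending:
            fn, args = self._pending.popleft()
            fn(*args)


def render_sync(loop, source_html):
    """Block until the asynchronous 'to HTML' result has been delivered."""
    result = {"html": None}

    def on_html(data):
        # Counterpart of _callable: store the delivered data.
        result["html"] = data

    def on_load_finished(ok):
        # Counterpart of page().toHtml(callback): the result arrives later.
        loop.post(on_html, source_html + "<!-- rendered -->")

    # Counterpart of setHtml(): loading completes on a later loop pass.
    loop.post(on_load_finished, True)

    # Counterpart of the while loop in Render.__init__.
    while result["html"] is None:
        loop.process_events()
    return result["html"]


print(render_sync(TinyLoop(), "<p>hi</p>"))
```

    As in the Qt version, the loop only ends once the callback has stored a value, so it would spin forever if the callback never fired.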
    
  • 2020-12-14 21:54

    The answer by Six & Veehmot is great, but it was not sufficient for my purpose: it did not expand the dropdown elements of the page I wanted to scrape. A slight modification fixed this, namely loading the page from its URL with load(QUrl(url)) instead of passing already-downloaded HTML to setHtml():

    def render(url):
        """Fully render HTML, JavaScript and all."""
    
        import sys
        from PyQt5.QtCore import QEventLoop, QUrl
        from PyQt5.QtWidgets import QApplication
        from PyQt5.QtWebEngineWidgets import QWebEngineView
    
        class Render(QWebEngineView):
            def __init__(self, url):
                self.html = None
                self.app = QApplication(sys.argv)
                QWebEngineView.__init__(self)
                self.loadFinished.connect(self._loadFinished)
                self.load(QUrl(url))
                while self.html is None:
                    self.app.processEvents(QEventLoop.ExcludeUserInputEvents | QEventLoop.ExcludeSocketNotifiers | QEventLoop.WaitForMoreEvents)
                self.app.quit()
    
            def _callable(self, data):
                self.html = data
    
            def _loadFinished(self, result):
                self.page().toHtml(self._callable)
    
        return Render(url).html
    
    
    print(render(dummy_url))
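
    The answer does not say why loading by URL helped. One plausible explanation, offered here only as an assumption, is that HTML handed to `setHtml()` without its optional `baseUrl` argument has no address against which relative script or stylesheet references can be resolved. The sketch below illustrates relative-URL resolution with the standard library; `page_url` and `script_src` are made-up example values.

```python
from urllib.parse import urljoin

page_url = "https://example.com/shop/index.html"  # hypothetical page address
script_src = "js/menu.js"                          # hypothetical relative reference

# With the page's own URL as the base, the reference becomes fetchable.
with_base = urljoin(page_url, script_src)

# With an empty base, the reference stays relative and points nowhere.
without_base = urljoin("", script_src)

print(with_base)
print(without_base)
```

    If that is indeed the cause, passing the page address as `baseUrl` to `setHtml()` might be an alternative to `load()`, though the thread does not confirm it.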
    
  • 2020-12-14 22:07

    As you pointed out, Qt 5.4 relies on asynchronous calls. The manual event loop in your answer is not necessary: your only mistake was calling quit() before the toHtml() call had finished. Start the application's event loop with exec_() and quit from the toHtml() callback instead.

    def render(source_html):
        """Fully render HTML, JavaScript and all."""
    
        import sys
        from PyQt5.QtWidgets import QApplication
        from PyQt5.QtWebEngineWidgets import QWebEngineView
    
        class Render(QWebEngineView):
            def __init__(self, html):
                self.html = None
                self.app = QApplication(sys.argv)
                QWebEngineView.__init__(self)
                self.loadFinished.connect(self._loadFinished)
                self.setHtml(html)
                self.app.exec_()
    
            def _loadFinished(self, result):
                # This is an async call, you need to wait for this
                # to be called before closing the app
                self.page().toHtml(self.callable)
    
            def callable(self, data):
                self.html = data
                # Data has been stored, it's safe to quit the app
                self.app.quit()
    
        return Render(source_html).html
    
    import requests
    # dummy_url is a placeholder; set it to the address of the page to fetch.
    sample_html = requests.get(dummy_url).text
    print(render(sample_html))
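
    To show why the placement of `quit()` matters, here is a minimal sketch in plain Python. `TinyLoop` is a hypothetical stand-in for the `exec_()`/`quit()` pair, not a Qt class, and the `quit_early` flag exists only to reproduce the mistake described above.

```python
from collections import deque


class TinyLoop:
    """Hypothetical stand-in for exec_()/quit(); not a Qt class."""

    def __init__(self):
        self._pending = deque()
        self._running = False

    def post(self, fn, *args):
        # Queue a callback for a later pass of the loop.
        self._pending.append((fn, args))

    def quit(self):
        self._running = False

    def exec_(self):
        # Dispatch queued callbacks until quit() is called or none are left.
        self._running = True
        while self._running and self._pending:
            fn, args = self._pending.popleft()
            fn(*args)


def render(source_html, quit_early):
    loop = TinyLoop()
    result = {"html": None}

    def on_html(data):
        result["html"] = data
        loop.quit()  # data has been stored, so quitting is safe

    def on_load_finished(ok):
        loop.post(on_html, source_html)  # asynchronous, like toHtml()
        if quit_early:
            loop.quit()  # the mistake: the callback has not run yet

    loop.post(on_load_finished, True)
    loop.exec_()
    return result["html"]


print(render("<p>hi</p>", quit_early=True))   # None
print(render("<p>hi</p>", quit_early=False))  # <p>hi</p>
```

    Quitting inside the load-finished handler stops the loop before the queued callback is dispatched, so the result is never stored; quitting from the callback itself avoids that.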
    