Save HTML output of a page after execution of the page's JavaScript

情话喂你 2020-11-29 21:12

There is a site I am trying to scrape that first loads HTML/JS, modifies the form input fields using JS, and then POSTs. How can I get the final HTML output of the POSTed page?

7 Answers
  •  暗喜 (OP)
    2020-11-29 21:34

    Your output code is correct, but there is a timing (synchronicity) problem: the output lines run before the page has finished loading. Hook into the onLoadFinished callback to find out when loading is done. See the full code below.

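        // PhantomJS script: save a screenshot and the final DOM as HTML.
        var page = require('webpage').create();
        var fs = require('fs');
    
        // Runs when a page load finishes. It also fires for later
        // navigations (e.g. a form POST), so for such pages you may need
        // to wait for a later call before exiting.
        page.onLoadFinished = function (status) {
          console.log("page load finished: " + status);
          page.render('export.png');              // screenshot of the rendered page
          fs.write('1.html', page.content, 'w');  // page.content is the current DOM serialized as HTML
          phantom.exit();
        };
    
        page.open("http://www.google.com", function (status) {
          // Nothing to do here; output is written in onLoadFinished.
        });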
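    To see why reading the output inline fails, here is a minimal Node.js simulation (not PhantomJS). FakePage is a hypothetical stand-in for a PhantomJS page whose scripts finish asynchronously, and the 50 ms delay is an arbitrary assumption: reading content right after open() returns the initial markup, while the onLoadFinished callback sees the final DOM.

```javascript
// Node.js simulation, not PhantomJS. FakePage is a hypothetical stand-in
// for a page whose JavaScript finishes some time after open() returns.
class FakePage {
  constructor() {
    this.content = "";
    this.onLoadFinished = null;
  }

  open(url) {
    this.content = "<html><body>loading " + url + "</body></html>";
    // Pretend the page's scripts rewrite the DOM 50 ms later (arbitrary delay).
    setTimeout(() => {
      this.content = "<html><body>final DOM for " + url + "</body></html>";
      if (this.onLoadFinished) this.onLoadFinished("success");
    }, 50);
  }
}

const page = new FakePage();
const snapshots = {};

page.onLoadFinished = (status) => {
  // Runs only after the simulated scripts have finished.
  snapshots.inCallback = page.content;
  console.log(status, snapshots);
};

page.open("http://example.com");
// Runs immediately, before loading has finished: this captures stale markup.
snapshots.inline = page.content;
```

    The inline snapshot still contains "loading ...", while the callback snapshot contains "final DOM ...", which is the same ordering problem the PhantomJS code above avoids.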
    

    With a site like Google this can be deceptive: it loads so quickly that a screen grab taken inline, as in your code, often works anyway. Timing is tricky in PhantomJS; I sometimes add a setTimeout delay to check whether timing is the problem.
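    Once a fixed setTimeout delay confirms that timing is the issue, one option is to poll until a condition holds instead of guessing a delay. Below is a minimal, Node.js-runnable sketch of such a polling helper; the name waitFor, its parameters, and the simulated 250 ms condition are illustrative assumptions, not a PhantomJS API. In a PhantomJS script, testFx would typically call page.evaluate() to check the DOM for an element that only appears in the POSTed result.

```javascript
// Hypothetical polling helper: calls testFx every intervalMs until it
// returns true (then onReady) or timeoutMs elapses (then onTimeout).
function waitFor(testFx, onReady, onTimeout, timeoutMs = 3000, intervalMs = 100) {
  const start = Date.now();
  const timer = setInterval(() => {
    const elapsed = Date.now() - start;
    if (testFx()) {
      clearInterval(timer);
      onReady(elapsed);
    } else if (elapsed >= timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs);
}

// Simulated condition (e.g. "the POST result is in the DOM")
// that becomes true after roughly 250 ms.
let ready = false;
setTimeout(() => { ready = true; }, 250);

const result = {};
waitFor(
  () => ready,
  (elapsed) => {
    result.status = "ready";
    result.elapsed = elapsed;
    console.log("condition met after about " + elapsed + " ms");
  },
  () => {
    result.status = "timeout";
    console.log("gave up waiting");
  },
  2000,
  50
);
```

    Polling with a timeout trades a little CPU for not having to guess a delay that is long enough on slow loads yet short enough on fast ones.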
