Dump HTML of page including iframes

跟風遠走 submitted on 2019-12-01 07:01:19

Question


I'd like to dump the HTML contents of a web page, including the HTML of the documents loaded inside its <iframe> elements. The Chrome Developer Tools "Elements" tab can already show iframe content embedded this way.

By "dump the HTML contents" I mean doing this with a browser automation tool such as Selenium or PhantomJS. Does either of these tools have this capability built in?

For example, the HTML dump of this page should include the HTML source of this embedded page.


Answer 1:


You can use PhantomJS to achieve this.

Here is a code snippet from the PhantomJS server code.

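var system = require('system');
var webpage = require('webpage');

// The URL to dump is passed as the first command-line argument.
var url = system.args[1] || '';
if (url.length === 0) {
  console.log('Please pass a URL as the first argument.');
  phantom.exit(1);
} else {
  var page = webpage.create();
  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('Failed to load ' + url);
      phantom.exit(1);
      return;
    }
    // Poll every 100 ms until the page marks itself ready by setting
    // data-status="ready" on <body>, then print the serialized document.
    var poll = setInterval(function () {
      var html = page.evaluate(function () {
        var body = document.body;
        if (body && body.getAttribute('data-status') === 'ready') {
          return document.documentElement.outerHTML;
        }
        return null;
      });
      if (html) {
        clearInterval(poll); // the timer came from setInterval, so clear it with clearInterval
        console.log(html);
        phantom.exit();
      }
    }, 100);
  });
}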
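Note that document.documentElement.outerHTML of the top-level page contains the <iframe> elements themselves but not the documents loaded inside them. A minimal sketch of walking the frames follows, assuming PhantomJS's frame API on the WebPage object (framesCount, switchToFrame, switchToParentFrame, frameContent). The helper name dumpFrames and the "frame depth" marker comments are hypothetical, not part of PhantomJS, and the code is plain ES5 because PhantomJS 2.x does not understand ES6 syntax.

```javascript
// Hypothetical helper (not part of PhantomJS): returns the HTML of the frame
// `page` is currently switched to, followed by the HTML of every descendant
// frame, depth-first. Plain ES5 so it can run inside a PhantomJS script.
function dumpFrames(page, depth) {
  depth = depth || 0;
  // Label each chunk so the nested documents can be told apart in the output.
  var parts = ['<!-- frame depth ' + depth + ' -->', page.frameContent];
  // framesCount is the number of direct child frames of the current frame.
  var count = page.framesCount;
  for (var i = 0; i < count; i++) {
    page.switchToFrame(i);      // descend into child frame i
    parts.push(dumpFrames(page, depth + 1));
    page.switchToParentFrame(); // go back up before visiting the next sibling
  }
  return parts.join('\n');
}
```

In the script above, once the page is ready you could call page.switchToMainFrame() and then console.log(dumpFrames(page)) instead of printing only the top-level outerHTML. The function always returns to the frame it started in, so later page.evaluate calls still run in the expected context.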

If the HTML belongs to you, set a data-status="ready" attribute on the <body> element once the page has finished rendering; the script polls for it every 100 ms to know when the page is ready. If the page does not belong to you, the other option is to wait for a reasonable fixed timeout instead.
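The two options can be combined: wait for the readiness signal, but give up after a timeout. A minimal sketch, assuming a hypothetical helper named waitFor (not a PhantomJS API); it only uses setInterval, clearInterval and Date.now, written in ES5 so it can run under PhantomJS.

```javascript
// Hypothetical helper: calls isReady() every intervalMs (default 100 ms).
// As soon as it returns true, onReady() is called; if timeoutMs elapses
// first, onTimeout() is called instead. Exactly one callback fires.
function waitFor(isReady, onReady, onTimeout, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (isReady()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start >= timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs || 100);
}
```

In PhantomJS, isReady could wrap the page.evaluate check for data-status="ready", while onTimeout could print page.content and call phantom.exit(). For a page you don't control, isReady can simply return false, which turns this into a plain fixed delay.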



Source: https://stackoverflow.com/questions/26663357/dump-html-of-page-including-iframes
