Use PhantomJS to extract html and text

眼角桃花 2020-12-21 23:11

I'm trying to extract all the text content of a page (it doesn't work with Simpledomparser).

I tried to modify this simple example from the manual.

         


        
4 answers
  •  别那么骄傲
    2020-12-21 23:32

    This version of your script should return the entire contents of the page:

    var page = require('webpage').create();
    page.settings.userAgent = 'SpecialAgent';
    page.open('http://www.httpuseragent.org', function (status) {
        if (status !== 'success') {
            console.log('Unable to access network');
        } else {
            // Runs inside the page and returns the full markup as a string.
            var html = page.evaluate(function () {
                return document.documentElement.outerHTML;
            });
            console.log(html);
        }
        // Exit only after the page callback has run.
        phantom.exit();
    });
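
    Since the question also asks for the text content, here is a minimal sketch that returns both the markup and the visible text in one call. It assumes the page has a body and that innerText (with textContent as a fallback) is close enough to "all the text" for your purposes. The helper name extractHtmlAndText is made up for illustration; the URL and user agent are the ones from the script above.

```javascript
// Hypothetical helper (name chosen for illustration only).
// It runs inside the page via page.evaluate, so it may only touch page
// globals such as `document`; it cannot see PhantomJS-side variables.
function extractHtmlAndText() {
    var root = document.documentElement;
    var body = document.body;
    return {
        html: root.outerHTML,
        // innerText approximates the rendered text; fall back to textContent.
        text: body ? (body.innerText || body.textContent || '') : ''
    };
}

// Only open the page when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.settings.userAgent = 'SpecialAgent';
    page.open('http://www.httpuseragent.org', function (status) {
        if (status !== 'success') {
            console.log('Unable to access network');
        } else {
            // The returned object must be JSON-serializable; two strings are.
            var result = page.evaluate(extractHtmlAndText);
            console.log('--- HTML ---');
            console.log(result.html);
            console.log('--- TEXT ---');
            console.log(result.text);
        }
        phantom.exit();
    });
}
```

    Returning a single object keeps it to one round trip into the page. If the content is filled in by scripts after load, you may need to wait before calling page.evaluate.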
    
