PhantomJS querySelectorAll().textcontent returns nothing

Submitted by 瘦欲@ on 2019-12-13 04:07:10

Question


I created a simple web scraper that grabs data from a website using PhantomJS. It doesn't work when I use querySelectorAll to get the content I want. Here is my whole code.

var page = require('webpage').create();

var url = 'https://www.google.com.kh/?gws_rd=cr,ssl&ei=iE7jV87UKsrF0gSDw4zAAg';

page.open(url, function(status){

  if(status === 'success'){

    var title = page.evaluate(function(){
      return document.querySelectorAll('.logo-subtext')[0].textContent;
    });

    console.log(title);
  }
  phantom.exit();
});

Please help me sort this out.

Thanks a lot.


Answer 1:


By default the virtual screen size of PhantomJS is 400x300.

var page = require('webpage').create();
console.log(page.viewportSize.width);
console.log(page.viewportSize.height);

400
300

Some sites take note of that and, instead of the normal version you see in your desktop browser, serve a stripped-down mobile version of the HTML and CSS. We can fix that by setting the desired viewport size:

page.viewportSize = { width: 1280, height: 800 };

There are also sites that do user-agent sniffing and make decisions based on that. If they don't recognize your browser, they may show a mobile version to be on the safe side. If they don't want to be scraped, they may refuse connections from PhantomJS, because it honestly declares itself:

console.log(page.settings.userAgent);

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1

But we can set the desired user agent:

page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0';
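The two settings above can be bundled into one step that runs before page.open(). The following is only a minimal sketch: configurePage and its options argument are hypothetical names introduced here, not part of PhantomJS, and the defaults are simply the values used in this answer.

```javascript
// Hypothetical helper (not a PhantomJS API): gives a page object a
// desktop-sized viewport and a desktop browser user agent.
// Call it before page.open() so the first request already uses them.
function configurePage(page, options) {
  options = options || {};

  // Viewport size from the answer above; override via options if needed.
  page.viewportSize = {
    width: options.width || 1280,
    height: options.height || 800
  };

  // Real PhantomJS pages already have a settings object; it is only
  // created here so the helper also works on a plain object.
  if (!page.settings) {
    page.settings = {};
  }
  page.settings.userAgent = options.userAgent ||
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0';

  return page;
}

// Usage inside a PhantomJS script:
//   var page = configurePage(require('webpage').create());
//   page.open(url, function (status) { /* ... */ });
```

Keeping both settings in one place makes it harder to forget one of them, since either the viewport or the user agent alone may be enough for a site to serve a different page.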

When working with something as fragile as web scraping, you really should take notice of any errors and system messages you can get.

So no PhantomJS script should be without onError and onConsoleMessage callbacks:

page.onError = function (msg, trace) {
    var msgStack = ['ERROR: ' + msg];
    if (trace && trace.length) {
      msgStack.push('TRACE:');
      trace.forEach(function(t) {
        msgStack.push(' -> ' + t.file + ': ' + t.line + (t.function ? ' (in function "' + t.function +'")' : ''));
      });
    }
    console.log(msgStack.join('\n'));
};   

page.onConsoleMessage = function (msg) {
    console.log(msg);
};   
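A likely reason the script in the question prints nothing useful: if the page PhantomJS receives has no .logo-subtext element, querySelectorAll returns an empty list, [0] is undefined, and reading textContent throws inside the page. The sketch below guards against that case. firstText is a hypothetical helper name, and its second parameter exists only so the function can be tried outside a browser.

```javascript
// Hypothetical helper: returns the text of the first element matching
// `selector`, or null when nothing matches (instead of throwing).
// `doc` is optional; inside a page it falls back to the global document.
function firstText(selector, doc) {
  doc = doc || document;
  var nodes = doc.querySelectorAll(selector);
  if (!nodes || nodes.length === 0) {
    return null;
  }
  return nodes[0].textContent;
}

// Usage inside a PhantomJS script. page.evaluate runs the function in the
// page context and passes the extra argument through as `selector`:
//   var title = page.evaluate(firstText, '.logo-subtext');
//   console.log(title === null ? 'selector matched nothing' : title);
```

A null result then says plainly that the selector did not match, which points back at the viewport, user agent, or page version rather than at the extraction code.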

Another vital technique for debugging PhantomJS scripts is taking screenshots. Are you sure that PhantomJS sees what you see in your Chrome?

page.render("google.com.png");

[Screenshot: before setting the user agent]

[Screenshot: after setting the Firefox user agent]



Source: https://stackoverflow.com/questions/39632049/phantomjs-queryselectorall-textcontent-returns-nothing
