CasperJS scraping dynamic content

☆樱花仙子☆ submitted on 2020-02-12 05:07:50

Question


I'm trying to scrape this page using CasperJS. The main function of my code works just fine, but the content is loaded dynamically and I can't figure out how to trigger that.

This is what I'm doing right now:

casper.waitFor(function() {

    // Jump straight to the bottom of the page on every check.
    this.scrollToBottom();

    // Count the placeholders of images that have not loaded yet.
    var count = this.evaluate(function() {
        var match = document.querySelectorAll('.loading-msg');
        return match.length;
    });

    return count <= 1;

}, function() {
    // do stuff
});

The wait timeout just expires, even though I've increased it to 20s, and the new content never gets loaded. I've tried adapting this function to my case:

function tryAndScroll(casper) {
  casper.waitFor(function() {
    this.page.scrollPosition = { top: this.page.scrollPosition["top"] + 4000, left: 0 };
    return true;
  }, function() {
    var info = this.getElementInfo('p[loading-spinner="!loading"]');
    if (info["visible"] == true) {
      this.waitWhileVisible('p[loading-spinner="!loading"]', function () {
        this.emit('results.loaded');
      }, function () {
        this.echo('next results not loaded');
      }, 5000);
    }
  }, function() {
    this.echo("Scrolling failed. Sorry.").exit();
  }, 500);
}

But I couldn't figure it out and I'm not even sure it's relevant here. Any ideas?


Answer 1:


I've looked at the page. It behaves in such a way that it doesn't load the images in the middle when you jump to the end.

When the page is loaded, the first couple of rows are completely loaded and some more are only partially loaded (a missing image is denoted by a '.loading-msg' element). When you jump to the end with this.scrollToBottom(); there is no continuous scroll. It jumps to the end, and the page JavaScript doesn't detect that the middle images were in the viewport, however briefly. The page goes on to load the next rows, but not the missing images of the rows that were jumped over.
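
To make that behaviour concrete, here is a rough model of it in plain JavaScript. It is only a sketch: the function loadedRows and all of the numbers are made up for illustration, and it assumes the page loads a row's image only when the row overlaps the viewport at a position the scroll actually stops at. The real page's script may work differently.

```javascript
// Simplified model of a lazy loader, NOT the real page's code: a row's
// image loads only if the row overlaps the viewport at one of the scroll
// positions the page actually stops at.
function loadedRows(rowCount, rowHeight, viewportHeight, scrollStops) {
    var loaded = [];
    for (var row = 0; row < rowCount; row++) {
        var top = row * rowHeight;
        var bottom = top + rowHeight;
        var seen = scrollStops.some(function (y) {
            // Overlap test between [top, bottom) and [y, y + viewportHeight).
            return top < y + viewportHeight && bottom > y;
        });
        if (seen) {
            loaded.push(row);
        }
    }
    return loaded;
}

// Ten rows of 300px in a 600px viewport (hypothetical numbers).
// One big jump from the top to the bottom: only rows 0, 1, 8 and 9 load.
var jump = loadedRows(10, 300, 600, [0, 2400]);
// One viewport height per step: all ten rows load.
var steps = loadedRows(10, 300, 600, [0, 600, 1200, 1800, 2400]);
console.log(jump.join(', '));
console.log(steps.join(', '));
```

In this model the rows between the first and the last viewport are never seen by the loader, which matches the '.loading-msg' elements that stay on the page after scrollToBottom().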

You have to reduce the distance of the jump in both of your snippets.

The first one can be changed like this:

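var pos = 0;
casper.waitFor(function() {
    // Scroll down by one viewport height per check so that no row is skipped.
    var height = this.page.viewportSize.height;
    pos += 1;
    this.scrollTo(0, pos * height);
    return !this.exists('.loading-msg');
}, function() {
    // do stuff
}, function() {
    this.echo('Timed out while waiting for the images to load.').exit();
}, 20000);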

The second one might work by changing

this.page.scrollPosition = { top: this.page.scrollPosition["top"] + 4000, left: 0 };

to

var height = casper.page.viewportSize.height;
this.page.scrollPosition = { top: this.page.scrollPosition.top + height, left: 0 };
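
Both changes come down to scrolling one viewport height at a time instead of jumping. As a minimal sketch of that idea, the helper below lists the scroll offsets needed to pass over a page of a known height. The name scrollOffsets is hypothetical and not part of CasperJS, and the page height is assumed to be known up front, which is not true on a page that keeps appending rows.

```javascript
// Hypothetical helper (not part of CasperJS): lists the vertical scroll
// offsets needed so that every part of the page passes through the viewport.
function scrollOffsets(pageHeight, viewportHeight) {
    if (viewportHeight <= 0) {
        throw new Error('viewportHeight must be positive');
    }
    // Furthest position the page can be scrolled to.
    var maxScroll = Math.max(0, pageHeight - viewportHeight);
    var offsets = [];
    for (var y = 0; y < maxScroll; y += viewportHeight) {
        offsets.push(y);
    }
    // Always finish at the very bottom so that the next rows get requested.
    offsets.push(maxScroll);
    return offsets;
}

// A 2500px page in a 1000px viewport needs the stops 0, 1000 and 1500.
console.log(scrollOffsets(2500, 1000).join(', '));
```

In a CasperJS script each offset could be passed to this.scrollTo(0, y) in its own step, waiting for the '.loading-msg' elements to disappear in between. Since this page grows as new rows load, the offsets would have to be recomputed after each batch.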


Source: https://stackoverflow.com/questions/28926594/casperjs-scraping-dynamic-content
