CasperJS: Iterating through URLs

Submitted by 孤街浪徒 on 2019-11-28 01:27:24

Question


I'm pretty new to CasperJS, but isn't there a way to open a URL and execute CasperJS commands in for loops? For example, this code doesn't work as I expected it to:

casper.then(function() {
    var counter = 2013;
    for (i = counter; i < 2014; i++) {
        var file_name = "./Draws/wimbledon_draw_" + counter + ".json";
        // getting some local json files
        var json = require(file_name);
        var first_round = json["1"];
        for (var key in first_round) {
            var name = first_round[key].player_1.replace(/\s+/g, '-');
            var normal_url = "http://www.atpworldtour.com/Tennis/Players/" + name;
            // the casper command below only executes AFTER the for loop is done
            casper.thenOpen(normal_url, function() {
                this.echo(normal_url);
            });
        }
    }
});

Instead of Casper calling thenOpen for each new URL during its iteration, the callback only gets called AFTER the for loop has finished, and it then uses the last value that normal_url was set to. Is there no Casper command that makes this work within each iteration of the for loop?

Follow-up: how do we make casper.thenOpen return a value in the current iteration of the for loop?

Say, for example, that I need a return value from that thenOpen (if the HTTP status is 404, I need to evaluate another URL, so I want to return false). Is this possible?

Editing the casper.thenOpen call above:

    var status;
    // thenOpen() only executes after the console.log statement directly below
    casper.thenOpen(normal_url, function() {
        status = this.status(false)['currentHTTPStatus'];
        if (status == 200) {
            return true;
        } else {
            return false;
        }
    });
    console.log(status); // This prints UNDEFINED the same number of times as iterations.

Answer 1:


As Fanch and Darren Cook stated, you could use an IIFE to capture the current url value inside the thenOpen step.

An alternative is to use getCurrentUrl to check the URL. Change the line

this.echo(normal_url);

to

this.echo(this.getCurrentUrl());

The problem is that the callback reads normal_url only when it runs, which is after the loop has finished, so it sees the last value that was assigned rather than the value from its own iteration. This does not affect the URL argument in casper.thenOpen(normal_url, function(){...});, because that argument is evaluated immediately and the current value is passed in. You only see the wrong URL in the echo; the correct URL is actually opened.
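
The difference between a value read inside the callback and a value passed as an argument can be reproduced without CasperJS. Below is a minimal plain-JavaScript sketch (runnable in Node); FakeCasper, its methods, and the player data are hypothetical stand-ins that only mimic the deferred execution of steps, not the real CasperJS implementation.

```javascript
// Hypothetical stand-in for the CasperJS step queue (not the real library):
// thenOpen() only records the URL and callback; run() executes them later,
// much like casper.run() does after the for loop has finished.
function FakeCasper() {
  this.steps = [];
  this.currentUrl = null;
}
FakeCasper.prototype.thenOpen = function (url, fn) {
  // `url` is evaluated here, at scheduling time, so each step keeps its own value.
  this.steps.push({ url: url, fn: fn });
  return this;
};
FakeCasper.prototype.getCurrentUrl = function () {
  return this.currentUrl;
};
FakeCasper.prototype.run = function () {
  for (var s = 0; s < this.steps.length; s++) {
    this.currentUrl = this.steps[s].url; // "open" the page for this step
    this.steps[s].fn.call(this);
  }
};

// Hypothetical draw data shaped like json["1"] in the question.
var first_round = {
  "1": { player_1: "Player One" },
  "2": { player_1: "Player Two" }
};
var casper = new FakeCasper();
var echoedVar = [];     // what this.echo(normal_url) would print
var echoedCurrent = []; // what this.echo(this.getCurrentUrl()) would print

for (var key in first_round) {
  var name = first_round[key].player_1.replace(/\s+/g, '-');
  var normal_url = "http://www.atpworldtour.com/Tennis/Players/" + name;
  casper.thenOpen(normal_url, function () {
    echoedVar.push(normal_url);               // read later: only the last value
    echoedCurrent.push(this.getCurrentUrl()); // URL of the page actually opened
  });
}
casper.run();
```

After run(), echoedVar holds the Player-Two URL twice, while echoedCurrent holds the Player-One and Player-Two URLs in order.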


Regarding your updated question:

All then* and wait* functions in the CasperJS API are step functions. The function that you pass into them is scheduled and executed later, when casper.run() processes the steps. So you shouldn't rely on variables that a step sets from code outside of the steps: that code runs before the step does. Instead, add further steps inside the thenOpen callback; they will be scheduled in the correct order. Also, you cannot return a value from thenOpen to the surrounding code, because the callback has not run yet when thenOpen returns.
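
This is also why console.log(status) in the follow-up prints undefined. A minimal plain-JavaScript sketch, assuming a hypothetical StepQueue that, like CasperJS, only records steps and runs them when run() is called; the hard-coded 404 stands in for this.status(false)['currentHTTPStatus'].

```javascript
// Hypothetical minimal step queue (an assumption, not CasperJS itself):
// then() only records a function; run() executes the recorded steps in order.
function StepQueue() {
  this.steps = [];
}
StepQueue.prototype.then = function (fn) {
  this.steps.push(fn);
  return this; // allows chaining
};
StepQueue.prototype.run = function () {
  for (var i = 0; i < this.steps.length; i++) {
    this.steps[i].call(this);
  }
};

var queue = new StepQueue();
var status;         // written inside a step
var seenOutside;    // what code outside the steps observes
var seenInNextStep; // what a later step observes

queue.then(function () {
  status = 404; // stand-in for this.status(false)['currentHTTPStatus']
});
seenOutside = status; // runs now, before any step has executed

queue.then(function () {
  seenInNextStep = status; // runs after the first step, so the value is set
});
queue.run();
```

seenOutside stays undefined, while seenInNextStep is 404: the value is only usable from a step scheduled after the one that sets it.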

var somethingDone = false;
var status;
casper.thenOpen(normal_url, function() {
    status = this.status(false)['currentHTTPStatus'];
    if (status != 200) {
        this.thenOpen(alternativeURL, function(){
            // do something
            somethingDone = true;
        });
    }
});
casper.then(function(){
    console.log("status: " + status);
    if (somethingDone) {
        // something has been done
        somethingDone = false;
    }
});

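In this example, this.thenOpen is scheduled as a nested step: it runs right after the surrounding casper.thenOpen step and before the casper.then step that follows it, so somethingDone is already true inside casper.then whenever the alternative URL was opened.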
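
The ordering of nested steps can be modeled with a small queue. This is a hypothetical sketch, not CasperJS code: NestingQueue assumes, as described above, that a step added while another step is running is inserted right after the running step rather than at the end of the queue, and it hard-codes a 404 so the fallback branch runs.

```javascript
// Hypothetical step queue: a step added while another step is running is
// inserted right after the running step (after any siblings it already
// added), instead of at the end of the queue.
function NestingQueue() {
  this.steps = [];
  this.current = -1;  // index of the running step, -1 when not running
  this.insertAt = -1; // where the next nested step goes
}
NestingQueue.prototype.then = function (fn) {
  if (this.current === -1) {
    this.steps.push(fn); // top-level scheduling: append
  } else {
    this.steps.splice(this.insertAt, 0, fn); // nested: insert after current
    this.insertAt++;
  }
  return this;
};
NestingQueue.prototype.run = function () {
  for (this.current = 0; this.current < this.steps.length; this.current++) {
    this.insertAt = this.current + 1;
    this.steps[this.current].call(this);
  }
  this.current = -1;
};

var order = [];
var somethingDone = false;
var queue = new NestingQueue();
queue.then(function () {
  order.push("open normal_url");
  var status = 404; // hypothetical status, so the fallback branch is taken
  if (status !== 200) {
    this.then(function () {
      order.push("open alternativeURL");
      somethingDone = true;
    });
  }
});
queue.then(function () {
  order.push("check somethingDone=" + somethingDone);
});
queue.run();
```

The resulting order is: open normal_url, open alternativeURL, then the check, which sees somethingDone as true.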


There are some things that you need to fix:

  • You don't use your loop counter i: you probably mean "./Draws/wimbledon_draw_" + i + ".json", not "./Draws/wimbledon_draw_" + counter + ".json". (i should also be declared with var; otherwise it becomes an implicit global.)
  • You cannot pass a JSON string to require, although, interestingly, you can require a JSON file. I would still use fs.read to read the file and JSON.parse to parse its contents.
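
Both fixes can be sketched together. This is a hedged example, not the poster's code: the file reader is injected so the sketch runs anywhere, and inside a CasperJS (PhantomJS) script it could be function (path) { return require('fs').read(path); }. The loadDraws helper, the year range, and the in-memory files are hypothetical.

```javascript
// Builds the file name from the loop variable (i), not from `counter`.
function drawFileName(year) {
  return "./Draws/wimbledon_draw_" + year + ".json";
}

// Hypothetical helper: readFile is injected; in a CasperJS script it could be
// function (path) { return require('fs').read(path); }
function loadDraws(firstYear, lastYear, readFile) {
  var draws = {};
  for (var i = firstYear; i <= lastYear; i++) {
    draws[i] = JSON.parse(readFile(drawFileName(i)));
  }
  return draws;
}

// Hypothetical in-memory "files" used instead of the real ./Draws directory.
var fakeFiles = {
  "./Draws/wimbledon_draw_2012.json": '{"1": {"a": {"player_1": "Player One"}}}',
  "./Draws/wimbledon_draw_2013.json": '{"1": {"b": {"player_1": "Player Two"}}}'
};
var draws = loadDraws(2012, 2013, function (path) { return fakeFiles[path]; });
```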

Regarding your question:

You didn't schedule the follow-up commands as steps. Just add steps (then* or wait*) after or inside thenOpen.




Answer 2:


If you need to keep the context of each iteration, use the example here: https://groups.google.com/forum/#!topic/casperjs/n_zXlxiPMtk

I used the IIFE (immediately invoked function expression) option.

For example:

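for (var i = 0; i < links.length; i++) {
  // The IIFE gives each iteration its own index/link/filename, so the step
  // callback below sees this iteration's values when it runs later.
  (function(index) {
    var link = links[index];
    var filename = link.replace(/#/, '');
    filename = filename.replace(/\//g, '-') + '.png';

    casper.echo('Attempting to capture: ' + link);
    // vars.domain is assumed to be defined elsewhere in the original script.
    casper.thenOpen(vars.domain + link).waitForSelector('.title h1', function () {
      this.capture(filename);
    });
  })(i);
}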
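
Why the IIFE matters: var is function-scoped, so without it every deferred callback shares a single filename variable. The sketch below replaces CasperJS with a hypothetical later()/runAll() pair that simply stores callbacks and runs them afterwards; the link values are made up.

```javascript
// Hypothetical deferred queue: callbacks are stored and only run later,
// like steps scheduled with casper.thenOpen(...).waitForSelector(...).
var pending = [];
function later(fn) { pending.push(fn); }
function runAll() {
  for (var p = 0; p < pending.length; p++) { pending[p](); }
  pending = [];
}

var links = ["#/home", "#/about/team"];

// Same file-name logic as the answer's loop body.
function toFilename(link) {
  return link.replace(/#/, '').replace(/\//g, '-') + '.png';
}

// Without an IIFE: one shared `filename` variable for all callbacks.
var withoutIife = [];
for (var i = 0; i < links.length; i++) {
  var filename = toFilename(links[i]);
  later(function () { withoutIife.push(filename); });
}
runAll();

// With an IIFE: each call creates a new scope with its own `filename`.
var withIife = [];
for (var j = 0; j < links.length; j++) {
  (function (index) {
    var filename = toFilename(links[index]);
    later(function () { withIife.push(filename); });
  })(j);
}
runAll();
```

withoutIife ends up as two copies of "-about-team.png", while withIife holds "-home.png" and "-about-team.png".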

links could also be an array of objects, so that each index refers to a group of properties, if need be:

var links = [{'page':'some-page.html', 'filename':'page-page.png'}, {...}]
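
With links as an array of objects, Array.prototype.forEach is another way to get a fresh scope per item. It is a different technique from the IIFE but has the same effect, since each call of the callback gets its own link parameter. A hypothetical sketch: the entries and the steps array are illustrative stand-ins for real pages and for casper's step queue.

```javascript
// Hypothetical data in the shape suggested above ({page, filename} objects).
var links = [
  { page: 'first-page.html', filename: 'first-page.png' },
  { page: 'second-page.html', filename: 'second-page.png' }
];

// Hypothetical stand-in for casper's step queue: callbacks run after the loop.
var steps = [];
var captured = [];

// forEach calls the function once per element, so `link` is a fresh
// parameter in every call and each deferred callback keeps its own object.
links.forEach(function (link) {
  steps.push(function () {
    // In CasperJS this would be roughly:
    // casper.thenOpen(vars.domain + link.page).waitForSelector('.title h1', function () { this.capture(link.filename); });
    captured.push(link.page + ' -> ' + link.filename);
  });
});

steps.forEach(function (step) { step(); });
```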


Source: https://stackoverflow.com/questions/24360993/casperjs-iterating-through-urls
