I'm pretty new to CasperJS, but isn't there a way to open a URL and execute CasperJS commands in for loops? For example, this code doesn't work as I expected it to:
casper.then(function() {
    var counter = 2013;
    for (i = counter; i < 2014; i++) {
        var file_name = "./Draws/wimbledon_draw_" + counter + ".json";
        // getting some local json files
        var json = require(file_name);
        var first_round = json["1"];
        for (var key in first_round) {
            var name = first_round[key].player_1.replace(/\s+/g, '-');
            var normal_url = "http://www.atpworldtour.com/Tennis/Players/" + name;
            // the casper command below only executes AFTER the for loop is done
            casper.thenOpen(normal_url, function() {
                this.echo(normal_url);
            });
        }
    }
});
Instead of Casper calling thenOpen on each new URL per iteration, it only gets called AFTER the for loop has finished. thenOpen then gets called with the last value normal_url was set to. Is there no Casper command that works on each iteration within the for loop?
Follow-up: How do we make casper thenOpen return a value on the current iteration of the for loop?
Say, for example, I needed a return value from that thenOpen (maybe if the HTTP status is 404 I need to evaluate another URL, so I want to return false). Is this possible?
Editing the casper.thenOpen call above:
var status;
// thenOpen() only executes after the console.log statement directly below
casper.thenOpen(normal_url, function() {
    status = this.status(false)['currentHTTPStatus'];
    if (status == 200) {
        return true;
    } else {
        return false;
    }
});
console.log(status); // This prints UNDEFINED the same number of times as iterations.
As Fanch and Darren Cook stated, you could use an IIFE to fix the url value inside of the thenOpen step.
An alternative would be to use getCurrentUrl to check the url. So change the line
this.echo(normal_url);
to
this.echo(this.getCurrentUrl());
The problem is that normal_url inside the callback references the last value that was set rather than the value of that iteration, because the callback is executed later, after the loop has finished. This does not happen with the first argument of casper.thenOpen(normal_url, function(){...});, because the argument is evaluated immediately and the current value is passed to the function. You just see the wrong url, but the correct url is actually opened.
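The difference between the argument and the callback can be reproduced without CasperJS. The following is a minimal sketch in plain JavaScript, not CasperJS code: schedule and runAll are hypothetical stand-ins for Casper's step queue, and the URLs are placeholders.

```javascript
// Hypothetical stand-in for Casper's step queue (not the CasperJS API).
var steps = [];
function schedule(url, callback) {
    // `url` is evaluated right now, at scheduling time
    steps.push({ url: url, callback: callback });
}
function runAll() {
    // the callbacks only run here, after the loop has finished
    return steps.map(function (step) {
        return { opened: step.url, echoed: step.callback() };
    });
}

var names = ['a', 'b', 'c']; // placeholder values
for (var i = 0; i < names.length; i++) {
    var normal_url = 'http://example.com/' + names[i];
    // the callback closes over the single shared variable normal_url
    schedule(normal_url, function () {
        return normal_url;
    });
}

var results = runAll();
results.forEach(function (r) {
    console.log('opened ' + r.opened + ', echoed ' + r.echoed);
});
```

Every entry reports a different opened URL but the same echoed URL, which matches the behaviour described above: the right pages are opened, only the echo is misleading.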
Regarding your updated question:
All then* and wait* functions in the CasperJS API are step functions. The function that you pass into them is scheduled and executed later (triggered by casper.run()). You shouldn't read variables outside of steps, because the steps that set them have not run yet at that point. Just add further steps inside of the thenOpen call. They will be scheduled in the correct order. Also, you cannot return anything from thenOpen.
var somethingDone = false;
var status;
casper.thenOpen(normal_url, function() {
    status = this.status(false)['currentHTTPStatus'];
    if (status != 200) {
        this.thenOpen(alternativeURL, function(){
            // do something
            somethingDone = true;
        });
    }
});
casper.then(function(){
    console.log("status: " + status);
    if (somethingDone) {
        // something has been done
        somethingDone = false;
    }
});
In this example this.thenOpen will be scheduled after casper.thenOpen, and somethingDone will be true inside casper.then because that step comes after it.
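The ordering described above can be modelled in plain JavaScript. This is only a sketch of the scheduling idea: MiniScheduler is a hypothetical stand-in and not CasperJS's implementation, and the status value 404 is hard-coded for illustration.

```javascript
// Hypothetical miniature scheduler: steps added while another step is
// running are inserted directly after the running step.
function MiniScheduler() {
    this.steps = [];
    this.current = -1; // index of the running step, -1 before run()
    this.inserted = 0; // sub-steps added by the running step so far
}
MiniScheduler.prototype.then = function (fn) {
    if (this.current < 0) {
        this.steps.push(fn);
    } else {
        this.inserted++;
        this.steps.splice(this.current + this.inserted, 0, fn);
    }
    return this;
};
MiniScheduler.prototype.run = function () {
    for (this.current = 0; this.current < this.steps.length; this.current++) {
        this.inserted = 0;
        this.steps[this.current].call(this);
    }
};

var log = [];
var somethingDone = false;
var status;
var scheduler = new MiniScheduler();

scheduler.then(function () {
    status = 404; // assumed value, stands in for this.status(false)
    log.push('open normal url');
    if (status != 200) {
        this.then(function () {
            log.push('open alternative url');
            somethingDone = true;
        });
    }
});
scheduler.then(function () {
    log.push('final step, somethingDone=' + somethingDone);
});

log.push('status before run: ' + status);
scheduler.run();
console.log(log.join('\n'));
```

Reading status right after scheduling still gives undefined, while the final step, which runs after the nested one, sees both variables set.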
There are some things that you need to fix:
- You don't use your counter i: you probably mean "./Draws/wimbledon_draw_" + i + ".json", not "./Draws/wimbledon_draw_" + counter + ".json".
- Interestingly, you can require a JSON file. I still would use fs.read to read the file and parse the JSON inside it (JSON.parse).
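A sketch of the fs.read approach, written so that the reading function is passed in. loadDraw and fakeRead are hypothetical names, and the JSON content is invented for illustration; inside a CasperJS script the stub would be replaced by a wrapper around the read function of PhantomJS's fs module.

```javascript
// Hypothetical helper: builds the file name from the loop variable and
// parses the file content. `read` is any function mapping a path to a string.
function loadDraw(read, year) {
    var fileName = './Draws/wimbledon_draw_' + year + '.json';
    return JSON.parse(read(fileName));
}

// Stub reader so the sketch runs outside CasperJS; the data is invented.
function fakeRead(path) {
    return '{"1": {"match_1": {"player_1": "Some Player"}}}';
}

var draw = loadDraw(fakeRead, 2013);
var firstRound = draw['1'];
var slugs = [];
for (var key in firstRound) {
    slugs.push(firstRound[key].player_1.replace(/\s+/g, '-'));
}
console.log(slugs.join('\n'));

// In a CasperJS script (assumption, not run here):
// loadDraw(function (p) { return require('fs').read(p); }, i);
```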
Regarding your question...
You didn't schedule any commands. Just add steps (then* or wait*) behind or inside of thenOpen.
If you need to get context then use the example here: https://groups.google.com/forum/#!topic/casperjs/n_zXlxiPMtk
I used the IIFE (immediately invoked function expression) option.
E.g.:
for (var i in links) {
    // the IIFE gives every iteration its own index, link and filename
    (function(index) {
        var link = links[index];
        var filename = link.replace(/#/, '');
        filename = filename.replace(/\//g, '-') + '.png';
        // note: this echo runs immediately, while the steps are being scheduled
        casper.echo('Attempting to capture: ' + link);
        casper.thenOpen(vars.domain + link).waitForSelector('.title h1', function () {
            this.capture(filename);
        });
    })(i);
}
links could also be an array of objects, so that each index refers to a group of properties if need be:
var links = [{'page':'some-page.html', 'filename':'page-page.png'}, {...}]
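A sketch of the IIFE pattern with such an array of objects, in plain JavaScript so it runs without CasperJS. thenOpenStub and queue are hypothetical stand-ins for Casper's step queue, and the entries and domain are placeholder data.

```javascript
// Hypothetical stand-in for Casper's step queue (not the CasperJS API).
var queue = [];
function thenOpenStub(url, callback) {
    queue.push(function () { return callback(url); });
}

// Placeholder data: each entry groups the properties for one page.
var links = [
    { page: 'some-page.html', filename: 'some-page.png' },
    { page: 'other-page.html', filename: 'other-page.png' }
];

for (var i = 0; i < links.length; i++) {
    (function (entry) {
        // `entry` is local to this invocation, so the callback keeps it
        thenOpenStub('http://example.com/' + entry.page, function (url) {
            return url + ' -> ' + entry.filename;
        });
    })(links[i]);
}

// Run the scheduled callbacks after the loop, as casper.run() would.
var captured = queue.map(function (step) { return step(); });
console.log(captured.join('\n'));
```

Array.prototype.forEach would give each iteration its own scope in the same way, since the callback is invoked once per element.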
Source: https://stackoverflow.com/questions/24360993/casperjs-iterating-through-urls