Question
I am trying to pull the source code of several webpages at once. The links are read into an array from a source text file. I am able to iterate through the array, print out the links, and confirm they are there, but when I try to pass them through a function, they become undefined after the first iteration.
My ultimate goal is to have it save the source of each page to its own document. It does the first page correctly, but subsequent attempts are undefined. I've searched for hours but would appreciate it if someone could point me in the right direction.
var fs = require('fs');
var pageContent = fs.read('input.txt');
var arrdata = pageContent.split(/[\n]/);
var system = require('system');
var page = require('webpage').create();
var args = system.args;
var imagelink;
var content = " ";
function handle_page(file, imagelink){
    page.open(file,function(){
        var js = page.evaluate(function (){
            return document;
        });
        fs.write(imagelink, page.content, 'w');
        setTimeout(next_page(),500);
    });
}
function next_page(imagelink){
    var file = imagelink;
    if(!file){phantom.exit(0);}
    handle_page(file, imagelink);
}
for(var i in arrdata){
    next_page(arrdata[i]);
}
I realize now that the for loop will only iterate once and that the other two functions then make their own loop, so the behavior makes sense, but I am still having issues getting it running.
Answer 1:
PhantomJS's page.open() is asynchronous (that's why it takes a callback). It is also a long-running operation. If two such calls are made, the second will overwrite the first, because both operate on the same page object.
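To make the timing concrete, here is a small sketch in plain JavaScript. It assumes a hypothetical fakeOpen() that stands in for page.open(): it returns immediately and fires its callback later. It is only an illustration of the ordering, not PhantomJS code.

```javascript
// fakeOpen is a hypothetical stand-in for page.open(): it returns at once
// and runs its callback on a later tick.
function fakeOpen(url, callback) {
    setTimeout(function () { callback('success'); }, 0);
}

var log = [];
['a.html', 'b.html'].forEach(function (url) {
    log.push('start ' + url);
    fakeOpen(url, function () {
        log.push('done ' + url);
    });
});
log.push('loop finished');
// At this point log holds: start a.html, start b.html, loop finished.
// The two "done" entries are appended only after the current code returns.
```

The loop has already started every request before a single callback has run, which is why a plain loop cannot drive one shared page object.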
The best way would be to use recursion:
function handle_page(i){
    // Stop once every link has been processed.
    if (arrdata.length === i) {
        phantom.exit();
        return;
    }
    var imageLink = arrdata[i];
    page.open(imageLink, function(){
        fs.write("file_"+i+".html", page.content, 'w');
        // Start the next page only after this one has been written.
        handle_page(i+1);
    });
}
handle_page(0);
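The same chaining idea can be pulled out into a small reusable helper. The following is a sketch only: openSequentially, openFn, onPage, onDone and delayMs are hypothetical names, and openFn is assumed to be any function that takes a URL and a callback receiving a status string, which is how the page.open() callback reports 'success' or 'fail'.

```javascript
// Run openFn on each URL strictly one after another.
// openFn(url, callback) is a hypothetical wrapper, e.g. around page.open().
function openSequentially(urls, openFn, onPage, onDone, delayMs) {
    function step(i) {
        if (i >= urls.length) {
            onDone();
            return;
        }
        openFn(urls[i], function (status) {
            onPage(i, urls[i], status);
            // Pass a function to setTimeout. Writing setTimeout(step(i + 1), ...)
            // would call step immediately instead of after the delay.
            setTimeout(function () { step(i + 1); }, delayMs || 0);
        });
    }
    step(0);
}
```

In the PhantomJS script, openFn could be function (url, cb) { page.open(url, cb); }, onPage could check the status and call fs.write(), and onDone could call phantom.exit(). Keeping those as parameters makes the control flow easy to try out with a fake openFn first.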
Couple of other things:
- setTimeout(next_page(),500); immediately invokes next_page() without waiting. You wanted setTimeout(next_page, 500);, but then it also wouldn't work, because without an argument next_page simply exits.
- fs.write(imagelink, page.content, 'w'): that imagelink is probably a URL, in which case you probably want to define another way to devise a filename.
- While for(var i in arrdata){ next_page(arrdata[i]); } works here, be aware that this doesn't work on all arrays and array-like objects. Use dumb for loops like for(var i = 0; i < length; i++) or array.forEach(function(item, index){...}) if it is available.
- page.evaluate() is sandboxed and provides access to the DOM, but everything that is not JSON serializable cannot be passed out of it. You will have to put that into a serializable format before passing it out of evaluate().
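For the filename point, one possible approach is to derive a name from the URL itself. This is a minimal sketch: urlToFilename is a hypothetical helper, and the character whitelist is an assumption about what counts as a safe file name rather than a rule from PhantomJS.

```javascript
// Hypothetical helper: turn a URL plus its index into a file name.
function urlToFilename(url, index) {
    var name = url
        .replace(/^[a-z]+:\/\//i, '')    // drop the scheme, e.g. "https://"
        .replace(/[^a-z0-9.-]+/gi, '_')  // replace runs of other characters with "_"
        .replace(/^_+|_+$/g, '');        // trim leading and trailing underscores
    // The index prefix keeps names distinct even if two URLs clean up identically.
    return index + '_' + (name || 'page') + '.html';
}

// Array.prototype.map passes (item, index), which matches the helper's signature.
var names = ['https://example.com/a/b?x=1', 'http://example.com/'].map(urlToFilename);
// names is ['0_example.com_a_b_x_1.html', '1_example.com.html']
```

The result could be passed as the first argument to fs.write() in place of the raw link.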
Source: https://stackoverflow.com/questions/31420803/cant-pass-array-items-to-function-in-phantomjs