Question
I'm writing a page scraper for a dynamic web page. The page has an initial load and then loads the remainder of the content after a short load time.
I've accounted for the load and have successfully scraped the HTML from the page, but the page doesn't load ALL the content at once. Instead, it loads a specified amount of content via a GET request URL and then shows a "Get More" button on the page. My objective is to click this "Get More" button until all the content is loaded on the page. For those wondering, I don't want to load all the content at once via the GET URL because of the impact on their server.
I'm stuck forming the loop or iteration that would allow me to repeatedly click on the page.
    const NIGHTMARE = require("nightmare");
    const BETHESDA = NIGHTMARE({ show: true });

    BETHESDA
        // Open the bethesda web page. Web page will contain 20 mods to start.
        .goto("https://bethesda.net/en/mods/skyrim?number_results=40&order=desc&page=1&platform=XB1&product=skyrim&sort=published&text=")
        // Bethesda website serves all requested mods at once. Each mod has the class "tile". Wait for any tile class to appear, then proceed.
        .wait(".tile");

    let additionalModsPresent = true;
    while (additionalModsPresent) {
        setTimeout(function() {
            BETHESDA
                .wait('div[data-is="main-mods-pager"] > button')
                .click('div[data-is="main-mods-pager"] > button')
        }, 10000)
        additionalModsPresent = false;
    }

    // let moreModsBtn = document.querySelector('div[data-is="main-mods-pager"] > button');
    // .end()

    BETHESDA.catch(function (error) {
        console.error('Search failed:', error);
    });
My thinking thus far has been to use a while loop that attempts to click the button after some interval of time. If an error occurs, it's likely because the button doesn't exist. The issue I'm having is that I can't seem to get the click to work inside of a setTimeout or setInterval. I believe there is some sort of scoping issue but I don't know what exactly is going on.
If I can get the click method to work in setInterval or something similar, the issue would be solved.
Thoughts?
Answer 1:
You can refer to the Nightmare issue "Problem running nightmare in loops" (https://github.com/segmentio/nightmare/issues/522).
I modified your code following the guidelines given there, and it seems to work fine:
    const NIGHTMARE = require("nightmare");
    const BETHESDA = NIGHTMARE({
        show: true
    });

    BETHESDA
        // Open the bethesda web page. Web page will contain 20 mods to start.
        .goto("https://bethesda.net/en/mods/skyrim?number_results=40&order=desc&page=1&platform=XB1&product=skyrim&sort=published&text=")
        // Bethesda website serves all requested mods at once. Each mod has the class "tile". Wait for any tile class to appear, then proceed.
        .wait(".tile");

    next();

    // Click the pager button, then call next() again once the click has completed.
    function next() {
        BETHESDA.wait('div[data-is="main-mods-pager"] > button')
            .click('div[data-is="main-mods-pager"] > button')
            .then(function() {
                console.log("click done");
                next();
            })
            .catch(function(err) {
                console.log(err);
                console.log("All done.");
            });
    }
Eventually, wait() should time out once the button no longer appears, and you can then handle the error in the catch() block. Be aware that it keeps going for a long time :) I did not wait until the end, and you might run out of memory before it finishes.
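One way to keep the loop from running indefinitely is to cap the number of clicks and to check for the button with .exists() instead of relying on a wait() timeout. What follows is only a minimal sketch, not something tested against this site. The names clickUntilGone, maxClicks and pauseMs are hypothetical and introduced here for illustration. It assumes the button is removed from the page once there is no more content, and that page is a Nightmare instance whose chained calls can be awaited.

```javascript
// Sketch: click a "load more" button until it disappears or a cap is reached.
// `page` is assumed to be a Nightmare instance (or anything exposing thenable
// exists(), click() and wait() calls). clickUntilGone, maxClicks and pauseMs
// are hypothetical names, not part of the Nightmare API.
async function clickUntilGone(page, selector, { maxClicks = 50, pauseMs = 1000 } = {}) {
  let clicks = 0;
  while (clicks < maxClicks) {
    // exists() resolves to true or false right away, so there is no timeout error to catch.
    const present = await page.exists(selector);
    if (!present) break;

    await page.click(selector);
    clicks += 1;

    // Pause so the page can fetch and render the next batch before the next check.
    await page.wait(pauseMs);
  }
  // Number of clicks actually performed.
  return clicks;
}
```

With the code above it could be called as clickUntilGone(BETHESDA, 'div[data-is="main-mods-pager"] > button', { maxClicks: 20 }), followed by .then() and .catch() as before. The pause between clicks also spaces out the requests, which fits the goal of limiting the impact on the server. If the site briefly hides the button while loading, pauseMs would need to be long enough for it to reappear.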
Source: https://stackoverflow.com/questions/44605473/repeatedly-clicking-an-element-on-a-page-using-electron-nightarejs