Puppeteer | 易学教程

I am having trouble installing the puppeteer library

Question: I am trying to build a simple web scraper, and the first thing I have to do is install the puppeteer library. I am on the latest Linux Mint, and I run the commands sudo npm init -y and sudo npm i puppeteer, but I get these errors:
WARN engine puppeteer@3.0.2: wanted: {"node":">=10.18.1"} (current: {"node":"8.10.0","npm":"3.5.2"}) loadDep:ws → request
> puppeteer@3.0.2 install
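
The WARN engine line says puppeteer@3.0.2 wants Node >=10.18.1 while the installed Node is 8.10.0 (npm 3.5.2), so the likely fix is a newer Node.js rather than different npm commands; a project-local npm install also normally doesn't need sudo. As a quick check before npm i puppeteer, here is a minimal sketch that compares process.versions.node against that minimum. The helper name meetsMinimum is hypothetical, and it only handles plain major.minor.patch strings, not full semver ranges.

```javascript
// Minimal version check, assuming plain "major.minor.patch" strings
// (no pre-release tags). meetsMinimum is a hypothetical helper.
function meetsMinimum(current, minimum) {
  const a = current.split('.').map(Number);
  const b = minimum.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y; // first differing component decides
  }
  return true; // versions are equal
}

// Minimum taken from the WARN line: puppeteer@3.0.2 wanted node >=10.18.1
const REQUIRED_NODE = '10.18.1';

if (!meetsMinimum(process.versions.node, REQUIRED_NODE)) {
  console.error(`Node ${process.versions.node} is too old for puppeteer@3.0.2; need >= ${REQUIRED_NODE}.`);
  process.exitCode = 1;
} else {
  console.log(`Node ${process.versions.node} satisfies >= ${REQUIRED_NODE}.`);
}
```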

Get all links with XPath in Puppeteer (pausing or not working)?

Question: I am required to use XPaths to select all links on a page, so that my Puppeteer app can then click into each one and perform some actions. I am finding that the method below sometimes gets stuck and my crawler pauses. Is there a better or different way of getting all links from an XPath? Or is there something incorrect in my code that could be stalling my app's progress? try { links = await this.getLinksFromXPathSelector(state); } catch (e) { console.log("error getting links");
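
One pattern that avoids holding element handles across navigations is to read the link URLs out of the page once and then visit each URL with page.goto instead of clicking. The sketch below runs the XPath lookup inside the page with the standard DOM document.evaluate API (a swapped-in technique, not the asker's getLinksFromXPathSelector) and gives each navigation an explicit timeout, so a stuck page surfaces as a recorded error rather than a silent pause. The XPath expression, the 30-second timeout, and the helper names are assumptions.

```javascript
// Sketch: collect href values for every node matched by an XPath expression.
// `page` is a Puppeteer Page; getHrefsByXPath is a hypothetical helper.
async function getHrefsByXPath(page, xpath) {
  return page.evaluate((expr) => {
    // Runs in the browser: evaluate the XPath against the whole document
    const snapshot = document.evaluate(
      expr,
      document,
      null,
      XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
      null
    );
    const hrefs = [];
    for (let i = 0; i < snapshot.snapshotLength; i++) {
      const node = snapshot.snapshotItem(i);
      // <a> elements expose an absolute URL through the href property
      if (node && node.href) hrefs.push(node.href);
    }
    return hrefs;
  }, xpath);
}

// Visit each URL with a navigation timeout; failures are recorded, not fatal.
// visitLinks is a hypothetical helper.
async function visitLinks(page, urls, onPage, timeoutMs = 30000) {
  const failures = [];
  for (const url of urls) {
    try {
      await page.goto(url, { waitUntil: 'domcontentloaded', timeout: timeoutMs });
      await onPage(page, url);
    } catch (err) {
      failures.push({ url, message: err.message });
    }
  }
  return failures;
}

// Usage (inside an async function, with `page` from browser.newPage()):
//   const hrefs = await getHrefsByXPath(page, '//a[@href]');
//   const failures = await visitLinks(page, hrefs, async (p, url) => { /* ... */ });
```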

How can I make a monitoring function to wait for an html element in puppeteer

Question: How can I make a function that waits until a certain CSS selector loads in Puppeteer? I want to refresh the page over and over until '.product-form__add-to-cart' is present, and then continue with the rest of the code. Answer 1: The corresponding Puppeteer method is page.waitForSelector. Example: await page.waitForSelector('.product-form__add-to-cart') But if you need to reload the page to get the desired element (maybe because the site you are visiting can look different between visits), then
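
For the reload-until-present case, a minimal sketch: check for the element with page.$, which resolves to null when nothing matches, and call page.reload between attempts. The attempt limit, the delay, and the helper name reloadUntilSelector are assumptions; page.waitForSelector alone is enough when the element appears without a reload.

```javascript
// Sketch: reload the page until `selector` matches, up to maxAttempts checks.
// `page` is a Puppeteer Page; reloadUntilSelector is a hypothetical helper.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function reloadUntilSelector(page, selector, { maxAttempts = 20, delayMs = 3000 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const handle = await page.$(selector); // null when nothing matches
    if (handle) return handle;
    if (attempt < maxAttempts) {
      await sleep(delayMs); // pause so the server is not hammered
      await page.reload({ waitUntil: 'domcontentloaded' });
    }
  }
  throw new Error(`"${selector}" not found after ${maxAttempts} attempts`);
}

// Usage (inside an async function):
//   const button = await reloadUntilSelector(page, '.product-form__add-to-cart');
//   await button.click();
```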

TypeError [ERR_INVALID_ARG_TYPE]: The "original" argument must be of type function (Puppeteer, Node.js)

Question: I have been stuck on this for a while and don't understand the problem. Could someone enlighten me on this? Here's the code: const puppeteer = require('puppeteer'); (async() => { let infourl = 'https://www.imdb.com/title/tt0111161/?ref_=fn_al_tt_3'; let browser = await puppeteer.launch(); let page = await browser.newPage(); await page.goto(infourl, { waitUntil:'networkidle2' }); let data = await page.evaluate( () =>{ let stats = document.querySelector('div[class="title_wrapper"]').innerText;
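
The wording of this error matches the argument check in Node's util.promisify, which throws it when handed something that is not a function, for instance undefined standing in for an API that the running Node version does not provide. Checking which Node.js version actually runs the script is therefore a reasonable first step. The small reproduction below only shows where the message can come from; it does not diagnose the asker's exact setup, and tryPromisify is a hypothetical helper.

```javascript
const util = require('util');

// util.promisify validates its argument; passing a non-function
// (here: undefined, standing in for a missing API) throws synchronously.
function tryPromisify(candidate) {
  try {
    util.promisify(candidate);
    return null; // no error thrown
  } catch (err) {
    return err;
  }
}

const err = tryPromisify(undefined);
console.log(err instanceof TypeError, err.code); // true ERR_INVALID_ARG_TYPE
console.log(err.message);

// A real function is accepted without error.
console.log(tryPromisify(() => {}) === null); // true
```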

How to get children of elements by Puppeteer

Question: I understand that Puppeteer gets its own handles rather than standard DOM elements, but I don't understand why I cannot continue querying from the elements I have found, as in: const els = await page.$$('div.parent'); for (let i = 0; i < els.length; i++) { const img = await els[i].$('img').getAttribute('src'); console.log(img); const link = await els[i].$('a').getAttribute('href'); console.log(link); } Answer 1: Problem: The element handles are necessary as an abstraction layer between the Node.js and browser
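
In the question's loop, els[i].$('img') returns a Promise of an ElementHandle (or null), and neither has a getAttribute method, so the call fails. One working pattern uses ElementHandle.$eval, which runs a callback in the page against the first matching descendant and sends the result back to Node; page.$$eval does the whole job in a single round trip. The div.parent, img and a selectors come from the question; the helper names are hypothetical.

```javascript
// Sketch: read src/href from children of each div.parent.
// `page` is a Puppeteer Page; both helper names are hypothetical.
async function getChildAttributes(page) {
  const parents = await page.$$('div.parent');
  const results = [];
  for (const parent of parents) {
    // $eval runs the callback in the browser on the first matching
    // descendant; it rejects if no descendant matches.
    const src = await parent.$eval('img', (img) => img.getAttribute('src'));
    const href = await parent.$eval('a', (a) => a.getAttribute('href'));
    results.push({ src, href });
  }
  return results;
}

// Alternative with a single round trip: do all the work in the page,
// returning null for a missing child instead of rejecting.
async function getChildAttributesInPage(page) {
  return page.$$eval('div.parent', (parents) =>
    parents.map((p) => {
      const img = p.querySelector('img');
      const a = p.querySelector('a');
      return {
        src: img ? img.getAttribute('src') : null,
        href: a ? a.getAttribute('href') : null,
      };
    })
  );
}
```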

Puppeteer: Chromium instances remain active in the background after browser.disconnect

Question: My environment: Puppeteer version: 3.1.0; Platform / OS version: Windows 10; Node.js version: 12.16.1. My problem is: I have a for...of loop to visit 3000+ URLs with Puppeteer. I use puppeteer.connect to a wsEndpoint so I can reuse one browser instance. I disconnect after each visit and close the tab. For the first 100 URLs, page.goto opens each URL immediately; above 100, page.goto uses 2-3 retries per URL; above 300, it uses 5-8 retries per URL; above 500 I get TimeoutError: Navigation timeout of
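
Two facts worth keeping in mind here: browser.disconnect() only detaches Puppeteer and leaves the Chromium process running, while browser.close() shuts it down; and a page opened with browser.newPage() stays open until page.close() (or browser.close()) is called. A minimal sketch, assuming one connection is reused for the whole run instead of connecting and disconnecting per URL, closes each tab in a finally block so tabs cannot pile up when a navigation throws. It does not claim to explain the growing retry counts; the helper name, timeout and waitUntil value are assumptions.

```javascript
// Sketch: visit many URLs through one connected browser, one tab at a time.
// `browser` comes from puppeteer.connect(...) or puppeteer.launch(...);
// crawlSequentially is a hypothetical helper.
async function crawlSequentially(browser, urls, handlePage, timeoutMs = 30000) {
  const failures = [];
  for (const url of urls) {
    const page = await browser.newPage();
    try {
      await page.goto(url, { waitUntil: 'domcontentloaded', timeout: timeoutMs });
      await handlePage(page, url);
    } catch (err) {
      failures.push({ url, message: err.message });
    } finally {
      // Always close the tab, even when goto or handlePage throws.
      await page.close();
    }
  }
  return failures;
}

// Usage (inside an async function):
//   const browser = await puppeteer.connect({ browserWSEndpoint: wsEndpoint });
//   try {
//     const failures = await crawlSequentially(browser, urls, async (page) => { /* scrape */ });
//   } finally {
//     await browser.disconnect(); // leaves Chromium running; browser.close() stops it
//   }
```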

Tell Puppeteer to open Chrome tab instead of window

Question: If I have an existing Google Chrome window open, I'd like to tell Puppeteer to open a new tab instead of a new window. Is there a way to do that? Is there some option or flag I can pass to Puppeteer to accomplish this? I have: const puppeteer = require('puppeteer'); (async function () { const b = await puppeteer.launch({ devtools: true, openInExistingWindow: true /// ? something like this? }); const page = await b.newPage(); await page.goto('https://example.com'); })(); Answer 1: const
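
openInExistingWindow is not a Puppeteer launch option, and puppeteer.launch always starts its own browser. What Puppeteer can do is attach to a Chrome that was started with remote debugging enabled, for example google-chrome --remote-debugging-port=9222 (recent Chrome versions may also require a non-default --user-data-dir for that flag), after which browser.newPage() opens a tab in that browser. A sketch assuming port 9222; the helper name is hypothetical, and the Puppeteer module is passed in as a parameter so the function has no hard dependency.

```javascript
// Sketch: open a new tab in an already-running Chrome instead of launching one.
// Assumes Chrome was started with --remote-debugging-port=9222.
// `puppeteer` is passed in (e.g. require('puppeteer')); openTab is hypothetical.
async function openTab(puppeteer, url, browserURL = 'http://127.0.0.1:9222') {
  // connect() attaches to the existing browser rather than starting a new one;
  // defaultViewport: null keeps the window's own size instead of emulating 800x600
  const browser = await puppeteer.connect({ browserURL, defaultViewport: null });
  const page = await browser.newPage(); // new tab in that browser
  await page.goto(url);
  return { browser, page };
}

// Usage (inside an async function):
//   const { browser, page } = await openTab(require('puppeteer'), 'https://example.com');
//   // ... work with page ...
//   await browser.disconnect(); // detach, leaving the user's Chrome open
```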