Puppeteer

I am having trouble installing the puppeteer library

柔情痞子 submitted on 2020-07-23 06:31:08
Question: I am trying to build a simple web scraper, and the first thing I have to do is install the puppeteer library. I am on the latest Linux Mint and run the commands sudo npm init -y and sudo npm i puppeteer, but I get these errors: WARN engine puppeteer@3.0.2: wanted: {"node":">=10.18.1"} (current: {"node":"8.10.0","npm":"3.5.2"}) loadDep:ws → request > puppeteer@3.0.2 install
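The WARN engine line in the question says that puppeteer@3.0.2 wants Node >=10.18.1 while the installed Node is 8.10.0. A minimal sketch of that comparison follows; the helper name `meetsMinimum` is hypothetical, and it only handles plain major.minor.patch strings, not full semver ranges.

```javascript
// Hypothetical helper: compares plain "major.minor.patch" version strings.
function meetsMinimum(current, minimum) {
  const a = current.replace(/^v/, '').split('.').map(Number);
  const b = minimum.replace(/^v/, '').split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y; // first differing component decides
  }
  return true; // versions are equal
}

// Values taken from the warning quoted in the question.
console.log(meetsMinimum('8.10.0', '10.18.1')); // false
// Check the Node.js version that is running this script.
console.log(meetsMinimum(process.versions.node, '10.18.1'));
```

If the second line prints false, upgrading Node.js before running npm i puppeteer again is the likely next step.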

Get all links with XPath in Puppeteer (pausing or not working)?

你。 submitted on 2020-07-18 07:29:10
Question: I am required to use XPaths to select all links on a page, so that my Puppeteer app can then click into them and perform some actions. I am finding that the method (code below) sometimes gets stuck, and my crawler pauses. Is there a better or different way of getting all links from an XPath? Or is something in my code incorrect that could be pausing my app's progress?
    try {
      links = await this.getLinksFromXPathSelector(state);
    } catch (e) {
      console.log("error getting links");
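The excerpt does not show getLinksFromXPathSelector, so the cause of the pause cannot be determined from it. One possible approach, sketched under that assumption, is to collect the href values in a single page.evaluate call and guard it with a timeout so the crawler is never blocked indefinitely. The names `withTimeout` and `getHrefsByXPath` are hypothetical.

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `ms`.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical helper: `page` is expected to be a Puppeteer Page.
async function getHrefsByXPath(page, xpath, ms = 10000) {
  const lookup = page.evaluate((xp) => {
    // Runs in the browser: evaluate the XPath and copy out plain strings.
    const result = document.evaluate(
      xp, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null
    );
    const hrefs = [];
    for (let i = 0; i < result.snapshotLength; i++) {
      const node = result.snapshotItem(i);
      if (node.href) hrefs.push(node.href);
    }
    return hrefs;
  }, xpath);
  return withTimeout(lookup, ms, 'XPath link lookup');
}
```

A call such as await getHrefsByXPath(page, '//a[@href]') returns plain URL strings that can be visited with page.goto instead of being clicked. The timeout only unblocks the caller; it does not cancel work already running in the browser.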

How can I make a monitoring function to wait for an html element in puppeteer

可紊 submitted on 2020-07-10 07:01:11
Question: How can I make a function that waits until a certain CSS selector loads in Puppeteer? I want to refresh the page over and over until '.product-form__add-to-cart' is present, and then continue with the rest of the code.
Answer 1: The corresponding Puppeteer method is page.waitForSelector. Example: await page.waitForSelector('.product-form__add-to-cart') But if you need to reload the page to get the desired element (maybe because the site you are visiting can look different between visits), then
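The reload-until-present behaviour the question asks for can be sketched as a loop around page.waitForSelector and page.reload. This is one possible shape, not the continuation of the truncated answer; the function name `reloadUntilSelector` and the `maxReloads` and `timeout` defaults are assumptions chosen for illustration.

```javascript
// Hypothetical helper: `page` is expected to be a Puppeteer Page.
async function reloadUntilSelector(
  page,
  selector,
  { maxReloads = 20, timeout = 5000 } = {}
) {
  for (let attempt = 0; attempt <= maxReloads; attempt++) {
    try {
      // Resolves with the element handle as soon as the selector appears.
      return await page.waitForSelector(selector, { timeout });
    } catch (err) {
      // Give up after the last allowed reload and surface the error.
      if (attempt === maxReloads) throw err;
      await page.reload({ waitUntil: 'domcontentloaded' });
    }
  }
}
```

Usage would look like await reloadUntilSelector(page, '.product-form__add-to-cart'). The cap on reloads is there so that a selector that never appears does not loop forever. Note that this sketch treats any error from waitForSelector as a reason to reload, not only timeouts.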

TypeError [ERR_INVALID_ARG_TYPE]: The “original” argument must be of type function puppeteer node js

末鹿安然 submitted on 2020-07-10 03:24:05
Question: I have been stuck on this for a while and do not understand the problem. Could someone enlighten me? Here is the code:
    const puppeteer = require('puppeteer');
    (async () => {
      let infourl = 'https://www.imdb.com/title/tt0111161/?ref_=fn_al_tt_3';
      let browser = await puppeteer.launch();
      let page = await browser.newPage();
      await page.goto(infourl, { waitUntil: 'networkidle2' });
      let data = await page.evaluate(() => {
        let stats = document.querySelector('div[class="title_wrapper"]').innerText;
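In Node.js, an ERR_INVALID_ARG_TYPE error that names the "original" argument is raised by util.promisify (and util.callbackify) when it is given something that is not a function. Whether that is what happens inside the truncated script above cannot be confirmed from the excerpt; the sketch below only reproduces the error code in isolation. The helper name `promisifyErrorCode` is hypothetical.

```javascript
const util = require('util');

// Returns the error code thrown when promisifying `candidate`, or null if none.
function promisifyErrorCode(candidate) {
  try {
    util.promisify(candidate);
    return null;
  } catch (err) {
    return err.code;
  }
}

// A missing function (undefined) triggers the error from the title.
console.log(promisifyErrorCode(undefined));
// A real function is accepted.
console.log(promisifyErrorCode(setTimeout));
```

A common way to hit this in practice is a library promisifying an API that the installed Node.js version does not provide, so comparing the Node.js version against the library's stated requirement is a reasonable first check.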

How to get children of elements by Puppeteer

一个人想着一个人 submitted on 2020-07-09 15:45:26
Question: I understand that Puppeteer gets its own handles rather than standard DOM elements, but I don't understand why I cannot continue the same query on the found elements, as in:
    const els = await page.$$('div.parent');
    for (let i = 0; i < els.length; i++) {
      const img = await els[i].$('img').getAttribute('src');
      console.log(img);
      const link = await els[i].$('a').getAttribute('href');
      console.log(link);
    }
Answer 1: Problem The element handles are necessary as an abstraction layer between the Node.js and browser
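In the question's code, els[i].$('img') returns a promise for an element handle, and handles have no getAttribute method, so the chained call fails. One way to read attributes of children, sketched here rather than taken from the truncated answer, is elementHandle.$eval, which runs a callback on the matched child inside the browser. The function name `collectImageAndLink` is hypothetical.

```javascript
// Hypothetical helper: `page` is expected to be a Puppeteer Page.
async function collectImageAndLink(page, parentSelector) {
  const parents = await page.$$(parentSelector);
  const rows = [];
  for (const parent of parents) {
    // $eval finds the first matching child and runs the callback in the browser.
    const src = await parent.$eval('img', (el) => el.getAttribute('src'));
    const href = await parent.$eval('a', (el) => el.getAttribute('href'));
    rows.push({ src, href });
  }
  return rows;
}
```

A call such as await collectImageAndLink(page, 'div.parent') returns one { src, href } object per parent. $eval throws when no child matches, so parents that may lack an img or a element need a try/catch or a prior parent.$(selector) check.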

Puppeteer: Chromium instances remain active in the background after browser.disconnect

随声附和 submitted on 2020-07-09 12:51:27
Question: My environment: Puppeteer version: 3.1.0; Platform / OS version: Windows 10; Node.js version: 12.16.1. My problem: I have a for...of loop that visits 3000+ URLs with Puppeteer. I use puppeteer.connect with a wsEndpoint so I can reuse one browser instance. After each visit I close the tab and disconnect. For the first 100 URLs, page.goto opens the URL immediately; above 100, page.goto needs 2-3 retries per URL; above 300, it needs 5-8 retries per URL; above 500, I get TimeoutError: Navigation timeout of
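The excerpt is cut off before any diagnosis, so the following is not a confirmed fix. It is a sketch of one pattern that avoids a connect/disconnect cycle per URL: connect once, open one tab per URL, and close that tab in a finally block so a failed navigation cannot leave it open. The names `visitAll` and `visit` are hypothetical.

```javascript
// Hypothetical helper: `browser` is expected to be a Puppeteer Browser,
// `visit` is a caller-supplied async function that receives the loaded page.
async function visitAll(browser, urls, visit) {
  const results = [];
  for (const url of urls) {
    const page = await browser.newPage();
    try {
      await page.goto(url, { waitUntil: 'domcontentloaded' });
      results.push({ url, value: await visit(page) });
    } catch (err) {
      // Record the failure and move on to the next URL.
      results.push({ url, error: err.message });
    } finally {
      // Always release the tab, even when navigation failed.
      await page.close();
    }
  }
  return results;
}
```

With this shape, puppeteer.connect is called once before the loop and browser.disconnect (or browser.close, if the Chromium process should also exit) once after it.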

Tell Puppeteer to open Chrome tab instead of window

て烟熏妆下的殇ゞ submitted on 2020-07-06 20:33:16
Question: If I have an existing Google Chrome window open, I'd like to tell Puppeteer to open a new tab instead of opening a new window. Is there a way to do that? Is there some option or flag I can pass to Puppeteer to accomplish this? I have:
    const puppeteer = require('puppeteer');
    (async function () {
      const b = await puppeteer.launch({
        devtools: true,
        openInExistingWindow: true /// ? something like this?
      });
      const page = await b.newPage();
      await page.goto('https://example.com');
    })();
Answer 1: const
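The answer is cut off in this excerpt, so the following is an independent sketch rather than a reconstruction of it. puppeteer.launch always starts a new browser process; to get a tab in a Chrome that is already running, that Chrome has to expose a remote debugging port and Puppeteer has to attach with puppeteer.connect. The function name `openTabInRunningChrome` and the port 9222 are assumptions for illustration.

```javascript
// Hypothetical helper. `puppeteer` is passed in (for example require('puppeteer'))
// so the sketch itself starts nothing. It assumes Chrome was started with
// --remote-debugging-port=9222.
async function openTabInRunningChrome(
  puppeteer,
  url,
  browserURL = 'http://127.0.0.1:9222'
) {
  // Attach to the running browser instead of launching a new one.
  const browser = await puppeteer.connect({ browserURL, defaultViewport: null });
  // newPage() on a connected browser opens a tab in that browser.
  const page = await browser.newPage();
  await page.goto(url);
  return { browser, page };
}
```

Usage would be const { browser, page } = await openTabInRunningChrome(require('puppeteer'), 'https://example.com'). Calling browser.disconnect() afterwards detaches Puppeteer and leaves Chrome and the new tab open.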