Puppeteer

Node JS Puppeteer Infinite scroll loop

旧城冷巷雨未停, submitted 2019-12-08 12:00:20
Question: I am learning Puppeteer and trying to scrape a website that implements infinite scroll. I can get all the prices from the list by scrolling down with a delay of 1 second between scrolls. Here is the URL. What I want to do is open an item from the list, get the product name, go back to the list, select the second product, and repeat this for all products. const fs = require('fs'); const puppeteer = require('puppeteer'); function extractItems() { const extractedElements = document.querySelectorAll('
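One common way to handle the scrolling part is to keep scrolling until the page height stops growing. The sketch below assumes `page` is a Puppeteer `Page`; `scrollToEnd` and its `delayMs`/`maxRounds` options are hypothetical names, and the 1-second default simply mirrors the delay described in the question. Detecting the end this way is a heuristic: a slow site may need a longer delay.

```javascript
// Hypothetical helper: keeps scrolling an infinite-scroll page until the
// document height stops growing (or maxRounds is reached).
// `page` is assumed to be a Puppeteer Page; only page.evaluate() is used.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrollToEnd(page, { delayMs = 1000, maxRounds = 50 } = {}) {
  let previousHeight = await page.evaluate(() => document.body.scrollHeight);
  for (let round = 1; round <= maxRounds; round++) {
    // Jump to the bottom so the site's scroll listener requests the next batch.
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // Give the new items time to load.
    await sleep(delayMs);
    const newHeight = await page.evaluate(() => document.body.scrollHeight);
    if (newHeight === previousHeight) {
      return round; // nothing new was appended: assume the end was reached
    }
    previousHeight = newHeight;
  }
  return maxRounds;
}
```

After it returns, the prices or item links could be extracted in a single pass, as the question already does.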

Puppeteer: Grabbing entire html from page that uses lazy load

隐身守侯, submitted 2019-12-08 03:35:30
I am trying to grab the entire HTML of a web page that uses lazy loading. I have tried scrolling all the way to the bottom and then calling page.content(). I have also tried scrolling back to the top after reaching the bottom and then calling page.content(). Both approaches grab some rows of the table, but not all of them, and getting all of them is my main goal. I believe the page uses lazy loading from React.js. const puppeteer = require('puppeteer'); const url = 'https://www.torontopearson.com/en/departures'; const fs = require('fs'); puppeteer.launch().then(async browser => { const page =
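If the table is virtualized, meaning React keeps only the visible rows in the DOM, no single `page.content()` call will contain every row. A sketch of a workaround is to scroll one viewport at a time and collect the rows present at each step. `collectRowsWhileScrolling` and the `rowSelector` argument are hypothetical; the real selector for the departures table is not given in the question. Rows are deduplicated by their text, so two genuinely identical rows would collapse into one.

```javascript
// Hypothetical helper for a lazily rendered or virtualized table.
// ASSUMPTION: rowSelector matches exactly one element per table row.
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function collectRowsWhileScrolling(page, rowSelector, { delayMs = 500, maxSteps = 200 } = {}) {
  const seen = new Set(); // row texts in first-seen order
  for (let step = 0; step < maxSteps; step++) {
    // Read the rows currently in the DOM (runs in the browser).
    const texts = await page.$$eval(rowSelector, (rows) => rows.map((r) => r.innerText.trim()));
    texts.forEach((t) => seen.add(t));
    // Scroll down one viewport; report true if the scroll position did not move.
    const atBottom = await page.evaluate(() => {
      const before = window.scrollY;
      window.scrollBy(0, window.innerHeight);
      return window.scrollY === before;
    });
    if (atBottom) break;
    await pause(delayMs); // let the newly visible rows render
  }
  return [...seen];
}
```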

Trouble clicking on different links using puppeteer

旧城冷巷雨未停, submitted 2019-12-08 03:27:52
Question: I've written small Node scripts using Puppeteer to click, one after another, on the links to different posts from a website's landing page. The site link used in my scripts is a placeholder. Moreover, the pages are not dynamic, so Puppeteer might be overkill; however, my intention is to learn the logic of clicking. When I execute my first script, it clicks once and then throws the following error as it navigates away from the source page. const puppeteer = require("puppeteer"); (async () => { const
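A frequent cause of this kind of failure (assuming the error concerns a destroyed execution context or a detached node, since the message is cut off here) is reusing element handles from the landing page after a click has navigated away. A sketch that sidesteps it collects the link URLs as plain strings first and then visits each one with `page.goto()`. `visitAllPosts`, `linkSelector` and the `handlePost` callback are hypothetical names.

```javascript
// Sketch: collect every post URL first, then visit each one with page.goto().
// Element handles belong to the page they were taken from and go stale once
// the browser navigates, whereas plain href strings survive navigation.
async function visitAllPosts(page, landingUrl, linkSelector, handlePost) {
  await page.goto(landingUrl);
  // Map the anchors to their absolute URLs inside the browser.
  const hrefs = await page.$$eval(linkSelector, (links) => links.map((a) => a.href));
  const results = [];
  for (const href of hrefs) {
    await page.goto(href); // no need to go back to the landing page
    results.push(await handlePost(page, href));
  }
  return results;
}
```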

node js puppeteer metadata

柔情痞子, submitted 2019-12-07 16:37:59
Question: I am new to Puppeteer, and I am trying to extract metadata from a website using Node.js and Puppeteer. I just can't seem to get the syntax right. The code below works perfectly for extracting the title tag (using two different methods) as well as text from a paragraph tag. How would I extract the content text of a meta tag, for example the one named "description"? <meta name="description" content="Stack Overflow is the largest, etc"> I would be seriously grateful for any suggestions! I can
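`page.$eval()` with an attribute selector is one way to read a meta tag's `content`. A minimal sketch, with `getMetaContent` as a hypothetical helper name:

```javascript
// Sketch: read the `content` attribute of <meta name="...">.
// page.$eval() runs the callback in the browser on the first matching element
// and rejects when nothing matches, so that case is mapped to null here.
async function getMetaContent(page, name) {
  try {
    return await page.$eval(`meta[name="${name}"]`, (el) => el.getAttribute('content'));
  } catch {
    return null; // no matching meta tag (note: this also hides other errors)
  }
}

// Usage: const description = await getMetaContent(page, 'description');
```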

Is there a way to get puppeteer's waitUntil “networkidle” to only consider XHR (ajax) requests?

人走茶凉, submitted 2019-12-07 12:29:34
Question: I am using Puppeteer to evaluate the JavaScript-rendered HTML of web pages in my test app. This is the code I use to make sure all the data is loaded: await page.setRequestInterception(true); page.on("request", (request) => { if (request.resourceType() === "image" || request.resourceType() === "font" || request.resourceType() === "media") { console.log("Request intercepted! ", request.url(), request.resourceType()); request.abort(); } else { request.continue(); } }); try { await page.goto
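Puppeteer's `waitUntil` values (`load`, `domcontentloaded`, `networkidle0`, `networkidle2`) count all network connections, not only AJAX. A sketch of a do-it-yourself alternative tracks in-flight requests whose `resourceType()` is `xhr` or `fetch` through the page's `request`, `requestfinished` and `requestfailed` events, and resolves after a quiet period. `waitForXhrIdle`, `idleMs` and `timeoutMs` are hypothetical names, and the defaults are arbitrary.

```javascript
// Sketch: resolve once no XHR/fetch request has been in flight for `idleMs`;
// reject if that never happens within `timeoutMs`.
function waitForXhrIdle(page, { idleMs = 500, timeoutMs = 30000 } = {}) {
  return new Promise((resolve, reject) => {
    const inFlight = new Set();
    let idleTimer = null;
    const isAjax = (req) => ['xhr', 'fetch'].includes(req.resourceType());

    const cleanup = () => {
      clearTimeout(idleTimer);
      clearTimeout(overallTimer);
      page.off('request', onRequest);
      page.off('requestfinished', onDone);
      page.off('requestfailed', onDone);
    };
    const armIdleTimer = () => {
      clearTimeout(idleTimer);
      idleTimer = setTimeout(() => { cleanup(); resolve(); }, idleMs);
    };
    const onRequest = (req) => {
      if (!isAjax(req)) return; // images, fonts, scripts... are ignored
      inFlight.add(req);
      clearTimeout(idleTimer); // busy again
    };
    const onDone = (req) => {
      if (!inFlight.delete(req)) return; // not a tracked AJAX request
      if (inFlight.size === 0) armIdleTimer();
    };
    const overallTimer = setTimeout(() => {
      cleanup();
      reject(new Error(`XHR traffic did not settle within ${timeoutMs} ms`));
    }, timeoutMs);

    page.on('request', onRequest);
    page.on('requestfinished', onDone);
    page.on('requestfailed', onDone); // aborted requests end up here too
    armIdleTimer(); // counts as idle if no AJAX request starts at all
  });
}
```

It would typically be awaited right after `await page.goto(url)`. A limitation of this sketch is that requests which started before the listeners were attached are not tracked.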

How to make Puppeteer work with a ReactJS application on the client-side

拜拜、爱过, submitted 2019-12-07 09:28:38
Question: I am fairly new to React, and I am developing an app that takes actual screenshots of a web page and lets the user draw and add doodles on top of the screenshot. I initially used html2canvas and domToImage to take client-side screenshots, but they don't render the image exactly as it appears on the web page. Reddit user /pamblam0 suggested I look into Google's Puppeteer. It works by launching a headless Chromium browser, which goes to my React app on localhost and then gets a
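Puppeteer's `launch()` drives a browser process from Node.js, so it cannot run inside the React bundle in the user's browser. One way to structure this is a small backend route that the React client calls and that returns the PNG. The sketch below covers only the capture step; `captureScreenshot`, its options and the default viewport size are assumptions, and the HTTP route around it is left out.

```javascript
// Sketch of the server-side capture step. `browser` is assumed to come from
// puppeteer.launch() in a Node.js backend, not from the React client.
async function captureScreenshot(browser, url, { width = 1280, height = 800, fullPage = true } = {}) {
  const page = await browser.newPage();
  try {
    await page.setViewport({ width, height });
    // Wait until network activity settles so the React app has rendered.
    await page.goto(url, { waitUntil: 'networkidle0' });
    return await page.screenshot({ type: 'png', fullPage }); // image bytes
  } finally {
    await page.close(); // always release the tab, even on errors
  }
}
```

The React app would then fetch this image from the backend and draw the doodles on top of it, for example on a canvas.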

How can I disable webRTC local IP leak with puppeteer?

流过昼夜, submitted 2019-12-07 08:20:30
I tried: const browser = await puppeteer.launch({args: ['--enable-webrtc-stun-origin=false', '--enforce-webrtc-ip-permission-check=false']}); But this did not work. Next I tried: const targets = await browser.targets(); const backgroundPageTarget = targets.find(target => target.type() === 'background_page'); const backgroundPage = await backgroundPageTarget.page(); await backgroundPage.evaluateOnNewDocument(() => { chrome.privacy.network.webRTCIPHandlingPolicy.set({ value: "default_public_interface_only" }); }); But I got: TypeError: Cannot read property 'page' of undefined EDIT: Need
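Two observations, offered as a hedged sketch. The `TypeError` comes from `targets.find()` returning `undefined`: a `background_page` target only exists when an extension is loaded, and none is loaded here. As an alternative to the extension API, Chromium appears to accept a `--force-webrtc-ip-handling-policy` switch taking the same policy values; treat that flag as an assumption and verify it against the Chromium build bundled with your Puppeteer version. `webrtcLaunchOptions` and `findBackgroundPage` are hypothetical helper names.

```javascript
// Sketch only. ASSUMPTION: Chromium accepts --force-webrtc-ip-handling-policy
// with the same values as chrome.privacy.network.webRTCIPHandlingPolicy.
// Check this switch against the Chromium build your Puppeteer version uses.
function webrtcLaunchOptions(policy = 'default_public_interface_only', extraArgs = []) {
  return {
    args: [...extraArgs, `--force-webrtc-ip-handling-policy=${policy}`],
  };
}

// targets.find() returns undefined when no extension is loaded, because only
// extensions create 'background_page' targets; return null instead of throwing.
async function findBackgroundPage(browser) {
  const target = browser.targets().find((t) => t.type() === 'background_page');
  return target ? target.page() : null;
}

// Usage (requires the puppeteer package):
//   const browser = await puppeteer.launch(webrtcLaunchOptions());
```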

JS Puppeteer wait for page load to complete

点点圈, submitted 2019-12-07 08:14:26
After seeing this YouTube video using Puppeteer, I was inspired to play around with it a bit. But I seem to have picked the wrong website for a starter project. const puppeteer = require('puppeteer') ;(async () => { const browser = await puppeteer.launch() const page = await browser.newPage() await page.goto('http://www.produktresume.dk/AppBuilder/search?page=0') page.once('load', () => { const drugs = page .evaluate(() => [...document.querySelectorAll('div.entity-link')].map(item => item) ) .catch(err => console.log(err)) console.log(drugs[0]) }) await browser.close() })() I have
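The code above registers `page.once('load', ...)` after `await page.goto(...)` has resolved, by which point the load event has normally already fired; it also indexes into the un-awaited `evaluate()` promise and closes the browser straight away. A sketch of an alternative awaits each step and returns text instead of DOM nodes, which are not serializable back to Node. It assumes the `div.entity-link` elements are rendered by JavaScript; `scrapeEntityLinks` is a hypothetical name and `networkidle2` is a judgment call.

```javascript
// Sketch: await navigation and the target elements, then map them to text
// inside the browser (DOM nodes themselves cannot be sent back to Node).
async function scrapeEntityLinks(page, url, selector = 'div.entity-link') {
  await page.goto(url, { waitUntil: 'networkidle2' });
  await page.waitForSelector(selector); // rendered client-side, so wait for it
  return page.$$eval(selector, (items) => items.map((item) => item.textContent.trim()));
}

// Usage (requires the puppeteer package):
//   const browser = await puppeteer.launch();
//   try {
//     const page = await browser.newPage();
//     const drugs = await scrapeEntityLinks(page, 'http://www.produktresume.dk/AppBuilder/search?page=0');
//     console.log(drugs[0]);
//   } finally {
//     await browser.close(); // closed only after scraping has finished
//   }
```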

Puppeteer: How to get the contents of each element of a nodelist?

懵懂的女人, submitted 2019-12-07 07:23:47
Question: I'm trying to achieve something very trivial: get a list of elements, and then do something with the innerText of each element. const tweets = await page.$$('.tweet'); From what I can tell, this returns a NodeList, just like the document.querySelectorAll() method in the browser. How do I loop over it and get what I need? I tried various things, like: [...tweets].forEach(tweet => { console.log(tweet.innerText) }); Answer 1: page.$$(): You can use a combination of elementHandle.getProperty() and
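`page.$$()` resolves to an array of Puppeteer `ElementHandle`s, which live in Node and have no `innerText` property, so the loop above logs `undefined`. A sketch of two ways to get the text, with hypothetical helper names:

```javascript
// 1) Do the mapping inside the browser with page.$$eval(): one round trip.
async function getTextsWithEval(page, selector) {
  return page.$$eval(selector, (els) => els.map((el) => el.innerText));
}

// 2) Keep the handles and ask each one for its innerText property,
//    in the spirit of the elementHandle.getProperty() answer.
async function getTextsWithHandles(page, selector) {
  const handles = await page.$$(selector);
  const texts = [];
  for (const handle of handles) {
    const prop = await handle.getProperty('innerText'); // a JSHandle
    texts.push(await prop.jsonValue()); // a plain string in Node
  }
  return texts;
}
```

The first form is usually simpler; the second is useful when the handles are needed later, for example to click them.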

How to set value of select with node Puppeteer

限于喜欢, submitted 2019-12-07 06:47:45
Question: I am trying to do some automation with the rather new GoogleChrome/puppeteer library, but I cannot figure out how to set a value in a select field. Here is my (simplified) function to set the value of a text input: async function setInputVal(sel, text) { await page.focus(sel) page.press('Backspace') page.type(text) } await setInputVal('input.searchjob', task.id) I can't figure out how to do the same for a select field. I have tried setting the focus, injecting a script and executing it, but I cannot get
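Puppeteer has a dedicated `page.select(selector, ...values)` method for `<select>` elements; it fires `input` and `change` events and resolves to the values that were actually selected. The sketch below wraps it in a hypothetical `setSelectVal` helper, and also shows the text-input helper with the missing `await`s added and the `page.type(selector, text)` signature of later releases (the question's `page.type(text)` form appears to come from an early version). `page` is passed in explicitly rather than taken from an outer scope.

```javascript
// Sketch: set a <select> by option value and verify that it took effect.
async function setSelectVal(page, sel, value) {
  const selected = await page.select(sel, value); // values actually selected
  if (!selected.includes(value)) {
    throw new Error(`No <option value="${value}"> in ${sel}`);
  }
  return selected;
}

// The question's text-input helper with every call awaited.
async function setInputVal(page, sel, text) {
  await page.$eval(sel, (el) => { el.value = ''; }); // clear the field first
  await page.type(sel, text); // types character by character
}

// Usage: await setSelectVal(page, 'select#jobType', 'b'); (selector is hypothetical)
```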