Puppeteer | 易学教程

Node JS Puppeteer Infinite scroll loop

Question: I am learning Puppeteer and trying to scrape a website that has infinite scroll implemented. I am able to get all the prices from the list by scrolling down after a delay of 1 second. Here is the URL. What I want to do is open an item from the list, get the product name, go back to the list, select the second product, and do this for all products. const fs = require('fs'); const puppeteer = require('puppeteer'); function extractItems() { const extractedElements = document.querySelectorAll('
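One way to approach the "open each item, read its name, go back" loop is to avoid going back at all: read every item's link from the list in a single step, then visit the links one by one. The sketch below assumes the list has already been scrolled until all items are present, and that each item is an anchor element. The function name `scrapeProductNames` and both selector parameters are hypothetical and must be adapted to the real page.

```javascript
// Minimal sketch: collect all item links first, then visit each one.
// `page` is a Puppeteer Page that already shows the fully scrolled list.
// `linkSelector` and `nameSelector` are hypothetical placeholders.
async function scrapeProductNames(page, linkSelector, nameSelector) {
  // Read every href in one evaluation so no element handle goes stale
  // once the page navigates away from the list.
  const links = await page.$$eval(linkSelector, (anchors) =>
    anchors.map((anchor) => anchor.href)
  );

  const names = [];
  for (const link of links) {
    await page.goto(link, { waitUntil: 'networkidle2' });
    // $eval throws if nameSelector matches nothing on the product page.
    const name = await page.$eval(nameSelector, (el) => el.textContent.trim());
    names.push(name);
  }
  return names;
}

// Usage (inside an async function, after scrolling the list):
//   const names = await scrapeProductNames(page, 'a.product-link', 'h1');
```

Because the links are plain strings, the loop does not depend on the list page staying loaded, which sidesteps re-scrolling after every visit.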

Puppeteer: Grabbing entire html from page that uses lazy load

I am trying to grab the entire HTML of a web page that uses lazy loading. So far I have tried scrolling all the way to the bottom and then calling page.content(). I have also tried scrolling back to the top of the page after scrolling to the bottom and then calling page.content(). Both approaches grab some rows of the table, but not all of them, and getting all of them is my main goal. I believe that the web page uses lazy loading from React. const puppeteer = require('puppeteer'); const url = 'https://www.torontopearson.com/en/departures'; const fs = require('fs'); puppeteer.launch().then(async browser => { const page =
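If the table only keeps the rows near the viewport in the DOM (an assumption; the excerpt does not confirm how the page renders), a single page.content() call at the end can never contain every row. A possible workaround is to collect rows while scrolling in small steps. In this sketch, `collectRowsWhileScrolling`, `rowSelector`, and all option names are hypothetical.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Minimal sketch: scroll step by step and record the rows seen at each step.
// Rows are de-duplicated by their text, so identical rows are merged.
async function collectRowsWhileScrolling(page, rowSelector, options = {}) {
  const { stepPx = 600, pauseMs = 300, maxIdleRounds = 3, maxRounds = 500 } = options;
  const seen = new Set();
  let idleRounds = 0;

  for (let round = 0; round < maxRounds && idleRounds < maxIdleRounds; round++) {
    const texts = await page.$$eval(rowSelector, (rows) =>
      rows.map((row) => row.innerText.trim())
    );
    const sizeBefore = seen.size;
    texts.forEach((text) => seen.add(text));
    const foundNewRows = seen.size > sizeBefore;

    // Runs in the browser: report whether the viewport was already at the
    // bottom, then scroll one step further.
    const wasAtBottom = await page.evaluate((px) => {
      const atBottom =
        window.innerHeight + window.scrollY >= document.documentElement.scrollHeight - 1;
      window.scrollBy(0, px);
      return atBottom;
    }, stepPx);

    // Only count a round as idle when nothing new appeared at the bottom.
    idleRounds = wasAtBottom && !foundNewRows ? idleRounds + 1 : 0;
    await sleep(pauseMs);
  }
  return [...seen];
}
```

The loop stops after several consecutive rounds at the bottom with no new rows, which gives late-loading content a few chances to arrive before giving up.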

Trouble clicking on different links using puppeteer

Question: I've written tiny scripts in Node using Puppeteer to click cyclically on the links of different posts from the landing page of a website. The site link used within my scripts is a placeholder. Moreover, they are not dynamic, so Puppeteer might be overkill. However, my intention is to learn the logic of clicking. When I execute my first script, it clicks once and throws the following error as it goes out of the source. const puppeteer = require("puppeteer"); (async () => { const

node js puppeteer metadata

Question: I am new to Puppeteer, and I am trying to extract metadata from a website using Node.js and Puppeteer. I just can't seem to get the syntax right. The code below works perfectly for extracting the title tag, using two different methods, as well as text from a paragraph tag. How would I extract the content text for the metadata with the name "description", for example? <meta name="description" content="Stack Overflow is the largest, etc"> I would be seriously grateful for any suggestions! I can
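A meta tag's text lives in its `content` attribute rather than in its text node, so one possible approach is to select the tag by its `name` attribute and read that attribute. The helper name `getMetaContent` is hypothetical; `page.$$eval` is used instead of `page.$eval` so that a missing tag yields `null` instead of an exception.

```javascript
// Minimal sketch: read the content attribute of <meta name="...">.
// `page` is a Puppeteer Page that has already navigated to the target URL.
async function getMetaContent(page, name) {
  const selector = `meta[name="${name}"]`;
  // $$eval passes an array (possibly empty) of matching elements.
  return page.$$eval(selector, (elements) =>
    elements.length > 0 ? elements[0].getAttribute('content') : null
  );
}

// Usage (inside an async function):
//   const description = await getMetaContent(page, 'description');
```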

Is there a way to get puppeteer's waitUntil “networkidle” to only consider XHR (ajax) requests?

Question: I am using Puppeteer to evaluate the JavaScript-based HTML of web pages in my test app. This is the line I am using to make sure all the data is loaded: await page.setRequestInterception(true); page.on("request", (request) => { if (request.resourceType() === "image" || request.resourceType() === "font" || request.resourceType() === "media") { console.log("Request intercepted! ", request.url(), request.resourceType()); request.abort(); } else { request.continue(); } }); try { await page.goto
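Puppeteer's built-in `networkidle0` and `networkidle2` options count every kind of request. One possible way to consider only XHR and fetch traffic is to count those requests yourself through the page's `request`, `requestfinished`, and `requestfailed` events. The sketch below is an assumption-laden illustration, not a built-in feature: `trackXhr`, `waitForIdle`, and `dispose` are hypothetical names.

```javascript
// Minimal sketch: track in-flight XHR/fetch requests on a Puppeteer Page and
// resolve once none have been pending for `idleMs` milliseconds.
function trackXhr(page) {
  const pending = new Set();
  let onChange = () => {};

  const isXhr = (request) => ['xhr', 'fetch'].includes(request.resourceType());
  const onRequest = (request) => {
    if (isXhr(request)) {
      pending.add(request);
      onChange();
    }
  };
  // Finished, failed, and aborted requests all leave the pending set.
  const onDone = (request) => {
    if (pending.delete(request)) onChange();
  };

  page.on('request', onRequest);
  page.on('requestfinished', onDone);
  page.on('requestfailed', onDone);

  return {
    pendingCount: () => pending.size,
    waitForIdle(idleMs = 500, timeoutMs = 30000) {
      return new Promise((resolve, reject) => {
        let idleTimer = null;
        const cleanup = () => {
          clearTimeout(idleTimer);
          clearTimeout(timeoutTimer);
          onChange = () => {};
        };
        const timeoutTimer = setTimeout(() => {
          cleanup();
          reject(new Error(`XHR traffic did not go idle within ${timeoutMs} ms`));
        }, timeoutMs);
        const check = () => {
          clearTimeout(idleTimer);
          if (pending.size === 0) {
            idleTimer = setTimeout(() => {
              cleanup();
              resolve();
            }, idleMs);
          }
        };
        onChange = check;
        check();
      });
    },
    dispose() {
      page.off('request', onRequest);
      page.off('requestfinished', onDone);
      page.off('requestfailed', onDone);
    },
  };
}

// Usage (inside an async function):
//   const tracker = trackXhr(page);            // start counting before navigating
//   await page.goto(url, { waitUntil: 'domcontentloaded' });
//   await tracker.waitForIdle(500);
//   tracker.dispose();
```

Creating the tracker before `page.goto` matters: requests that start during navigation are only counted if the listeners are already attached.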

How to make Puppeteer work with a ReactJS application on the client-side

Question: I am fairly new to React and I am developing an app that takes actual screenshots of a web page and lets the user draw and add doodles on top of the screenshot. I initially used html2canvas and domToImage to take client-side screenshots, but they don't render the image exactly as it is shown in the web page. Reddit user /pamblam0 suggested I look into Google's Puppeteer. How it works is that it launches a headless Chromium browser which goes to my React app on localhost then gets a

How can I disable webRTC local IP leak with puppeteer?

I tried: const browser = await puppeteer.launch({args: ['--enable-webrtc-stun-origin=false', '--enforce-webrtc-ip-permission-check=false']}); But this is not working. Next I tried: const targets = await browser.targets(); const backgroundPageTarget = targets.find(target => target.type() === 'background_page'); const backgroundPage = await backgroundPageTarget.page(); await backgroundPage.evaluateOnNewDocument(() => { chrome.privacy.network.webRTCIPHandlingPolicy.set({ value: "default_public_interface_only" }); }); But got: TypeError: Cannot read property 'page' of undefined. EDIT: Need

JS Puppeteer wait for page load to complete

After seeing this YouTube video using Puppeteer, I got inspired to play around with it a bit. But I seem to have chosen the wrong website as a starter project. const puppeteer = require('puppeteer') ;(async () => { const browser = await puppeteer.launch() const page = await browser.newPage() await page.goto('http://www.produktresume.dk/AppBuilder/search?page=0') page.once('load', () => { const drugs = page .evaluate(() => [...document.querySelectorAll('div.entity-link')].map(item => item) ) .catch(err => console.log(err)) console.log(drugs[0]) }) await browser.close() })() I have
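A few things in the snippet above likely work against it: by the time an awaited page.goto() resolves, the load event has already fired, so a page.once('load') handler registered afterwards may never run; the promise returned by page.evaluate() is not awaited; and DOM elements cannot be serialized back to Node.js. A minimal sketch of one alternative follows. The function name `extractTexts` is hypothetical, and waiting on `networkidle0` plus the selector is an assumption about what this particular site needs.

```javascript
// Minimal sketch: navigate, wait for the target elements, return their text.
// `page` is a Puppeteer Page; `selector` would be 'div.entity-link' here.
async function extractTexts(page, url, selector) {
  // Resolves only after network activity has settled.
  await page.goto(url, { waitUntil: 'networkidle0' });
  // Guards against content that is rendered by scripts after load.
  await page.waitForSelector(selector);
  // Return plain strings: DOM nodes do not survive the trip back to Node.js.
  return page.$$eval(selector, (elements) =>
    elements.map((element) => element.textContent.trim())
  );
}

// Usage (inside an async function, before browser.close()):
//   const drugs = await extractTexts(page, url, 'div.entity-link');
//   console.log(drugs[0]);
```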

Puppeteer: How to get the contents of each element of a nodelist?

Question: I'm trying to achieve something very trivial: get a list of elements, and then do something with the innerText of each element. const tweets = await page.$$('.tweet'); From what I can tell, this returns a NodeList, just like the document.querySelectorAll() method in the browser. How do I just loop over it and get what I need? I tried various things, like: [...tweets].forEach(tweet => { console.log(tweet.innerText) }); Answer 1: page.$$(): You can use a combination of elementHandle.getProperty() and
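For context, page.$$() resolves to an array of ElementHandle objects that live on the Node.js side, not to a NodeList, which is why `tweet.innerText` is undefined in the loop above. The sketch below shows one way to read the property from each handle; pairing `getProperty()` with `jsonValue()` is my assumption about where the truncated answer was heading, and `getInnerTexts` is a hypothetical name.

```javascript
// Minimal sketch: read innerText from every element matching `selector`.
// `page` is a Puppeteer Page.
async function getInnerTexts(page, selector) {
  const handles = await page.$$(selector);
  const texts = [];
  for (const handle of handles) {
    // getProperty returns a JSHandle; jsonValue copies its value into Node.js.
    const property = await handle.getProperty('innerText');
    texts.push(await property.jsonValue());
  }
  return texts;
}

// A shorter alternative that does the mapping inside the browser:
//   const texts = await page.$$eval('.tweet', (els) => els.map((el) => el.innerText));
```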

How to set value of select with node Puppeteer

Question: I am trying to do some automation with the rather new GoogleChrome/puppeteer library, but I cannot figure out how to set a value in a select field. Here is my (simplified) function to set the value of a text input: async function setInputVal(sel, text) { await page.focus(sel) page.press('Backspace') page.type(text) } await setInputVal('input.searchjob', task.id) I can't figure out how to do the same for a select field. I have tried to set the focus, insert script and execute but I cannot get
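Later Puppeteer releases provide page.select(), which sets the chosen option of a select element and fires its `change` and `input` events. A minimal sketch under that assumption follows; the wrapper name `setSelectVal` and the example selector are hypothetical, and the check relies on page.select() resolving to the list of values that ended up selected.

```javascript
// Minimal sketch: choose an option in a <select> by its value attribute.
// `page` is a Puppeteer Page.
async function setSelectVal(page, selector, value) {
  // Resolves to an array of the option values that are now selected.
  const selected = await page.select(selector, value);
  if (!selected.includes(value)) {
    throw new Error(`Option "${value}" was not selected in ${selector}`);
  }
  return selected;
}

// Usage (inside an async function):
//   await setSelectVal(page, 'select.jobtype', 'fulltime');
```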