Puppeteer | 易学教程

How does header and footer printing work in Puppeteer's page.pdf API?

Question: I've noticed a few inconsistencies when trying to use the headerTemplate and footerTemplate options with page.pdf: The DPI for headers and footers seems to be lower (72 vs 96 for the main body, I think), so if I'm trying to match the margins, I have to scale by that. Styles are not shared with the main body, so I have to include them in the template. If I try to use a locally stored font, it works on the main body but not in the header/footer, even if I include the same CSS in the header…
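
A minimal sketch of the usual setup, assuming a Puppeteer `Page` with the content already loaded is passed in. `displayHeaderFooter` has to be `true`, the template carries its own inline styles (it does not see the page's stylesheets), and Chromium fills elements with the classes `pageNumber`, `totalPages`, `date`, `title` and `url`. The helper names, font size and margin values are illustrative choices, and the DPI mismatch described above is not addressed here.

```javascript
// Builds a self-contained footer template. Styles are inline because the
// template does not inherit the main document's CSS. `label` is assumed trusted.
function buildFooterTemplate(label) {
  return `
    <div style="width: 100%; font-size: 10px; font-family: Arial, sans-serif;
                padding: 0 20px; display: flex; justify-content: space-between;">
      <span>${label}</span>
      <span>Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>
    </div>`;
}

// `page` is assumed to be a Puppeteer Page (hypothetical helper name).
async function printWithFooter(page, path, label) {
  return page.pdf({
    path,
    format: 'A4',
    displayHeaderFooter: true,
    // An empty element replaces the default header (date and title).
    headerTemplate: '<div></div>',
    footerTemplate: buildFooterTemplate(label),
    // The footer is drawn inside the bottom margin, so it must be tall enough.
    margin: { top: '20px', bottom: '60px', left: '20px', right: '20px' },
    printBackground: true,
  });
}
```

Usage would be `await printWithFooter(page, 'report.pdf', 'Monthly report')` after `page.goto(...)` or `page.setContent(...)`.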

Looping through links (stories) and taking screenshots

Question: What I'm trying to do here is loop through Storybook stories so I can perform visual regression testing on them:

    const puppeteer = require('puppeteer');
    const { toMatchImageSnapshot } = require('jest-image-snapshot');
    expect.extend({ toMatchImageSnapshot });
    test('no visual regression for button', async () => {
      const selector = 'a[href*="?selectedKind=Buttons&selectedStory="]';
      const browser = await puppeteer.launch({headless:false, slowMo: 350});
      const page = await browser.newPage();
      await …
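
One way to structure such a loop, sketched under the assumption that a Puppeteer `Page` is passed in: read the matching links as plain `href` strings first, then visit each one and screenshot it. Element handles from the index page go stale once the page navigates, which is why strings are collected up front. `screenshotEachLink` and the `onScreenshot` callback are hypothetical names.

```javascript
// Visits every link matching `selector` on `indexUrl` and passes each
// screenshot (a Buffer) to `onScreenshot`. Returns the number of links visited.
async function screenshotEachLink(page, indexUrl, selector, onScreenshot) {
  await page.goto(indexUrl);
  // Collect plain strings; handles would be invalid after the first goto below.
  const hrefs = await page.$$eval(selector, (links) => links.map((a) => a.href));
  for (const href of hrefs) {
    await page.goto(href);
    const image = await page.screenshot();
    await onScreenshot(href, image);
  }
  return hrefs.length;
}
```

Inside the Jest test from the question, the callback could simply be `(href, image) => expect(image).toMatchImageSnapshot()`.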

Want to scrape a table using puppeteer.js. How can I get all rows, iterate through them, and then get the “td” cells of each row?

Question: I have Puppeteer set up and was able to get all rows using:

    let rows = await page.$$eval('#myTable tr', row => row);

Now, for each row, I want to get the "td" cells and then their inner text. Basically I want to do this, where myRow is a table row, with puppeteer.js:

    var tds = myRow.querySelectorAll("td");

Answer 1: One way to achieve this is to use evaluate to first get an array of all the TDs, then return the textContent of each TD:

    const puppeteer = require('puppeteer');
    const html = `
    <html>…
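
`page.$$eval` runs its callback inside the page and sends the return value back to Node, so DOM elements returned as-is (as with `row => row`) do not arrive as usable elements. A sketch that does the `querySelectorAll("td")` step inside the callback and returns only strings; `scrapeTable` and `rowsToCellTexts` are hypothetical names, and `page` is assumed to be a Puppeteer Page with the table already loaded.

```javascript
// Runs in the browser: turns each <tr> into an array of its <td> texts.
// It must not use anything outside its own body, since Puppeteer serializes
// it and evaluates it in the page. Rows with only <th> cells yield [].
function rowsToCellTexts(rows) {
  return rows.map((row) =>
    Array.from(row.querySelectorAll('td'), (td) => td.textContent.trim())
  );
}

// Returns an array of rows, each an array of cell strings.
async function scrapeTable(page, rowSelector = '#myTable tr') {
  return page.$$eval(rowSelector, rowsToCellTexts);
}
```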

Communicate “out” from Chromium via DevTools protocol

Question: I have a page running in a headless Chromium instance, and I'm manipulating it via the DevTools protocol, using the Puppeteer NPM package in Node. I'm injecting a script into the page. At some point, I want the script to call me back and send me some information (via some event exposed by the DevTools protocol or some other means). What is the best way to do this? It'd be great if it can be done using Puppeteer, but I'm not against getting my hands dirty and listening for protocol messages by…
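
Puppeteer's `page.exposeFunction` fits this case: it installs a function on the page's `window` that, when the page calls it, runs a callback in Node and returns a Promise in the page. Exposed functions survive navigations, but they must be registered before the injected script calls them. A minimal sketch; the name `reportToNode` and both helpers are hypothetical, and `page` is assumed to be a Puppeteer Page.

```javascript
// Registers a Node-side callback reachable from the page as window[name](...).
async function listenToPage(page, name, onMessage) {
  await page.exposeFunction(name, onMessage);
}

// What the injected script would do, driven here from Node via evaluate.
// Inside the page, the call returns a Promise of onMessage's return value.
async function sendFromPage(page, name, payload) {
  return page.evaluate((fnName, data) => window[fnName](data), name, payload);
}
```

In the injected script itself the call is just `await window.reportToNode({ status: 'done' })`.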

puppeteer - how to set download location

Question: I was able to successfully download a file with Puppeteer, but it was just saving it to my /Downloads folder. I've been looking around and can't find anything in the API or forums to set this location. My downloads are basically just going to the link:

    await page.goto(url);

Answer 1: This is how you can set the download path in the latest Puppeteer, v0.13:

    await page._client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath: './myAwesomeDownloadFolder'});

The behaviour is…
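
`page._client` is a private property whose shape has changed across Puppeteer releases. A sketch that opens its own DevTools session through the public `page.target().createCDPSession()` instead. It assumes your Chromium build still accepts the CDP command `Page.setDownloadBehavior` (the protocol marks it deprecated in favour of `Browser.setDownloadBehavior`), so treat it as version-dependent. `setDownloadFolder` is a hypothetical helper.

```javascript
const path = require('path');

// `page` is assumed to be a Puppeteer Page. Returns the absolute folder used.
async function setDownloadFolder(page, folder) {
  // Resolve to an absolute path so it does not depend on the browser's cwd.
  const downloadPath = path.resolve(folder);
  const client = await page.target().createCDPSession();
  await client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath,
  });
  return downloadPath;
}
```

Call it once before `page.goto(url)` triggers the download.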

Trouble clicking on different links using puppeteer

I've written tiny scripts in Node using Puppeteer that click, in a loop, on the links to different posts from a website's landing page. The site link used in my scripts is a placeholder. Moreover, the pages are not dynamic, so Puppeteer might be overkill; however, my intention is to learn the logic of clicking. When I execute my first script, it clicks once and then throws the following error as it navigates away from the source page.

    const puppeteer = require("puppeteer");
    (async () => {
      const browser = await puppeteer.launch({headless:false});
      const [page] = await browser.pages();
      await page.goto(
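
The actual error is cut off above. A frequent cause of a "clicks once, then fails" loop is reusing element handles that belong to the landing page after the first navigation has replaced it. A minimal sketch that re-queries the links on every pass, assuming each click triggers a full navigation and that `selector` (hypothetical) matches the post links; `clickEachLink` and `onVisit` are hypothetical names.

```javascript
// Clicks the i-th matching link on each pass, records the URL, then goes back.
async function clickEachLink(page, selector, onVisit) {
  const count = (await page.$$(selector)).length;
  for (let i = 0; i < count; i++) {
    // Fresh handles: the ones from the previous pass belong to a destroyed page.
    const links = await page.$$(selector);
    if (!links[i]) break; // the list changed after navigating back
    // Start waiting before clicking so the navigation is not missed.
    await Promise.all([page.waitForNavigation(), links[i].click()]);
    await onVisit(page.url());
    await page.goBack(); // goBack itself waits for the navigation
  }
}
```

Collecting the `href` strings first and visiting them with `page.goto` avoids the back-and-forth entirely, but the version above keeps the focus on clicking.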

Memory leak in express.js api application

I am running an Express.js application that is used as a REST API. One endpoint starts Puppeteer and tests my website with several procedures. After the application starts and the endpoint is called continuously, my Docker container runs out of memory every hour, as you can see below. At first I thought I had a memory leak in Puppeteer / headless Chrome, but when I monitored the memory usage of the processes, no memory leak was visible, as you can see here:

    0.00 Mb COMMAND
    384.67 Mb /var/express/node_modules/puppeteer/.local
    157.41 Mb node /var/express/bin/www
    101.76 Mb …
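
The rest of this question is cut off, so the following is not a diagnosis. One thing worth ruling out when an endpoint launches Puppeteer per request is a Chromium instance left running after a procedure throws. A minimal sketch of a wrapper that always closes the browser; `withBrowser` is a hypothetical helper and `launch` is assumed to be something like `() => puppeteer.launch()`.

```javascript
// Launches a browser, runs `procedure` with a new page, and closes the browser
// whether the procedure resolves or throws. Returns the procedure's result.
async function withBrowser(launch, procedure) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    return await procedure(page);
  } finally {
    await browser.close();
  }
}
```

In an Express route this could be called as `await withBrowser(() => puppeteer.launch(), (page) => runProcedures(page))`, where `runProcedures` stands in for the hypothetical site tests.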

Cloud functions timeout on page.goto()

I run tests with Puppeteer in Cloud Functions. On my local machine everything is fine, and in the Cloud Functions emulator it's fine as well. But when I deploy my function to the cloud, all tests get stuck on page.goto('https://...') and the function fails with a timeout, which in my case is 3 minutes. The problem was in Puppeteer: I downgraded from version 1.13.0 to 1.11.0, and now everything works fine. See the discussion here. Source: https://stackoverflow.com/questions/55274130/cloud-functions-timeout-on-page-goto

puppeteer-cluster: queue instead of execute

I'm experimenting with Puppeteer Cluster and I just don't understand how to use queuing properly. Can it only be used for calls where you don't wait for a response? I'm using Artillery to fire a bunch of requests simultaneously, but with queue they all fail, while only some fail when I have the command execute directly. I've taken the code straight from the examples and replaced execute with queue, which I expected to work, except that the code doesn't wait for the result. Is there a way to achieve this anyway? So this works:

    const screen = await cluster.execute(req.query.url);

But this breaks:

    const screen = …
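
As the puppeteer-cluster README describes them, `cluster.queue` only adds a job and does not hand back what the task returns, while `cluster.execute` still queues the job (so `maxConcurrency` is respected) but resolves with the task's return value. For an HTTP endpoint that must reply with the screenshot, a sketch under that reading; `cluster` is assumed to come from `Cluster.launch(...)`, `req`/`res` are Express objects, and both helper names are hypothetical.

```javascript
// Registers the task once at startup; its return value becomes the
// resolved value of cluster.execute(url).
async function registerScreenshotTask(cluster) {
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    return page.screenshot();
  });
}

// Express-style handler: waits for a free worker, then replies with the PNG.
async function screenshotHandler(cluster, req, res) {
  try {
    const screen = await cluster.execute(req.query.url);
    res.set('Content-Type', 'image/png');
    res.end(screen);
  } catch (err) {
    res.status(500).end(String(err));
  }
}
```

`queue` remains useful for fire-and-forget jobs whose task stores its own result somewhere else.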

Trouble Logging In To Google with Headless Chrome / Puppeteer

Question: I'm trying to automate certain tasks for work. We have a portal that requires you to sign in through Google. I've created a Puppeteer instance that navigates to the Google auth page, types in my email and password, then stores the cookies so I can navigate through and manipulate the portal. This works perfectly on my local environment, but I've deployed it to Heroku and Google adds a sign-in challenge. After entering the password, I'm given the 'Verify it's you' page that says 'This device…