Puppeteer | 易学教程

Puppeteer is unable to get the complete source code

Question: I'm creating a simple scraping application with Node.js and Puppeteer. The page I'm trying to scrape is the one at the URL below. This is the code I'm using right now: const url = `https://www.betrebels.gr/el/sports?catids=122,40,87,28,45,2&champids=423,274616,1496978,1484069,1484383,465990,465991,91,71,287,488038,488076,488075,1483480,201,2,367,38,1481454,18,226,440,441,442,443,444,445,446,447,448,449,451,452,453,456,457,458,459,460,278261&datefilter=TodayTomorrow&page=prelive` await page.goto(url, {waitUntil:
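
A common cause of incomplete HTML is reading the page before its client-side scripts have finished rendering. The sketch below assumes the missing content is injected by JavaScript after the initial load: it waits for the network to go idle and, optionally, for a selector that only appears once the data is rendered. The function name and the `readySelector` parameter are hypothetical placeholders, not taken from the question.

```javascript
// Sketch: return the rendered HTML of `url` once client-side rendering has
// (probably) finished. `page` is a Puppeteer Page; `readySelector` is a
// hypothetical selector that only exists after the data has been rendered.
async function getRenderedHtml(page, url, readySelector) {
  // 'networkidle0' resolves once there have been no network connections
  // for at least 500 ms, which usually covers XHR/fetch-driven content.
  await page.goto(url, { waitUntil: 'networkidle0', timeout: 60000 });

  if (readySelector) {
    // Wait explicitly for an element that proves the data is in the DOM.
    await page.waitForSelector(readySelector, { timeout: 30000 });
  }

  // page.content() serializes the current DOM, not the original response body.
  return page.content();
}
```

Usage would look like `const html = await getRenderedHtml(page, url, '.event-row');`, where `.event-row` stands in for whatever element the real page renders last.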

Auto allow webcam access using Puppeteer for Node.js

Question: I'm setting up a test that involves starting a webcam video session. So far everything works and requires no user interaction except granting access to the webcam. When the third-party library I'm using calls navigator.mediaDevices.getUserMedia({audio: true, video: true}), the browser opens a prompt asking the user to allow access. I'm looking for a way to grant access without user interaction. I've tried Puppeteer's page.on('dialog'... but that doesn't get
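
The camera prompt is a browser permission request, not a JavaScript dialog, which is why `page.on('dialog')` never sees it. One way to skip it, assuming Chrome or Chromium is the browser under test, is Chromium's `--use-fake-ui-for-media-stream` switch, which auto-accepts `getUserMedia` prompts; `--use-fake-device-for-media-stream` additionally substitutes a synthetic camera and microphone, which helps on CI machines without real devices. The helper name `webcamLaunchOptions` is hypothetical.

```javascript
// Sketch: launch options that make Chromium grant getUserMedia() requests
// without showing the permission prompt. Pass the result to puppeteer.launch().
function webcamLaunchOptions({ useFakeDevices = true, headless = true } = {}) {
  const args = [
    // Auto-accept camera/microphone permission prompts.
    '--use-fake-ui-for-media-stream',
  ];
  if (useFakeDevices) {
    // Replace real devices with a generated test video/audio stream.
    args.push('--use-fake-device-for-media-stream');
  }
  return { headless, args };
}

// Usage (requires the puppeteer package):
// const puppeteer = require('puppeteer');
// const browser = await puppeteer.launch(webcamLaunchOptions({ useFakeDevices: false }));
```

If you would rather not rely on command-line switches, Puppeteer's `BrowserContext` also provides `overridePermissions(origin, ['camera', 'microphone'])`, which grants those permissions for a single origin.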

Async throwing SyntaxError: Unexpected token (

Question: I'm running a test using the headless Chrome package Puppeteer: const puppeteer = require('puppeteer') ;(async() => { const browser = await puppeteer.launch() const page = await browser.newPage() await page.goto('https://google.com', {waitUntil: 'networkidle'}) // Type our query into the search bar await page.type('puppeteer') await page.click('input[type="submit"]') // Wait for the results to show up await page.waitForSelector('h3 a') // Extract the results from the page const links = await
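
This error most often means the Node.js runtime predates native `async`/`await`, which is enabled by default from Node.js 7.6; older versions cannot parse the `async () => {}` arrow function. A quick check is sketched below. It has to run from a separate file (or before the async code is required), because a file containing `async` syntax fails to parse as a whole on an old runtime. The function name is hypothetical.

```javascript
// Sketch: report whether a Node.js version string supports async/await
// natively (enabled by default from Node.js 7.6.0 onward).
// Deliberately avoids async syntax so this file parses on old runtimes too.
function supportsAsyncAwait(version) {
  // Accept both 'v7.6.0' (process.version) and '7.6.0' (process.versions.node).
  const parts = String(version).replace(/^v/, '').split('.').map(Number);
  const major = parts[0];
  const minor = parts[1] || 0;
  return major > 7 || (major === 7 && minor >= 6);
}

if (!supportsAsyncAwait(process.version)) {
  console.error('Node.js ' + process.version + ' cannot parse async/await; upgrade to 7.6 or newer.');
}
```

The snippet also uses older Puppeteer calls: current releases expect `waitUntil: 'networkidle0'` or `'networkidle2'` instead of `'networkidle'`, and `page.type()` takes a selector as its first argument, as in `page.type(selector, 'puppeteer')`.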

How to recreate a page with all of the cookies?

Question: I am trying to (1) visit a page that initialises a session, (2) store the session in a JSON object, and (3) visit the same page again, which should now recognise the existing session. This is the implementation I have attempted: import puppeteer from 'puppeteer'; const createSession = async (browser, startUrl) => { const page = await browser.newPage(); await page.goto(startUrl); await page.waitForSelector('#submit'); const cookies = await page.cookies(); const url = await page.url(); return { cookies, url }; }
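
A minimal sketch of the restore half, assuming the `{ cookies, url }` object produced by `createSession` above: open a fresh page, install the saved cookies with `page.setCookie()` before navigating, then load the stored URL. Only cookies are restored; a session kept in `localStorage` or `sessionStorage` would need to be copied separately. The name `restoreSession` is hypothetical.

```javascript
// Sketch: recreate a page from a saved { cookies, url } session object.
// `browser` is a Puppeteer Browser; `session` is what createSession returned
// (possibly after a JSON.stringify/JSON.parse round trip).
async function restoreSession(browser, session) {
  const page = await browser.newPage();

  // Cookies must be in place before the first request so the server sees them.
  // Cookies returned by page.cookies() carry their own domain, so they can be
  // set while the new page is still on about:blank.
  if (session.cookies && session.cookies.length > 0) {
    await page.setCookie(...session.cookies);
  }

  await page.goto(session.url);
  return page;
}
```

Because the session is plain data, it can be saved with `JSON.stringify` and loaded later with `JSON.parse`. As a side note, `page.url()` is synchronous, so the `await` in front of it is harmless but unnecessary.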

How to reload page in Puppeteer?

Question: I would like to reload the page whenever it doesn't load properly or runs into a problem. I tried page.reload(), but it doesn't work. for(const sect of sections ){ // Now collect all the URLs const appUrls = await page.$$eval('div.main > ul.app-list > li > div.app-info a.app-info-icon', links => links.map(link => link.href)); // Visit each URL one by one and collect the data for (let appUrl of appUrls) { var count = i++; try{ await page.goto(appUrl); const appName = await page.$eval(
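
Rather than calling `page.reload()` after the fact, one option is to wrap `page.goto()` in a bounded retry loop, so that a timeout, network error, or HTTP error status triggers a fresh navigation to the same URL. The helper name, retry count, and delay below are assumptions, not part of the question.

```javascript
// Sketch: navigate to `url`, retrying on errors such as navigation timeouts.
// `page` is a Puppeteer Page; `retries` and `delayMs` are arbitrary defaults.
async function gotoWithRetry(page, url, { retries = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const response = await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
      // Treat HTTP error statuses as failures too, so they are retried.
      if (response && !response.ok()) {
        throw new Error(`HTTP ${response.status()} for ${url}`);
      }
      return response;
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Back off briefly before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

Inside the loop above, `await gotoWithRetry(page, appUrl)` could replace `await page.goto(appUrl)`, and the existing `try`/`catch` would then only see URLs that still fail after every attempt.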

Get all visible plain text and find out which HTML tag or DOM element each piece of text belongs to

Question: I know how to get all visible plain text on a page: const text = await page.$eval('*', el => el.innerText); But I also need to know which element of the page each piece of text belongs to, and I can't find a way to do that. Answer 1: On the client side, you can do this in a way that preserves document order using TreeWalker. Here's an example with sample content from Web Scraper Testing Ground: const IGNORE = ["style", "script"]; const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT)
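
Building on the TreeWalker approach in the answer, here is a sketch meant to run inside the page via `page.evaluate()`. It returns each visible, non-empty text node together with its parent element's tag name and `id`. Treating an element as visible when `getClientRects()` is non-empty is a heuristic (it does not catch `visibility: hidden` or off-screen content), and the output shape is an assumption.

```javascript
// Sketch: runs in the browser context (pass it to page.evaluate), so it may
// only use DOM globals such as document and NodeFilter.
function collectVisibleTextNodes() {
  const IGNORE = new Set(['SCRIPT', 'STYLE', 'NOSCRIPT', 'TEMPLATE']);
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  const results = [];
  let node;
  while ((node = walker.nextNode())) {
    const text = node.textContent.trim();
    const parent = node.parentElement;
    // Skip whitespace-only nodes and text inside script/style-like elements.
    if (!text || !parent || IGNORE.has(parent.tagName)) continue;
    // Heuristic visibility test: display:none elements have no client rects.
    if (parent.getClientRects().length === 0) continue;
    results.push({ text, tag: parent.tagName.toLowerCase(), id: parent.id || null });
  }
  return results;
}

// Usage (Node side): const pieces = await page.evaluate(collectVisibleTextNodes);
```

Because TreeWalker visits nodes in document order, the returned array keeps the same order as the text on the page.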

How to get JavaScript object in JavaScript code?

Question: TL;DR: I want a parseParameter function that parses JSON as in the following code, where someCrawledJSCode is crawled JavaScript code: const data = parseParameter(someCrawledJSCode); console.log(data); // data1: {...} Problem: I'm crawling some JavaScript code with Puppeteer and want to extract a JSON object from it, but I don't know how to parse the given JavaScript code. Crawled JavaScript code example: const somecode = 'somevalue'; arr.push({ data1: { prices: [{ prop1: 'hi', prop2: 'hello', }, { prop1: 'foo'
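
Because the crawled snippet is JavaScript rather than JSON (unquoted keys, single quotes, trailing commas), `JSON.parse` will reject it. One approach, assuming the code only pushes object literals onto a global `arr`, is to evaluate it with Node's built-in `vm` module against a sandbox that supplies an empty `arr`, then copy the result out. `vm` is not a security boundary, so only do this with code you are willing to execute; for untrusted input, parsing without executing (for example with a JavaScript parser such as Acorn) is the safer route.

```javascript
const vm = require('vm');

// Sketch: evaluate crawled code that calls arr.push({...}) and return the
// first pushed object as plain data. Assumes the code needs no globals
// other than `arr`; anything else it references will throw a ReferenceError.
function parseParameter(code) {
  const sandbox = { arr: [] };
  // The timeout guards against accidental infinite loops in the snippet.
  vm.runInNewContext(code, sandbox, { timeout: 1000 });
  if (sandbox.arr.length === 0) {
    throw new Error('The code did not push anything onto arr');
  }
  // A JSON round trip drops functions and turns objects created inside the
  // sandbox context into ordinary objects of the current context.
  return JSON.parse(JSON.stringify(sandbox.arr[0]));
}
```

For example, `parseParameter("arr.push({ data1: { prices: [{ prop1: 'hi' }] } });").data1.prices[0].prop1` evaluates to `'hi'`.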

Puppeteer chrome get active/visible tab

Question: In a Chrome extension, you can use the following to find the active tab in a window: chrome.tabs.query({ currentWindow: true, active: true, }) I have the code below, which connects to an existing browser and gets all the pages. I can't work out whether there is a way to know which tab/page is currently active and get its URL (page.url(), but which page from the array should I use?). const puppeteer = require('puppeteer'); debuggerUrl = "http://127.0.0.1:9999/json/version" const request =
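
Puppeteer has no direct equivalent of `chrome.tabs.query({ active: true })`. A workaround that often works when connected to a headful browser is to ask each page for `document.visibilityState` and pick the one reporting `'visible'`, since background tabs normally report `'hidden'`. This is a heuristic: with several windows open, more than one page can be visible. Also, `puppeteer.connect()` accepts a `browserURL` option that looks up the WebSocket endpoint itself, so fetching `/json/version` manually is not required. The helper name is hypothetical.

```javascript
// Sketch: return the first page whose document reports itself as visible,
// or null if none does. `browser` is a connected Puppeteer Browser.
async function findVisiblePage(browser) {
  const pages = await browser.pages();
  for (const page of pages) {
    try {
      const state = await page.evaluate(() => document.visibilityState);
      if (state === 'visible') return page;
    } catch (err) {
      // Pages that are navigating or have crashed can reject evaluate(); skip them.
    }
  }
  return null;
}

// Usage:
// const browser = await puppeteer.connect({ browserURL: 'http://127.0.0.1:9999' });
// const active = await findVisiblePage(browser);
// if (active) console.log(active.url());
```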

Can't get test coverage with jest + puppeteer

Question: I have set up the project Excellent.js for automated testing with Jest and Puppeteer, and it successfully runs all the tests, as can be seen on Travis CI. But after many configuration tweaks I still can't make it report correct coverage: no matter which tests are executed, the coverage does not reflect them at all. The library contains only a single JavaScript file, excellent.js, and my jest.config.js was set up for coverage as instructed: module.exports = { collectCoverage: true,
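
A likely explanation is that Jest's `collectCoverage` only instruments modules loaded into the Node.js test process, while `excellent.js` executes inside the browser that Puppeteer drives, so Jest never sees it run. Puppeteer can record browser-side coverage with `page.coverage.startJSCoverage()` and `page.coverage.stopJSCoverage()`; each returned entry contains the script's `url`, its `text`, and the executed `ranges`. The sketch below only summarizes those entries per script, treating `end` as exclusive (an assumption). Turning them into an Istanbul report that Jest or a CI service can merge would need an extra tool such as `puppeteer-to-istanbul`.

```javascript
// Sketch: summarize the entries returned by page.coverage.stopJSCoverage().
// Each entry is assumed to look like { url, text, ranges: [{ start, end }] },
// with `end` treated as exclusive.
function summarizeCoverage(entries) {
  return entries.map(({ url, text, ranges }) => {
    const usedBytes = ranges.reduce((sum, r) => sum + (r.end - r.start), 0);
    const totalBytes = text.length;
    const percent = totalBytes === 0 ? 100 : (usedBytes / totalBytes) * 100;
    // Round to one decimal place for display.
    return { url, usedBytes, totalBytes, percent: Math.round(percent * 10) / 10 };
  });
}

// Usage inside a Jest + Puppeteer test (page is a Puppeteer Page):
// await page.coverage.startJSCoverage();
// await page.goto('http://localhost:8080/test.html'); // hypothetical test page
// ... exercise the library ...
// const report = summarizeCoverage(await page.coverage.stopJSCoverage());
// console.table(report.filter((r) => r.url.endsWith('excellent.js')));
```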

Puppeteer: Grabbing entire html from page that uses lazy load

Question: I am trying to grab the entire HTML of a web page that uses lazy loading. I have tried scrolling all the way to the bottom and then calling page.content(). I have also tried scrolling back to the top after reaching the bottom and then calling page.content(). Both approaches grab some rows of the table, but not all of them, and getting all of them is my main goal. I believe the page uses lazy loading from React. const puppeteer = require('puppeteer'); const url = 'https://www.torontopearson
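
If the table is virtualized (a common pattern in React apps), rows that scroll out of view are removed from the DOM, so a single `page.content()` call only ever contains the rows currently on screen. A sketch under that assumption: scroll one viewport at a time, collect the row text at every step, and stop once the bottom has been reached and the page height has stopped growing. The row selector `table tbody tr` and the timing values are hypothetical and would need to match the real page.

```javascript
// Sketch: scroll through a lazily loaded (possibly virtualized) table and
// collect the text of every row seen along the way. `page` is a Puppeteer Page;
// rowSelector, delayMs and maxSteps are placeholder values to tune per site.
// Rows with identical text are de-duplicated, which assumes rows are unique.
async function collectAllRows(page, rowSelector = 'table tbody tr', { delayMs = 500, maxSteps = 200 } = {}) {
  const seen = new Set();
  let previousHeight = -1;

  for (let step = 0; step < maxSteps; step++) {
    // Record whatever rows are rendered at the current scroll position.
    const rows = await page.$$eval(rowSelector, (els) => els.map((el) => el.innerText.trim()));
    rows.forEach((row) => seen.add(row));

    // Measure the current position, then scroll down by one viewport.
    const { bottom, height } = await page.evaluate(() => {
      const state = {
        bottom: window.scrollY + window.innerHeight,
        height: document.documentElement.scrollHeight,
      };
      window.scrollBy(0, window.innerHeight);
      return state;
    });

    // At the bottom and nothing new was appended since the last step: done.
    if (bottom >= height - 1 && height === previousHeight) break;
    previousHeight = height;

    // Give the page time to fetch and render the next batch of rows.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return [...seen];
}
```

If the rows scroll inside their own container rather than the window, the `scrollBy` and `scrollHeight` calls would need to target that element instead.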