Puppeteer

Puppeteer is unable to get the complete source code

做~自己de王妃 submitted on 2019-12-11 00:08:07
Question: I'm creating a simple scraping application with Node.js and Puppeteer. The page I'm trying to scrape is the one at the URL in the code below, which is the code I'm using right now. const url = `https://www.betrebels.gr/el/sports?catids=122,40,87,28,45,2&champids=423,274616,1496978,1484069,1484383,465990,465991,91,71,287,488038,488076,488075,1483480,201,2,367,38,1481454,18,226,440,441,442,443,444,445,446,447,448,449,451,452,453,456,457,458,459,460,278261&datefilter=TodayTomorrow&page=prelive` await page.goto(url, {waitUntil:
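One common reason page.content() returns incomplete HTML is that the page builds its content with JavaScript after the initial load event. A minimal sketch, assuming that is the cause here: the hypothetical helper getRenderedHtml waits for the network to go idle and then for a caller-supplied selector before serializing the DOM. The selector is a placeholder, because the question does not say which element marks the data as loaded.

```javascript
// Hypothetical helper: load a JavaScript-rendered page and return its HTML
// only after the content we care about has appeared. `readySelector` is a
// placeholder; pick an element that exists only once the data is rendered.
async function getRenderedHtml(page, url, readySelector, timeoutMs = 30000) {
  // 'networkidle0' resolves once there have been no network connections
  // for at least 500 ms, which usually covers the initial XHR/fetch burst.
  await page.goto(url, { waitUntil: 'networkidle0', timeout: timeoutMs });
  // Some sites keep rendering after the network goes quiet, so also wait
  // for an element that proves the data is in the DOM.
  await page.waitForSelector(readySelector, { timeout: timeoutMs });
  // page.content() serializes the current DOM, not the original server response.
  return page.content();
}
```

Waiting on a concrete selector is usually more reliable than network idleness alone, since sites that poll for live odds may never become fully idle.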

Auto allow webcam access using Puppeteer for Node.js

隐身守侯 submitted on 2019-12-10 20:00:02
Question: I'm setting up a test that involves starting a webcam video session. So far everything works and requires no user interaction except granting access to the webcam. When the third-party library I'm using calls navigator.mediaDevices.getUserMedia({audio: true, video: true}), the browser opens a prompt asking the user to allow access. I'm looking for a way to grant access without user interaction. I've tried Puppeteer's page.on('dialog'... but that doesn't get
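Puppeteer's 'dialog' event covers JavaScript dialogs (alert, confirm, prompt, beforeunload), not the browser's permission prompt, which would explain why it never fires here. A minimal sketch of a different approach, assuming Chromium: pass the Chromium switches --use-fake-ui-for-media-stream (auto-accepts the camera/microphone prompt) and optionally --use-fake-device-for-media-stream (supplies a synthetic camera and microphone) at launch. The helper name buildMediaLaunchOptions and its options are hypothetical.

```javascript
// Hypothetical helper: launch options that let getUserMedia() succeed
// without a prompt. Both entries are Chromium command-line switches:
//   --use-fake-ui-for-media-stream     auto-accepts the camera/mic prompt
//   --use-fake-device-for-media-stream replaces real devices with a synthetic
//                                      video/audio source (handy on CI)
function buildMediaLaunchOptions({ useRealDevices = false, headless = true } = {}) {
  const args = ['--use-fake-ui-for-media-stream'];
  if (!useRealDevices) {
    args.push('--use-fake-device-for-media-stream');
  }
  return { headless, args };
}

// Usage (requires the puppeteer package):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(buildMediaLaunchOptions());
```

Keeping the real devices (useRealDevices: true) still skips the prompt but needs an actual webcam on the machine running the test.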

Async throwing SyntaxError: Unexpected token (

こ雲淡風輕ζ submitted on 2019-12-10 18:39:42
Question: I'm running a test using the headless Chrome package Puppeteer: const puppeteer = require('puppeteer') ;(async() => { const browser = await puppeteer.launch() const page = await browser.newPage() await page.goto('https://google.com', {waitUntil: 'networkidle'}) // Type our query into the search bar await page.type('puppeteer') await page.click('input[type="submit"]') // Wait for the results to show up await page.waitForSelector('h3 a') // Extract the results from the page const links = await
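A SyntaxError at `(async() => {` typically means the Node.js runtime predates async functions, which became available without flags in Node.js 7.6.0. A minimal sketch of a version check, using a hypothetical helper supportsAsyncAwait. Separately, current Puppeteer releases expect waitUntil values such as 'networkidle0' or 'networkidle2' rather than 'networkidle', and page.type(selector, text) rather than page.type(text), so the snippet may need those updates once it parses.

```javascript
// Hypothetical helper: can this Node.js version parse `async` functions?
// Async/await is enabled by default starting with Node.js 7.6.0; older
// runtimes report "SyntaxError: Unexpected token (" at `(async () => {`.
function supportsAsyncAwait(version = process.versions.node) {
  // Accept both "7.6.0" and "v7.6.0" (the form process.version uses).
  const [major, minor] = version.replace(/^v/, '').split('.').map(Number);
  return major > 7 || (major === 7 && minor >= 6);
}

if (!supportsAsyncAwait()) {
  console.error(`Node ${process.version} is too old for async/await; upgrade to 7.6 or newer.`);
}
```

Note that this check only helps in a file that itself parses on the old runtime; a file containing `async` syntax fails before any of its code runs.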

How to recreate a page with all of the cookies?

谁说我不能喝 submitted on 2019-12-10 18:13:42
Question: I am trying to: (1) visit a page that initialises a session, (2) store the session in a JSON object, and (3) visit the same page again, which should now recognise the existing session. The implementation I have attempted is as follows: import puppeteer from 'puppeteer'; const createSession = async (browser, startUrl) => { const page = await browser.newPage(); await page.goto(startUrl); await page.waitForSelector('#submit'); const cookies = await page.cookies(); const url = await page.url(); return { cookies, url }; }
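A minimal sketch of the restore step, assuming the session lives only in cookies: the hypothetical helper restoreSession opens a fresh page, installs the saved cookies with page.setCookie() before navigating, and then loads the saved URL. If the site also keeps state in localStorage or sessionStorage, cookies alone will not be enough.

```javascript
// Hypothetical helper: reopen a page with a previously captured session.
// `session` is the { cookies, url } object that createSession() returns.
async function restoreSession(browser, session) {
  const page = await browser.newPage();
  // Cookies must be installed before navigation so the first request sends them.
  // page.setCookie() takes cookies as separate arguments, hence the spread.
  if (session.cookies.length > 0) {
    await page.setCookie(...session.cookies);
  }
  await page.goto(session.url);
  return page;
}
```

Because the session object is plain data, it can be written to disk with JSON.stringify and read back with JSON.parse between runs.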

How to reload page in Puppeteer?

送分小仙女□ submitted on 2019-12-10 17:54:49
Question: I would like to reload the page whenever it doesn't load properly or encounters a problem. I tried page.reload(), but it doesn't work. for(const sect of sections ){ // Now collect all the URLs const appUrls = await page.$$eval('div.main > ul.app-list > li > div.app-info a.app-info-icon', links => links.map(link => link.href)); // Visit each URL one by one and collect the data for (let appUrl of appUrls) { var count = i++; try{ await page.goto(appUrl); const appName = await page.$eval(
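A minimal sketch of one alternative to page.reload(): retry page.goto(url) a bounded number of times. After a failed navigation the tab may show an error page or a previous URL, so re-issuing goto with the intended URL is more predictable than reloading whatever is current. The helper name gotoWithRetry and its maxAttempts/delayMs options are hypothetical, not Puppeteer options; treating 4xx/5xx responses as retryable is also an assumption.

```javascript
// Hypothetical helper: navigate with a bounded number of retries.
// `maxAttempts` and `delayMs` are illustrative, not Puppeteer options.
async function gotoWithRetry(page, url, { maxAttempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // goto() resolves to the main-resource response (or null).
      const response = await page.goto(url, { waitUntil: 'networkidle2' });
      // Treat HTTP error statuses (4xx/5xx) as failures worth retrying too.
      if (response && !response.ok()) {
        throw new Error(`HTTP ${response.status()} for ${url}`);
      }
      return response;
    } catch (err) {
      lastError = err;
      // Pause briefly before the next attempt, but not after the last one.
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

Inside the question's loop, `await page.goto(appUrl)` could become `await gotoWithRetry(page, appUrl)`, with the surrounding try/catch deciding whether to skip an app after all attempts fail.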

Get all visible plain text and find out which HTML tag or DOM element each piece of text belongs to

感情迁移 submitted on 2019-12-10 17:44:19
Question: I know how to get all visible plain text on a page: const text = await page.$eval('*', el => el.innerText); But I also need to know which element of the page each piece of text belongs to, and I can't find a way to do that. Answer 1: On the client side, you can do this in a way that preserves order using TreeWalker. Here's an example with sample content from Web Scraper Testing Ground: const IGNORE = ["style", "script"]; const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT)
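A minimal sketch that carries the TreeWalker idea from the answer through to a result: the hypothetical function collectTextWithParents visits every text node under body in document order and records the tag and id of its parent element. It is meant to run inside the page, for example `const pieces = await page.evaluate(collectTextWithParents);`. It does not check visibility, so text hidden by CSS is still reported.

```javascript
// Hypothetical in-page function: list each text node with the element that
// directly contains it, in document order. Visibility is NOT checked.
function collectTextWithParents() {
  // tagName is upper-case for elements in HTML documents.
  const IGNORE = new Set(['STYLE', 'SCRIPT', 'NOSCRIPT']);
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  const pieces = [];
  let node;
  while ((node = walker.nextNode())) {
    const parent = node.parentElement;
    // Skip text inside non-rendered elements such as <script> and <style>.
    if (!parent || IGNORE.has(parent.tagName)) continue;
    const text = node.textContent.trim();
    // Whitespace-only nodes (indentation between tags) carry no content.
    if (text) {
      pieces.push({ tag: parent.tagName.toLowerCase(), id: parent.id || null, text });
    }
  }
  return pieces;
}
```

Returning plain objects keeps the result serializable, which page.evaluate requires when passing values back to Node.js.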

How to get JavaScript object in JavaScript code?

泪湿孤枕 submitted on 2019-12-10 17:10:31
Question: TL;DR I want a parseParameter function that parses JSON out of code like the following, where someCrawledJSCode is crawled JavaScript code. const data = parseParameter(someCrawledJSCode); console.log(data); // data1: {...} Problem: I'm crawling some JavaScript code with Puppeteer and want to extract a JSON object from it, but I don't know how to parse the given JavaScript code. Crawled JavaScript code example: const somecode = 'somevalue'; arr.push({ data1: { prices: [{ prop1: 'hi', prop2: 'hello', }, { prop1: 'foo'
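Because the crawled snippet is a JavaScript object literal (single quotes, trailing commas) rather than JSON, JSON.parse will reject it. A minimal sketch of one workaround, assuming the snippet only needs a global array named `arr` (as in the example): evaluate it with Node's built-in vm module in a fresh context that supplies `arr`, then return what was pushed. The name parseParameter comes from the question; the implementation is illustrative. Node's documentation states that vm is not a security mechanism, so this is only suitable for code you are willing to trust, and snippets that reference other globals such as window will throw a ReferenceError.

```javascript
const vm = require('vm');

// Illustrative parseParameter: run crawled code in a new context that
// provides the `arr` array it pushes into, then return the pushed values.
// WARNING: vm is not a security boundary; do not feed it untrusted code.
function parseParameter(code, { timeoutMs = 1000 } = {}) {
  const sandbox = { arr: [] };
  // The timeout aborts scripts that loop forever.
  vm.runInNewContext(code, sandbox, { timeout: timeoutMs });
  // Round-trip through JSON to get plain objects from this realm
  // (this also drops functions and undefined values).
  return JSON.parse(JSON.stringify(sandbox.arr));
}
```

A parser-based approach (walking an abstract syntax tree) avoids executing the code at all, at the cost of more setup.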

Puppeteer chrome get active/visible tab

我是研究僧i submitted on 2019-12-10 16:34:54
Question: In a Chrome extension you can use the following to find the active tab in a window: chrome.tabs.query({ currentWindow: true, active: true, } I have the code below, which connects to an existing browser and gets all the pages. I can't work out whether there is a way to know which tab/page is currently active and get its URL (page.url(), but which page from the array should I use?) const puppeteer = require('puppeteer'); debuggerUrl = "http://127.0.0.1:9999/json/version" const request =
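Puppeteer does not expose an "active tab" flag like chrome.tabs.query does. A minimal sketch of a workaround, assuming a headful browser: ask each page for its document.visibilityState and keep the ones reporting 'visible'. In a normal window only the selected tab should be visible; with several windows you may get one match per window, and in headless mode this may not single out any one tab. The helper name findVisiblePages is hypothetical.

```javascript
// Hypothetical helper: return the pages whose document reports 'visible'.
async function findVisiblePages(browser) {
  const pages = await browser.pages();
  // The arrow function is serialized and executed inside each tab.
  const states = await Promise.all(
    pages.map((page) => page.evaluate(() => document.visibilityState))
  );
  return pages.filter((_, i) => states[i] === 'visible');
}

// Usage: puppeteer.connect() accepts browserURL, which reads /json/version
// for you, e.g.
//   const browser = await puppeteer.connect({ browserURL: 'http://127.0.0.1:9999' });
//   const [active] = await findVisiblePages(browser);
//   if (active) console.log(active.url());
```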

Can't get test coverage with jest + puppeteer

时间秒杀一切 submitted on 2019-12-10 15:28:04
Question: I have the project Excellent.js set up for automated testing with jest and puppeteer; it successfully runs all the tests, as can be seen on Travis CI. But after many configuration tweaks I have been unable to make it report correct coverage: no matter which tests are executed, the coverage does not reflect them at all. The library contains only a single JavaScript file, excellent.js, and my jest.config.js was set up as instructed for coverage: module.exports = { collectCoverage: true,
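Jest's collectCoverage instruments modules that Jest itself loads in Node.js. If excellent.js is loaded by a page running inside Chromium, it never passes through Jest's instrumentation, which would match the symptom described. Puppeteer can collect browser-side coverage with page.coverage.startJSCoverage() and page.coverage.stopJSCoverage(); the result is V8 range data, and turning it into an Istanbul-style report needs an extra step (for example the third-party puppeteer-to-istanbul package). A minimal sketch that summarizes the raw entries, using a hypothetical helper summarizeJsCoverage:

```javascript
// Hypothetical helper: summarize entries from page.coverage.stopJSCoverage().
// Each entry has the script `url`, its source `text`, and the executed
// `ranges` as { start, end } character offsets into `text`.
function summarizeJsCoverage(entries, urlFilter = () => true) {
  let totalChars = 0;
  let usedChars = 0;
  for (const entry of entries) {
    if (!urlFilter(entry.url)) continue;
    totalChars += entry.text.length;
    for (const range of entry.ranges) {
      usedChars += range.end - range.start;
    }
  }
  const percent = totalChars === 0 ? 0 : (usedChars / totalChars) * 100;
  return { totalChars, usedChars, percent };
}

// Usage inside a jest + puppeteer test (`testPageUrl` is hypothetical):
//   await page.coverage.startJSCoverage();
//   await page.goto(testPageUrl);
//   // ...run the assertions...
//   const entries = await page.coverage.stopJSCoverage();
//   console.log(summarizeJsCoverage(entries, (u) => u.endsWith('excellent.js')));
```

This gives a per-character usage figure rather than the line/branch metrics Jest reports, but it confirms whether the browser-side code is being exercised at all.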

Puppeteer: Grabbing entire html from page that uses lazy load

断了今生、忘了曾经 submitted on 2019-12-10 11:36:00
Question: I am trying to grab the entire HTML of a web page that uses lazy loading. I have tried scrolling all the way to the bottom and then calling page.content(). I have also tried scrolling back to the top after reaching the bottom and then calling page.content(). Both ways grab some rows of the table, but not all of them, and getting all of them is my main goal. I believe the web page uses lazy loading from react.js. const puppeteer = require('puppeteer'); const url = 'https://www.torontopearson
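A minimal sketch of a scroll loop, assuming the page appends rows as you approach the bottom: the hypothetical helper scrollUntilStable keeps scrolling until document.body.scrollHeight stops growing, pausing between scrolls so new batches can load. If the table is virtualized (off-screen rows are removed from the DOM, as some React list components do), a single page.content() at the end will still miss rows; in that case the rows would need to be collected after each scroll instead.

```javascript
// Hypothetical helper: scroll to the bottom repeatedly until the document
// stops growing. `maxRounds` and `waitMs` are illustrative knobs.
async function scrollUntilStable(page, { maxRounds = 50, waitMs = 1000 } = {}) {
  let previousHeight = -1;
  for (let round = 0; round < maxRounds; round++) {
    // Both callbacks run inside the page, where window/document exist.
    const height = await page.evaluate(() => document.body.scrollHeight);
    if (height === previousHeight) {
      return height; // nothing new appeared after the last scroll
    }
    previousHeight = height;
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // Give the site time to fetch and render the next batch.
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  return previousHeight;
}

// Usage: await scrollUntilStable(page); const html = await page.content();
```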