Puppeteer

Is it safe to run multiple instances of Puppeteer at the same time?

半腔热情 submitted on 2019-12-03 04:11:22
Is it safe/supported to run multiple instances of Puppeteer at the same time, either at the process level (multiple node screenshot.js runs at the same time) or at the script level (multiple puppeteer.launch() calls at the same time)? What are the recommended settings/limits on parallel processes? (In my tests, (1) seems to work fine, but I'm wondering about the reliability of Puppeteer's interactions with the single (?) instance of Chrome. I haven't tried (2), but that seems less likely to work out.) It's fine to run multiple browsers, contexts, or even pages in parallel. The limits depend on your network
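A minimal sketch of the in-process option (2): one browser, several pages driven in parallel batches. The batch size, the URLs, and the file-naming scheme are illustrative assumptions, not recommendations from the answer.

```javascript
// Split a list into batches of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Screenshot many URLs with a bounded number of pages open at once.
async function screenshotAll(urls, concurrency = 4) {
  const puppeteer = require('puppeteer'); // required lazily; needs a local Chromium
  const browser = await puppeteer.launch();
  try {
    for (const batch of chunk(urls, concurrency)) {
      // One page per URL in the batch, processed in parallel.
      await Promise.all(batch.map(async (url) => {
        const page = await browser.newPage();
        await page.goto(url);
        await page.screenshot({ path: `${encodeURIComponent(url)}.png` });
        await page.close();
      }));
    }
  } finally {
    await browser.close();
  }
}
```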

Collect elements by class name and then click each one - Puppeteer

混江龙づ霸主 submitted on 2019-12-03 02:44:43
Using Puppeteer, I would like to get all the elements on a page with a particular class name and then loop through and click each one. Using jQuery I can achieve this with var elements = $("a.showGoals").toArray(); for (let i = 0; i < elements.length; i++) { $(elements[i]).click(); } How would I achieve this using Puppeteer? Update: I tried out Chridam's answer below but I couldn't get it to work (though the answer was helpful, so thanks are due there), so I tried the following and this works: await page.evaluate(() => { let elements = $('a.showGoals').toArray(); for (let i = 0; i < elements.length; i++) { $(elements[i])
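The same loop can be written without jQuery by collecting element handles with page.$$ and clicking each one. This is a sketch; 'a.showGoals' is the selector from the question, and the helper name is hypothetical.

```javascript
// Collect every element matching `selector` and click them in order.
// Returns the number of elements clicked.
async function clickAllMatches(page, selector) {
  const handles = await page.$$(selector); // array of ElementHandle
  for (const handle of handles) {
    await handle.click();
  }
  return handles.length;
}
```

Note that handle.click() dispatches a real mouse click from outside the page, whereas the page.evaluate variant in the question clicks from inside the page context; the latter avoids problems with elements that are off-screen.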

Puppeteer: wait until an element is visible?

半城伤御伤魂 submitted on 2019-12-03 01:15:17
I would like to know if I can tell Puppeteer to wait until an element is displayed. const inputValidate = await page.$('input[value=validate]'); await inputValidate.click() // I want to do something like waitElementVisible('.btnNext') const btnNext = await page.$('.btnNext'); await btnNext.click(); Is there any way I can accomplish this? turmuka: I think you can use the page.waitForSelector(selector[, options]) function for that purpose. const puppeteer = require('puppeteer'); puppeteer.launch().then(async browser => { const page = await browser.newPage(); page .waitForSelector('#myId') .then
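For the "wait until visible" part specifically, waitForSelector accepts a visible option, which waits until the element is present and not hidden by display: none or visibility: hidden. A small sketch; the helper name and default timeout are assumptions:

```javascript
// Wait for `selector` to be present AND visible, then click it.
async function clickWhenVisible(page, selector, timeout = 30000) {
  // waitForSelector resolves with the element handle once it is visible.
  const handle = await page.waitForSelector(selector, { visible: true, timeout });
  await handle.click();
  return handle;
}
```

With this, the question's snippet becomes await clickWhenVisible(page, '.btnNext').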

Websocket communication with multiple Chrome Docker containers

我的未来我决定 submitted on 2019-12-02 23:37:09
I have a Chrome container (deployed using this Dockerfile) that renders pages on request from an App container. The basic flow is: the App sends an HTTP request to Chrome and in response receives a websocket URL to use (e.g. ws://chrome.example.com:9222/devtools/browser/13400ef6-648b-4618-8e4c-b5c73db2a122); the App then uses that websocket URL to communicate further with Chrome and to receive the rendered page. I am using the puppeteer library to connect to and communicate with the Chrome instance, using puppeteer.connect({ browserWSEndpoint: webSocketUrl }). For a single Chrome container this
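A sketch of the App side of that flow, assuming the websocket URL is discovered through the DevTools HTTP endpoint /json/version (which returns a webSocketDebuggerUrl field). The host and port are assumptions based on the URL in the question:

```javascript
// Fetch the browser's websocket URL from the DevTools /json/version
// endpoint, then attach to that browser with puppeteer.connect.
async function connectToChrome(host = 'chrome.example.com', port = 9222) {
  const http = require('http');
  const body = await new Promise((resolve, reject) => {
    http.get({ host, port, path: '/json/version' }, (res) => {
      let data = '';
      res.on('data', (c) => (data += c));
      res.on('end', () => resolve(data));
    }).on('error', reject);
  });
  const { webSocketDebuggerUrl } = JSON.parse(body);
  const puppeteer = require('puppeteer'); // required lazily
  return puppeteer.connect({ browserWSEndpoint: webSocketDebuggerUrl });
}
```

With multiple Chrome containers behind a load balancer, the catch is that the websocket connection must reach the same container that issued the URL, which usually means sticky routing or addressing containers directly.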

Writing a web crawler with Puppeteer

自作多情 submitted on 2019-12-02 20:21:51
What is Puppeteer? Puppeteer is a headless Chrome tool officially developed by Google's Chrome team. Headless Chrome is becoming the industry standard for automated web application testing, so it is well worth understanding. A headless browser is simply a browser without a window. What can Puppeteer do? With Puppeteer we can have the browser automate many tasks for us, such as taking screenshots or generating PDFs of pages, automatically submitting forms, UI testing, keyboard input testing, and setting up automated test environments. Installing Puppeteer: install it via npm install puppeteer or yarn add puppeteer. Note: because the code uses ES2017 async/await syntax for asynchronous handling, your Node version should be v7.6.0 or above. Crawler example: now that we have a basic understanding of Puppeteer, let's build a crawler example to reinforce it! Target site: as shown in the image above, our target this time is the Douban movie site's category section, sorted by date, covering movies rated 6-10. The data we ultimately collect is each movie's poster image, title, rating, and id. Code walkthrough: 1. Import the puppeteer package in the script. 2. Declare the URL of the page to crawl. 3. Declare a promise-based timer, which will be used later in the code. Create an immediately invoked async function; all subsequent code lives inside it. Via puppeteer.launch(
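The steps above (promise-based timer, launch, navigate, extract) can be sketched as follows. The Douban URL and the CSS selectors are assumptions for illustration; they must be checked against the live page.

```javascript
// Step 3: a promise-based timer for pausing between actions.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Steps 1-2 plus the crawl itself, wrapped in one async function.
async function scrapeMovies() {
  const puppeteer = require('puppeteer'); // required lazily; needs a local Chromium
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://movie.douban.com/explore'); // hypothetical target URL
  await sleep(1000); // give lazily loaded content time to render
  // Extract poster, title, and rating from each movie card (selectors assumed).
  const movies = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.list .item')).map((el) => ({
      title: el.querySelector('.title') && el.querySelector('.title').textContent.trim(),
      rate: el.querySelector('.rate') && el.querySelector('.rate').textContent.trim(),
      poster: el.querySelector('img') && el.querySelector('img').src,
    }))
  );
  await browser.close();
  return movies;
}
```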

Which performs faster, headless browser or Curl?

你说的曾经没有我的故事 submitted on 2019-12-02 20:06:59
Question: I need to open around 100,000 URLs per day so that the images and HTML are cached in Cloudflare, as the content changes fairly frequently. I suspect that curl will probably perform faster than a headless browser (headless Chrome via Puppeteer). Does anyone have any experience with this, or are there better ways of doing it? Answer 1: First off, I am confident that libcurl's curl_multi API is significantly faster than a headless browser. Even if running under PHP (which is a much slower language
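The "no browser" approach can also be sketched in Node: plain HTTP GETs driven by a fixed worker pool, which is roughly the shape of what curl_multi does. This is an assumption-laden sketch, not the answerer's code; the fetch implementation is injected so the pool logic stands alone.

```javascript
// Warm a cache by fetching every URL with at most `concurrency`
// requests in flight. `fetchOne(url)` performs a single GET.
async function warmCache(urls, concurrency, fetchOne) {
  const queue = urls.slice(); // don't mutate the caller's array
  const results = [];
  async function worker() {
    while (queue.length > 0) {
      const url = queue.shift(); // synchronous, so no two workers take the same URL
      results.push(await fetchOne(url));
    }
  }
  // Start `concurrency` workers and wait for the queue to drain.
  await Promise.all(Array.from({ length: concurrency }, worker));
  return results;
}
```

For pure cache warming, where only the server-rendered HTML and static assets matter and no JavaScript needs to execute, this kind of pool avoids the per-page cost of a browser entirely.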

Detect and test Chrome Extension using Puppeteer

℡╲_俬逩灬. submitted on 2019-12-02 19:33:22
Is there a way to test a Chrome extension using Puppeteer? For example, can an extension detect that Chrome was launched in "test" mode to provide a different UI, check that content scripts are working, etc.? ebidel: Passing --user-agent in puppeteer.launch() is a useful way to override the browser's UA with a custom value. Then, your extension can read back navigator.userAgent in its background page and identify that Chrome was launched with Puppeteer. At that point, you can provide different code paths for testing the crx vs. normal operation. puppeteer_script.js: const puppeteer = require('puppeteer')
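A sketch of the launcher side of that answer: load the unpacked extension and tag the browser with a marker user agent the extension can look for. The crxPath argument and the marker string are assumptions.

```javascript
// Launch Chrome with an unpacked extension loaded and a marker UA
// that the extension can read back via navigator.userAgent.
async function launchWithExtension(crxPath) {
  const puppeteer = require('puppeteer'); // required lazily; needs a local Chromium
  return puppeteer.launch({
    headless: false, // extensions do not load in headless Chrome
    args: [
      `--disable-extensions-except=${crxPath}`,
      `--load-extension=${crxPath}`,
      '--user-agent=PuppeteerAgent', // hypothetical marker value
    ],
  });
}
```

In the extension's background page, a check along the lines of navigator.userAgent.includes('PuppeteerAgent') then selects the test code path.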

Using Puppeteer to click main links and clicking sub-links?

纵然是瞬间 submitted on 2019-12-02 15:34:04
Question: Simplification: I have a website with links. After clicking on each link, it goes to a new page where I need to visit further links (by clicking, not navigating). Visualization: I've managed to do 99% of the job: (async () => { const browser = await puppeteer.launch({headless: false}); const page = await browser.newPage(); let url = "https://www.mutualart.com/Artists"; console.log(`Fetching page data for : ${url}...`); await page.goto(url); await page.waitForSelector(".item.col-xs-3")
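The general click-through pattern for this kind of two-level traversal can be sketched as below. The main selector is the one from the question; the sub-link selector is an assumption, and real pages may need extra waits.

```javascript
// Click each main link, then each sub-link on the resulting page,
// going back after every visit. Handles are re-queried after each
// navigation because old ElementHandles go stale.
async function clickThrough(page, mainSelector, subSelector) {
  const mainCount = (await page.$$(mainSelector)).length;
  for (let i = 0; i < mainCount; i++) {
    const mains = await page.$$(mainSelector); // fresh handles on every pass
    // Click and wait for the resulting navigation together.
    await Promise.all([page.waitForNavigation(), mains[i].click()]);
    const subCount = (await page.$$(subSelector)).length;
    for (let j = 0; j < subCount; j++) {
      const subs = await page.$$(subSelector);
      await Promise.all([page.waitForNavigation(), subs[j].click()]);
      await page.goBack(); // return to the detail page
    }
    await page.goBack(); // return to the main listing
  }
}
```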

Puppeteer - page.$$('').length returns undefined

对着背影说爱祢 submitted on 2019-12-02 12:36:41
Question: I was having errors with my code, so I tried to log the value in the erroneous code. So I did: const read = await page.$$('.Ns6lhs9 _gfh3').length; Then I console.log(read); For some reason I get undefined, although there are elements with the class names 'Ns6lhs9 _gfh3' in the HTML. Answer 1: $$ returns a promise of an array of element handles, and length is a property of the resolved array, not of the promise. It should be: const read = (await page.$$('.Ns6lhs9._gfh3')).length; Answer 2: I've had a similar issue with getting 0 when counting
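The fix from Answer 1 in context: await the page.$$ promise first, then take .length of the resolved array. Note the corrected selector also joins the two classes with a dot ('.Ns6lhs9._gfh3'), since the element carries both classes; the space-separated form is a descendant selector. A small sketch, with a hypothetical helper name:

```javascript
// Count the elements matching `selector` on the page.
async function countMatches(page, selector) {
  const elements = await page.$$(selector); // resolves to an array of handles
  return elements.length;                   // .length on the array, not the promise
}
```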
