Puppeteer

how to handle elements that load after ajax request in puppeteer

无人久伴 submitted on 2019-12-01 10:48:49
I'm trying to do web scraping with Puppeteer. The element I need loads late: when I click the search button, the results load via AJAX, and the element I am trying to select is in the search results, not in the initial page load. The page screenshot Puppeteer produces contains the search results, and if I output the HTML source I can see the element there too, but I am not sure why I cannot select it. You can use await page.waitForSelector(cssSelector); to ask Puppeteer to wait for any element to be displayed in the UI before continuing on to further steps in your
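The waiting step described in this post could look roughly like the sketch below. It assumes `page` is a Puppeteer Page object; the function name `clickAndReadResult`, both selectors, and the 10-second timeout are illustrative choices, not taken from the original post.

```javascript
// Sketch: click a button, then wait for an AJAX-rendered element before reading it.
// `page` is assumed to be a Puppeteer Page; both selectors are hypothetical placeholders.
async function clickAndReadResult(page, buttonSelector, resultSelector) {
  // Trigger the AJAX request, for example by clicking a search button.
  await page.click(buttonSelector);
  // Resolves once an element matching the selector is present in the DOM,
  // or rejects when the timeout (in milliseconds) elapses.
  await page.waitForSelector(resultSelector, { timeout: 10000 });
  // Read the element's text inside the browser context and return it to Node.
  return page.$eval(resultSelector, (el) => el.textContent.trim());
}
```

With a real page this might be called as `await clickAndReadResult(page, '#search', '.result-item')`, where both selectors are made up for the example.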

how to handle elements that load after ajax request in puppeteer

*爱你&永不变心* submitted on 2019-12-01 08:54:40
Question: I'm trying to do web scraping with Puppeteer. The element I need loads late: when I click the search button, the results load via AJAX, and the element I am trying to select is in the search results, not in the initial page load. The page screenshot Puppeteer produces contains the search results, and if I output the HTML source I can see the element there too, but I am not sure why I cannot select it. Answer 1: You can use await page.waitForSelector(cssSelector); to ask

Puppeteer Execution context was destroyed, most likely because of a navigation

时光总嘲笑我的痴心妄想 submitted on 2019-12-01 08:46:17
I am facing this problem in Puppeteer: in a for loop, when I go to another page to get data and then go back, I get this error: "We have an error Error: the execution context was destroyed, probably because of a navigation." It's a directory page that lists 15 companies per page, and I want to visit each company to get information. try { const browser = await pupputer.launch({ headless: false, devtools: true, defaultViewport: { width: 1100, height: 1000 } }); const page = await browser.newPage(); await page.goto('MyLink'); await page.waitForSelector('.list-firms'); for
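One common way to avoid this error is to stop holding on to element handles across navigations: read every detail URL as a plain string first, then visit each URL in turn. The sketch below only illustrates that idea and is not the fix from the original thread. The function name `scrapeCompanies` and all selector parameters are hypothetical.

```javascript
// Sketch: collect every detail URL up front, then visit each one in turn.
// Handles obtained before a navigation become invalid afterwards; plain strings do not.
// `page` is assumed to be a Puppeteer Page; the selectors are hypothetical placeholders.
async function scrapeCompanies(page, listUrl, linkSelector, nameSelector) {
  await page.goto(listUrl);
  await page.waitForSelector(linkSelector);
  // Extract the href of every matching anchor as a string, in the browser context.
  const urls = await page.$$eval(linkSelector, (anchors) => anchors.map((a) => a.href));

  const results = [];
  for (const url of urls) {
    // Navigate directly to the detail page instead of clicking and going back.
    await page.goto(url);
    await page.waitForSelector(nameSelector);
    const name = await page.$eval(nameSelector, (el) => el.textContent.trim());
    results.push({ url, name });
  }
  return results;
}
```

Because the loop iterates over strings rather than over elements of the listing page, nothing it relies on is tied to an execution context that a navigation can destroy.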

How to pass required module object to puppeteer page.evaluate

拥有回忆 submitted on 2019-12-01 07:26:39
Question: Puppeteer version: 1.0.0. Platform / OS version: Windows 10. Node.js version: 8.9.3. Here is my code: const puppeteer = require('puppeteer'); const varname = require('varname'); ... const page = await browser.newPage(); await page.goto(url); let generalInfo = await page.evaluate(() => { let elements = Array.from(document.querySelectorAll('#order-details > table > tbody > tr')); let res = {}; elements.map((tr) => { let split = tr.innerText.trim().split('\t'); res[varname.camelback(split[0])] =
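The function passed to page.evaluate runs inside the browser page, so a module loaded with require in Node is not in scope there. One possible workaround, sketched below, is to return only plain data from the page and apply the Node-side helper afterwards. `toCamelCase` is a hypothetical stand-in written for this example and is not claimed to match varname.camelback; `readTable` and its selector parameter are also assumptions.

```javascript
// Hypothetical Node-side helper standing in for a module function such as varname.camelback.
function toCamelCase(label) {
  return label
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean)
    .map((word, i) => (i === 0 ? word : word[0].toUpperCase() + word.slice(1)))
    .join('');
}

// `page` is assumed to be a Puppeteer Page; rowSelector is a placeholder.
async function readTable(page, rowSelector) {
  // Only serializable data crosses the browser/Node boundary:
  // here, an array of [label, value] string pairs.
  const pairs = await page.$$eval(rowSelector, (rows) =>
    rows.map((tr) => tr.innerText.trim().split('\t'))
  );
  // The key conversion happens in Node, where required modules are available.
  const res = {};
  for (const [label, value] of pairs) {
    res[toCamelCase(label)] = value;
  }
  return res;
}
```

With the selector from the post, a call might look like `await readTable(page, '#order-details > table > tbody > tr')`.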

Puppeteer unable to run on heroku

独自空忆成欢 submitted on 2019-12-01 06:24:58
Question: I deployed an app on Heroku and added the Puppeteer Heroku buildpack. After a successful redeployment, I try to run it and it fails. Using heroku logs -t, I get this error message: 2018-09-07T13:16:10.870497+00:00 app[web.1]: Error: Failed to launch chrome! 2018-09-07T13:16:10.870512+00:00 app[web.1]: [0907/131610.045486:FATAL:zygote_host_impl_linux.cc(116)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/master/docs/linux_suid_sandbox
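A "No usable sandbox!" failure is commonly worked around by launching Chrome with its sandbox disabled. The sketch below builds launch options that do so. The helper name `buildLaunchOptions` is made up for this example, and whether this resolves the poster's particular deployment is an assumption.

```javascript
// Sketch: launch options often used in environments where Chrome's sandbox is unavailable.
// buildLaunchOptions is a hypothetical helper, not part of Puppeteer.
function buildLaunchOptions(extraArgs = []) {
  return {
    headless: true,
    // Both flags turn off Chrome's sandboxing, which the log above reports as unusable.
    args: ['--no-sandbox', '--disable-setuid-sandbox', ...extraArgs],
  };
}

// Usage (requires the puppeteer package to be installed):
// const puppeteer = require('puppeteer');
// const browser = await puppeteer.launch(buildLaunchOptions());
```

Running without the sandbox reduces the isolation between page content and the host, so this setup is best kept to pages you trust.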

page.evaluate Vs. Puppeteer $ methods

前提是你 submitted on 2019-12-01 06:06:29
Question: I'm interested in the differences between these two blocks of code. const $anchor = await page.$('a.buy-now'); const link = await $anchor.getProperty('href'); await $anchor.click(); await page.evaluate(() => { const $anchor = document.querySelector('a.buy-now'); const text = $anchor.href; $anchor.click(); }); I've generally found raw DOM elements in page.evaluate() easier to work with and the ElementHandles returned by the $ methods an abstraction too far. However I felt perhaps that the async Puppeteer

Example: scraping JS-rendered web pages with Puppeteer in headless mode

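允我心安 submitted on 2019-12-01 06:05:26
puppeteer: Puppeteer, from the Google Chrome team, is an automated testing library that depends on Node.js and Chromium. Its biggest advantage is that it can handle dynamic content in web pages, such as JavaScript, and so simulate a real user more closely. Some sites' anti-scraping measures hide part of the content behind JavaScript/AJAX requests, so that fetching the a tags directly does not work. Some sites even set hidden-element "traps" that are invisible to users; if a script triggers them, the visitor is treated as a bot. In these cases Puppeteer's advantages stand out. It can do the following: Generate screenshots and PDFs of pages. Crawl SPAs and generate pre-rendered content (i.e. "SSR"). Automate form submission, UI testing, keyboard input, and so on. Create an up-to-date automated testing environment, running tests directly in the latest version of Chrome with the latest JavaScript and browser features. Capture a timeline trace of your site to help diagnose performance issues. Open-source repository: https://github.com/GoogleChrome/puppeteer/ Installation: npm i puppeteer Note: install Node.js first, and run the command in the Node.js root directory (at the same level as the npm file). Chromium is downloaded during installation, about 120 MB. After two days (about 10 hours) of exploration and working around quite a few asynchronous pitfalls, the author gained a reasonable grasp of Puppeteer and Node.js. One long screenshot, scraping the blog article list: Scraping blog articles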

Realtime scrap a chat using Nodejs

可紊 submitted on 2019-12-01 05:40:01
Question: What I want to do is build a scraping application in Node.js that monitors a chat in real time and stores certain messages in a database. I want to capture data from the chats of streaming platforms, and thus capture useful information that helps those who are running the stream. But I do not know how to start doing this with Node.js. What I have been able to do so far is capture the message data, however

Puppeteer Execution context was destroyed, most likely because of a navigation

空扰寡人 submitted on 2019-12-01 04:39:17
Question: I am facing this problem in Puppeteer: in a for loop, when I go to another page to get data and then go back, I get this error: "We have an error Error: the execution context was destroyed, probably because of a navigation." It's a directory page that lists 15 companies per page, and I want to visit each company to get information. try { const browser = await pupputer.launch({ headless: false, devtools: true, defaultViewport: { width: 1100, height: 1000 } }); const

How to switch between tabs with Puppeteer?

我是研究僧i submitted on 2019-12-01 04:09:38
Here is my use case: I have a link which, when clicked, opens a new tab and loads the content. What I am looking for: is there a way to switch the page reference when the new tab opens, or to create a reference for the new tab? Use the following function: let clickAndWaitForTarget = async (clickSelector, page, browser) => { const pageTarget = page.target(); //save this to know that this was the opener await page.click(clickSelector); //click on a link const newTarget = await browser.waitForTarget(target => target.opener() === pageTarget); //check that you opened this page, rather than just checking the