Puppeteer | 易学教程

Looping through an array of links gives navigation timeout error - puppeteer

Read more about Looping through an array of links gives navigation timeout error - puppeteer

Source: https://stackoverflow.com/questions/63423958/looping-through-an-array-of-links-gives-navigation-timeout-error-puppeteer

puppeteer doesn't open a url without protocol

Read more about puppeteer doesn't open a url without protocol

Source: https://stackoverflow.com/questions/52090433/puppeteer-doesnt-open-a-url-without-protocol

Puppeteer with Brave browser?

Read more about Puppeteer with Brave browser?

Question: I'm wondering whether it's possible to run a Puppeteer script using the Brave browser instead of the bundled Chromium. I know that Brave is built on Chromium, which is why you can launch a Selenium script with Brave, but do you know if it's possible with Puppeteer as well?

Answer 1: Yes, you can use Brave. The only catch is that ad blocking doesn't work in headless mode. For ad blocking in headful mode, you need to set up or create a profile and point the userDataDir option to

How to “hook in” puppeteer into a running Chrome instance/tab

Read more about How to “hook in” puppeteer into a running Chrome instance/tab

Question: Is it possible to attach Puppeteer to a running Chrome instance (a manually started browser) and then take over control within a tab? I'm assuming it's related to starting Chrome with the --no-sandbox flag, but I don't know how to continue from there. Thanks for any help.

Answer 1: You can use puppeteer.connect(options) (see here):

    const puppeteer = require('puppeteer');
    const browserWSEndpoint = 'a browser websocket endpoint to connect to';
    const browser = await

Getting the sibling of an elementHandle in Puppeteer

Read more about Getting the sibling of an elementHandle in Puppeteer

Question: I'm doing

    const last = await page.$('.item:last-child')

Now I'd love to get the preceding element based on last, i.e.

    const prev = last.$.prev()

Any thoughts on how to do this? Thanks!

Answer 1: You should use previousElementSibling inside evaluateHandle, like this:

    const prev = await page.evaluateHandle(el => el.previousElementSibling, last);

Here is a full example:

    const puppeteer = require('puppeteer');
    const html = `
    <html>
    <head></head>
    <body>
    <div>
    <div class="item">item 1</div>
    <div class="item
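
Because the full example is cut off, the same idea can be wrapped in a small reusable helper. This is only a sketch: the names `previousSibling` and `previousSiblingHandle` are made up for illustration, and `page` and `handle` are assumed to be a Puppeteer Page and an ElementHandle obtained from `page.$()`.

```javascript
// Runs inside the page: returns the previous sibling element, or null.
const previousSibling = (el) => el.previousElementSibling;

// Hypothetical helper: resolve to an ElementHandle for the element before
// `handle`, or to null when `handle` is the first child.
async function previousSiblingHandle(page, handle) {
  const result = await page.evaluateHandle(previousSibling, handle);
  // asElement() returns null when the handle does not wrap an element.
  return result.asElement();
}

module.exports = { previousSibling, previousSiblingHandle };
```

With the question's variables, usage would be `const prev = await previousSiblingHandle(page, last);`, followed by a null check before calling methods on `prev`.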

How to use puppeteer-core with electron?

Read more about How to use puppeteer-core with electron?

Question: I got this code from another Stack Overflow question:

    import electron from "electron";
    import puppeteer from "puppeteer-core";

    const delay = (ms: number) =>
      new Promise(resolve => {
        setTimeout(() => {
          resolve();
        }, ms);
      });

    (async () => {
      try {
        const app = await puppeteer.launch({
          executablePath: electron,
          args: ["."],
          headless: false,
        });
        const pages = await app.pages();
        const [page] = pages;
        await page.setViewport({ width: 1200, height: 700 });
        await delay(5000);
        const image = await page

Puppeteer: How do I download a file using chrome headless browser api?

Read more about Puppeteer: How do I download a file using chrome headless browser api?

Question: Using Puppeteer, how do I get the headless Chrome browser to download a file (or make additional HTTP requests and save the response)?

Answer 1: You could make a simple request through the window; it should work (see npm request). As soon as it returns the promise with your response, you could write an Express save function and store the response. It seems that Puppeteer has this built in. See here: How to make a request with puppeteer. Have a look at this: Emitted when a page issues a
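
One possible reading of the answer is to listen for the page's `response` event and write the body to disk. The sketch below is an assumption about how that could be done, not the answerer's code: `fileNameFromUrl` and `saveMatchingResponses` are hypothetical helpers, and `page` is assumed to be a Puppeteer Page. It covers responses to requests the page itself issues; files served as attachments may need a different approach.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: derive a local file name from a response URL.
function fileNameFromUrl(url, fallback = 'download.bin') {
  const name = path.posix.basename(new URL(url).pathname);
  return name || fallback;
}

// Hypothetical helper: save every successful response whose URL matches
// `pattern` (a RegExp) into the existing directory `dir`.
function saveMatchingResponses(page, pattern, dir) {
  page.on('response', async (response) => {
    if (!response.ok() || !pattern.test(response.url())) return;
    try {
      const body = await response.buffer();
      const target = path.join(dir, fileNameFromUrl(response.url()));
      await fs.promises.writeFile(target, body);
    } catch (err) {
      // Bodies of redirects or discarded responses cannot be read.
      console.error(`Could not save ${response.url()}: ${err.message}`);
    }
  });
}

module.exports = { fileNameFromUrl, saveMatchingResponses };
```

A call such as `saveMatchingResponses(page, /\.pdf$/, './downloads')` would need to be registered before `page.goto(...)` so that no response is missed.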

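Building an interesting crawler platform with Apify + node + react/vue

Read more about Building an interesting crawler platform with Apify + node + react/vue

Preface: Friends who know me may be aware that I never write about trending topics. Why not? Is it because I don't follow them? Not really. There are events I follow closely, and I do have plenty of thoughts and opinions on them. But I have always followed one principle: create content with lasting value. The material in this article comes from a crawler management platform whose development I previously led. I have abstracted a relatively independent functional module from it to explain how to build your own crawler platform with nodejs. The article covers quite a few topics, including nodejs, crawler frameworks, parent and child processes and their communication, react, and umi, and I will introduce each one in language that is as simple as possible.

What you will gain:
- an introduction to the Apify framework and its basic usage
- how to create parent and child processes and communicate between them
- manually limiting a crawler's maximum concurrency in javascript
- an approach to capturing a screenshot of an entire web page
- using nodejs third-party libraries and modules
- building the crawler front-end interface with umi3 + antd4.0

Platform preview: The figure above shows the crawler platform we are going to build. We can enter a URL to crawl the data under that site and generate a snapshot of the whole page. Once crawling finishes, we can download the data and images. The right side of the page lists the user's crawl records, for easy reuse or backup.

Main text: Before starting, it is worth looking at some applications of crawlers. The crawlers most of us know are used to scrape page data, capture request information, take page screenshots, and so on, as in the figure below. Of course, the applications of crawlers go far beyond this; we can also use crawler libraries for automated testing, server-side rendering,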
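
The excerpt mentions limiting a crawler's maximum concurrency by hand but does not show how. One common way to do it is a small worker pool, sketched below. This is not the article's implementation: the function name `runWithLimit` and the task shape (an array of functions that each return a promise) are assumptions made for illustration.

```javascript
// Run async task functions with at most `limit` of them in flight at once.
// Resolves to the results in the same order as `tasks`.
async function runWithLimit(tasks, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started

  // Each worker keeps pulling the next unstarted task until none are left.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

module.exports = { runWithLimit };
```

In a crawler, each task would typically open one page and return its data, for example `urls.map((url) => () => crawlPage(url))`, where `crawlPage` is a hypothetical function. If any task rejects, the returned promise rejects with that error.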

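Web automation testing with CukeTest+Puppeteer

Read more about Web automation testing with CukeTest+Puppeteer

Using the Baidu home page as the test page, we write a functional-test demo with CukeTest+Puppeteer, combining the points covered in the previous article for practice.

CukeTest official documentation: http://www.cuketest.com/zh-cn/
Puppeteer official documentation: https://zhaoqize.github.io/puppeteer-api-zh_CN/

I. Example 1. Functional test: opening multiple web pages in parameterized form

1. Open CukeTest, create an empty project, and install Node and Puppeteer. Watch out for version compatibility between the two, as mentioned in the previous article.
2. Edit the parameters of the feature script.
3. Write the code that corresponds to the feature script.
4. Run it, as shown in the figure below.

The text of the feature script is as follows (written in zh-CN Gherkin, so the keywords and step text are kept in the original language):

    # language: zh-CN
    功能: 百度首页
    打开百度首页

      @openPage
      场景大纲: 页面打开
        假如打开百度首页 "<param1>"

        @pageOne
        例子:
          | param1                   |
          | https://www.baidu.com/   |
          | https://www.runoob.com/  |

        @pageTwo
        例子:
          | param1                   |
          | https://www.csdn.net/    |
          | https://www.cnblogs.com/ |

      @baiduSearch
      场景: 百度首页搜索
      打开百度首页，搜索 'puppeteer',百度查询并截图保存结果
        假如打开百度首页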