Is it safe to run multiple instances of Puppeteer at the same time?

Submitted by 回眸只為那壹抹淺笑 on 2019-12-03 13:08:14

Question


Is it safe/supported to run multiple instances of Puppeteer at the same time, either at

  1. the process level (multiple node screenshot.js at the same time) or
  2. at the script level (multiple puppeteer.launch() at the same time)?

What are the recommended settings/limits on parallel processes?

(In my tests, (1) seems to work fine, but I'm wondering about the reliability of Puppeteer's interactions with the single (?) instance of Chrome. I haven't tried (2) but that seems less likely to work out.)


Answer 1:


It's fine to run multiple browsers, contexts, or even pages in parallel. The limits depend on your network, disk, memory, and task setup.

I crawled a few million pages, and from time to time (in my setup, roughly every 10,000 pages) Puppeteer would crash. Therefore, you should have a way to automatically restart the browser and retry the job.
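As a rough illustration of the restart-and-retry idea, here is a minimal sketch. It launches a fresh browser for each attempt, which is a simplification rather than anything the answer prescribes. The names runWithRestart, launchBrowser, and maxAttempts are hypothetical; launchBrowser is assumed to be something like () => puppeteer.launch(), passed in by the caller.

```javascript
// Sketch: run a job and, if it fails (for example because the browser crashed),
// launch a fresh browser and try again.
// `launchBrowser` is a hypothetical factory supplied by the caller,
// e.g. () => puppeteer.launch(). `maxAttempts` is an assumed retry limit.
async function runWithRestart(launchBrowser, job, maxAttempts = 3) {
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        let browser;
        try {
            browser = await launchBrowser();
            // Success: return the job's result (the finally block still runs).
            return await job(browser);
        } catch (err) {
            // Remember the failure and fall through to the next attempt.
            lastError = err;
        } finally {
            if (browser) {
                try {
                    await browser.close();
                } catch (_) {
                    // A crashed browser may fail to close; ignore that here.
                }
            }
        }
    }
    // Every attempt failed: surface the last error to the caller.
    throw lastError;
}
```

A call might look like runWithRestart(() => puppeteer.launch(), async (browser) => { ... }), with the job opening its own pages on the browser it receives.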

You might want to check out puppeteer-cluster, which takes care of pooling the browser instances, crash detection, and restarting. (Disclaimer: I'm the author.)

Here is an example of creating a cluster:

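const { Cluster } = require('puppeteer-cluster');

(async () => {
    // Create a cluster that handles up to 10 parallel browsers
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_BROWSER,
        maxConcurrency: 10,
    });

    // Queue your jobs (one example)
    cluster.queue(async ({ page }) => {
        await page.goto('http://www.wikipedia.org');
        await page.screenshot({ path: 'wikipedia.png' });
    });

    // Wait for all queued jobs to finish, then shut the cluster down
    await cluster.idle();
    await cluster.close();
})();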

This is just a minimal example. There are many more ways to use the cluster.




Answer 2:


Each puppeteer.launch() call boots a new browser for your script to drive, so it's better to have one script make multiple puppeteer.launch() calls than to run multiple instances of your script. Even though Node is single-threaded, commands and events are sent to the browser over WebSockets, so you benefit from Node's async behavior. Said another way: the browsers do their work in parallel rather than in series, despite Node's single-threaded nature.
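To make the "several launch calls in one script" approach concrete, here is a small sketch that starts one browser per URL and waits for all of them with Promise.all. The function name screenshotAll and the screenshot-N.png file names are hypothetical, and launch is assumed to be a stand-in for puppeteer.launch passed in by the caller.

```javascript
// Sketch: drive several browsers concurrently from a single Node.js script.
// `launch` is a hypothetical stand-in for puppeteer.launch, supplied by the caller.
// The output file names are made up for illustration.
async function screenshotAll(launch, urls) {
    return Promise.all(urls.map(async (url, i) => {
        // Each URL gets its own browser; all launches start without waiting
        // for the others, so the browsers work at the same time.
        const browser = await launch();
        try {
            const page = await browser.newPage();
            await page.goto(url);
            const path = `screenshot-${i}.png`;
            await page.screenshot({ path });
            return path;
        } finally {
            // Always close the browser, even if navigation failed.
            await browser.close();
        }
    }));
}
```

With Puppeteer installed, this could be called as screenshotAll(() => puppeteer.launch(), ['http://www.wikipedia.org']). The number of URLs passed in is also the number of simultaneous browsers, so memory limits how far this scales.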

For some background: I run a service called browserless (https://browserless.io) that aims to productionalize web-based work. I also maintain a few Docker images here: https://hub.docker.com/r/browserless/chrome/




Answer 3:


Both will work, but the second one doesn't really make sense, because Node.js is single-threaded. So even if it works, using multiple browser instances in one process won't be faster or easier than using multiple processes. The best option is to run (1) as you did before; the only thing you need to remember is to keep the tests self-contained.
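For the process-level option (1), one way to start several independent runs from a small launcher is sketched below. The helper names runProcess and runAll are hypothetical, and the idea that screenshot.js takes a URL as a command-line argument is an assumption, since the question does not show the script.

```javascript
const { spawn } = require('child_process');

// Sketch: start one OS process and resolve with its exit code.
function runProcess(command, args) {
    return new Promise((resolve, reject) => {
        // stdio: 'inherit' lets each child print to the launcher's terminal.
        const child = spawn(command, args, { stdio: 'inherit' });
        child.on('error', reject);
        child.on('exit', (code) => resolve(code));
    });
}

// Start all processes at once and wait until every one has exited.
// Each entry of `argsList` is the argument array for one process.
function runAll(command, argsList) {
    return Promise.all(argsList.map((args) => runProcess(command, args)));
}
```

For example, runAll('node', [['screenshot.js', 'http://www.wikipedia.org'], ['screenshot.js', 'https://example.com']]) would run two copies of the script side by side, each with its own browser, and resolve with their exit codes.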



Source: https://stackoverflow.com/questions/48317725/is-it-safe-to-run-multiple-instances-of-puppeteer-at-the-same-time
