puppeteer-cluster: queue instead of execute

问题

I'm experimenting with Puppeteer Cluster and I just don't understand how to use queuing properly. Can it only be used for calls where you don't wait for a response? I'm using Artillery to fire a bunch of requests simultaneously, but they all fail while only some fail when I have the command execute directly.

I've taken the code straight from the examples and replaced execute with queue which I expected to work, except the code doesn't wait for the result. Is there a way to achieve this anyway?

So this works:

const screen = await cluster.execute(req.query.url);

But this breaks:

const screen = await cluster.queue(req.query.url);

Here's the full example with queue:

const express = require('express');
const app = express();
const { Cluster } = require('puppeteer-cluster');

(async () => {
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 2,
    });
    await cluster.task(async ({ page, data: url }) => {
        // make a screenshot
        await page.goto('http://' + url);
        const screen = await page.screenshot();
        return screen;
    });

    // setup server
    app.get('/', async function (req, res) {
        if (!req.query.url) {
            return res.end('Please specify url like this: ?url=example.com');
        }
        try {
            const screen = await cluster.queue(req.query.url);

            // respond with image
            res.writeHead(200, {
                'Content-Type': 'image/jpg',
                'Content-Length': screen.length //variable is undefined here
            });
            res.end(screen);
        } catch (err) {
            // catch error
            res.end('Error: ' + err.message);
        }
    });

    app.listen(3000, function () {
        console.log('Screenshot server listening on port 3000.');
    });
})();

What am I doing wrong here? I'd really like to use queuing because without it every incoming request appears to slow down all the other ones.

回答1:

Author of puppeteer-cluster here.

Quote from the docs:

cluster.queue(..): [...] Be aware that this function only returns a Promise for backward compatibility reasons. This function does not run asynchronously and will immediately return.

cluster.execute(...): [...] Works like Cluster.queue, just that this function returns a Promise which will be resolved after the task is executed. In case an error happens during the execution, this function will reject the Promise with the thrown error. There will be no "taskerror" event fired.

When to use which function:

Use cluster.queue if you want to queue a large number of jobs (e.g. list of URLs). The task function needs to take care of storing the results by printing them to console or storing them into a database.
Use cluster.execute if your task function returns a result. This will still queue the job, so this is like calling queue in addition to waiting for the job to finish. In this scenario, there is most often a "idling cluster" present which is used when a request hits the server (like in your example code).

So, you definitely want to use cluster.execute as you want to wait for the results of the task function. The reason, you do not see any errors is (as quoted above) that the errors of the cluster.queue function are emitted via a taskerror event. The cluster.execute errors are directly thrown (Promise is rejected). Most likely, in both cases your jobs fail, but it is only visible for the cluster.execute

来源：https://stackoverflow.com/questions/57361073/puppeteer-cluster-queue-instead-of-execute

标签

javascript

node.js

Puppeteer

puppeteer-cluster