Node.js: Async requests with a list of URLs

久未见 posted on 2019-11-27 09:53:49

The things you need to watch for are:

  1. Whether the target site has rate limiting, in which case you may be blocked if you request too much too fast.

  2. How many simultaneous requests the target site can handle without degrading its performance.

  3. How much bandwidth your server has on its end of things.

  4. How many simultaneous requests your own server can have in flight and process without causing excess memory usage or a pegged CPU.

In general, the scheme for managing all this is to create a way to tune how many requests you launch. There are many ways to control this: by number of simultaneous requests, by number of requests per second, by amount of data used, and so on.

The simplest way to start would be to just control how many simultaneous requests you make. That can be done like this:

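// Calls fn(item) for each entry in arrayOfData with at most maxInFlight calls
// running at once. fn must return a promise. The returned promise resolves
// (with no value) after every call has finished, whether it succeeded or failed.
function runRequests(arrayOfData, maxInFlight, fn) {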
    return new Promise((resolve, reject) => {
        let index = 0;
        let inFlight = 0;

        function next() {
            while (inFlight < maxInFlight && index < arrayOfData.length) {
                ++inFlight;
                fn(arrayOfData[index++]).then(result => {
                    --inFlight;
                    next();
                }).catch(err => {
                    --inFlight;
                    console.log(err);
                    // purposely eat the error and let the rest of the processing continue
                    // if you want to stop further processing, you can call reject() here
                    next();
                });
            }
            if (inFlight === 0) {
                // all done
                resolve();
            }
        }
        next();
    });
}

You would then use it like this:

const rp = require('request-promise');

// run the whole urlList, no more than 10 at a time
runRequests(urlList, 10, function(url) {
    return rp(url).then(function(data) {
        // process fetched data here for one url
    }).catch(function(err) {
        console.log(url, err);
    });
}).then(function() {
    // all requests done here
});

This can be made as sophisticated as you want by adding a time element to it (no more than N requests per second) or even a bandwidth element to it.
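
As a rough illustration of the time element, here is a minimal sketch that spaces launches evenly so that no more than N requests start per second. The name runRateLimited and its perSecond parameter are made up for this example and are not part of the code above. It limits how fast requests start, not how many are in flight at once, and it schedules one timer per item up front, so it assumes a list of modest size.

```javascript
// Hypothetical helper (not from the original answer): start at most
// `perSecond` calls of fn per second by spacing the launches evenly.
// fn must return a promise, just like the callback passed to runRequests().
function runRateLimited(arrayOfData, perSecond, fn) {
    const interval = 1000 / perSecond; // milliseconds between launches
    const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

    const pending = arrayOfData.map((item, i) =>
        delay(i * interval)            // item i starts i * interval ms from now
            .then(() => fn(item))
            .catch(err => {
                // eat the error so the remaining items still get processed
                console.log(err);
            })
    );

    // resolves (with no value) once every call has finished
    return Promise.all(pending).then(() => undefined);
}
```

You would call it the same way as runRequests(), for example runRateLimited(urlList, 5, url => rp(url)) for roughly five launches per second.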

From the question: "I want one request is called after one request is completed."

That's a very slow way to do things. If you really want that, you can pass 1 for the maxInFlight parameter to the function above. Typically, though, things run a lot faster and cause no problems if you allow somewhere between 5 and 50 simultaneous requests. Only testing will tell you where the sweet spot is for your particular target sites, your server infrastructure, and the amount of processing you need to do on the results.
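
For the strictly one-at-a-time case, an alternative to passing 1 for maxInFlight is a plain loop with async/await. This is only a sketch, assuming a Node.js version that supports async/await. runSequentially is a hypothetical name, and fn is expected to return a promise, just as with runRequests().

```javascript
// Hypothetical helper (not from the original answer): process the items
// strictly one after another. The next call to fn starts only after the
// promise from the previous call has settled.
async function runSequentially(arrayOfData, fn) {
    for (const item of arrayOfData) {
        try {
            await fn(item);
        } catch (err) {
            // log and continue, mirroring the error handling in runRequests()
            console.log(err);
        }
    }
}
```

Usage would look like runSequentially(urlList, url => rp(url)), with the same caveat as above: it is the slowest option because nothing overlaps.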

You can use the setTimeout function to process all the requests within a loop. For that, you must know the maximum time needed to process a request.
