ZombieJS: intermittently crashes when called repeatedly from a for loop

Submitted by 自作多情 on 2019-12-13 06:47:49

Question


I have a ZombieJS Node server on Heroku scraping data from the internet. The server code is called from a for loop on the client side. Each iteration of the loop makes a server call, which in turn runs a Zombie scrape. Sometimes the server crashes with the error below. It only happens when the for loop runs more than one iteration.

How can I make the code robust enough to handle multiple simultaneous client calls, each with its own for loop?

Code:

var express = require('express');
var app = express();
var Browser = require('zombie');    // tried changing var to const; no difference
var assert = require('assert');

app.set('port', (process.env.PORT || 5000));

var printMessage = function() { console.log("Node app running on " + app.get('port')); };

var getAbc = function(response, input)
{
    var browser = new Browser(); 
    browser.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0'; 
    browser.runScripts = true;
    var url = "http://www.google.com/ncr"; 

    browser.visit(url, function() {
        browser.fill('q', input).pressButton('Google Search', function(){
            // parsing number of results from browser object

            response.writeHead(200, {'Content-Type': 'text/plain'});
            response.end(numberOfSearchResults); 
        });
    });
}

var handleXyz = function(request, response)
{
    getAbc(response, request.query.input); 
}

app.listen(app.get('port'), printMessage); 
app.post('/xyz', handleXyz); 

Error:

 assert.js:86
   throw new assert.AssertionError({
              ^
 No open window with an HTML document
     at Browser.field (/app/node_modules/zombie/lib/index.js:811:7)
     at Browser.fill (/app/node_modules/zombie/lib/index.js:903:24)
     at /app/cfv1.js:42:11
     at done (/app/node_modules/zombie/lib/eventloop.js:589:9)
     at timeout (/app/node_modules/zombie/lib/eventloop.js:594:33)
     at Timer.listOnTimeout (timers.js:119:15)

I have a similar project using HorsemanJS/PhantomJS that fails in a similar way (I'm stuck on that too!): "NodeJS server can't handle multiple users"


Answer 1:


In general, I think you should be careful about, or simply avoid, generating a lot of unsolicited requests to remote servers. Many sites will throttle you and/or start rejecting connections. With that said, I believe I found the source of the issue in this particular case.

I tested the code snippet, and in this particular case Google resets the connection if you make too many requests. When the connection is reset, the page never loads, and a later assertion inside Zombie fails.

The error I get when the connection is reset:

  zombie TypeError: read ECONNRESET
    at zombie/lib/pipeline.js:89:15
    at tryCatcher (zombie/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (zombie/node_modules/bluebird/js/release/promise.js:497:31)
    at Promise._settlePromise (zombie/node_modules/bluebird/js/release/promise.js:555:18)
    at Promise._settlePromise0 (zombie/node_modules/bluebird/js/release/promise.js:600:10)
    at Promise._settlePromises (zombie/node_modules/bluebird/js/release/promise.js:679:18)
    at Async._drainQueue (zombie/node_modules/bluebird/js/release/async.js:125:16)
    at Async._drainQueues (zombie/node_modules/bluebird/js/release/async.js:135:10)
    at Immediate.Async.drainQueues [as _onImmediate] (zombie/node_modules/bluebird/js/release/async.js:16:14)
    at processImmediate [as _immediateCallback] (timers.js:383:17)

I get your original error further down, but the connection reset above is the actual source of the problem. When it happens, document.documentElement ends up falsy, which in turn makes this assertion in the field function of zombie/lib/index.js fail:

assert(this.document && this.document.documentElement, 'No open window with an HTML document');
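
On the server side, the same condition could be checked before calling fill, so that a failed page load produces an error response rather than an uncaught assertion. The following is only a sketch: `hasHtmlDocument` and `visitSafely` are hypothetical names, the 502 status is an arbitrary choice, and it assumes the callback passed to `browser.visit` receives an error as its first argument when loading fails, which is worth verifying against your Zombie version.

```javascript
// Returns true when the browser holds a loaded HTML document.
// This mirrors the condition in the assertion quoted above.
function hasHtmlDocument(browser) {
  return Boolean(browser.document && browser.document.documentElement);
}

// Hypothetical wrapper: visits `url`, then either calls `onLoaded(browser)`
// or answers with a 502 instead of letting fill() throw.
// Assumption: the visit callback receives an error as its first argument.
function visitSafely(browser, url, response, onLoaded) {
  browser.visit(url, function (error) {
    if (error || !hasHtmlDocument(browser)) {
      response.writeHead(502, {'Content-Type': 'text/plain'});
      response.end('Could not load ' + url);
      return;
    }
    onLoaded(browser);
  });
}
```

In the question's getAbc, the fill and pressButton calls would move into the `onLoaded` callback.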

I think the easiest solution is to handle the error on the client end and try to recover gracefully.
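
One possible shape for that client-side recovery is to retry a failed call after a growing delay, and to issue the calls one at a time rather than all at once. This is a minimal sketch, not the original client code: `callServer` is a hypothetical function that performs one request to /xyz and returns a promise, and `maxAttempts` (assumed to be at least 1) and `baseDelayMs` are illustrative parameters.

```javascript
// Hypothetical retry helper; `callServer` stands in for one request to /xyz.
async function withRetry(callServer, input, maxAttempts, baseDelayMs) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await callServer(input);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        // Exponential backoff: baseDelayMs, then 2x, 4x, ... between attempts.
        const delay = baseDelayMs * Math.pow(2, attempt);
        await new Promise(function (resolve) { setTimeout(resolve, delay); });
      }
    }
  }
  // Every attempt failed; surface the last error to the caller.
  throw lastError;
}

// Processes the inputs one after another instead of firing them in parallel.
async function runSequentially(inputs, callServer, maxAttempts, baseDelayMs) {
  const results = [];
  for (const input of inputs) {
    results.push(await withRetry(callServer, input, maxAttempts, baseDelayMs));
  }
  return results;
}
```

Running the calls sequentially also reduces the request rate seen by the remote site, which is what triggers the connection resets in the first place.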




Answer 2:


I see you are creating a new instance of the Browser object for each call. My guess is that the previous Browser is still closing, or hasn't yet been garbage collected, when the next call tries to open another. Try moving the instantiation of the Browser outside of getAbc().
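
If a single shared Browser is used, note that it holds one page at a time, so overlapping requests would need to take turns with it. Below is a rough sketch of one way to serialize access using a promise chain; `createSerializer` and `enqueue` are hypothetical names, and nothing here is part of Zombie itself.

```javascript
// Hypothetical helper: queues jobs so that only one runs at a time.
// Each job is a function returning a promise (or a plain value).
function createSerializer() {
  let tail = Promise.resolve();
  return function enqueue(job) {
    // Start this job only after every previously queued job has settled.
    const result = tail.then(job);
    // Swallow rejections on the chain so one failed job does not block later ones.
    tail = result.catch(function () {});
    // The caller still sees the job's own result or rejection.
    return result;
  };
}
```

Each request handler would wrap its use of the shared browser in a call to `enqueue`, at the cost of requests waiting on one another.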



Source: https://stackoverflow.com/questions/35563187/zombiejs-intermittently-crashes-when-called-repeatedly-from-a-for-loop
