Error message when using PhantomJS, breaks at random intervals

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-23 04:39:51

Question


The error message that I keep getting is the following:

assert.js:92
  throw new assert.AssertionError({
        ^
AssertionError: abnormal phantomjs exit code: -1073741819
    at Console.assert (console.js:102:23)
    at ChildProcess.<anonymous> (C:\Users\file_path...\node_modules\phantom\phantom.js:132:28)
    at ChildProcess.emit (events.js:98:17)
    at Process.ChildProcess._handle.onexit (child_process.js:810:12)
Program node app.js exited with code 8
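
A note on the exit code: assuming this is running on Windows (the C:\Users path suggests so), the negative number is a signed view of a 32-bit status code. Reinterpreted as unsigned it is 0xC0000005, the Windows status for an access violation, which would mean the PhantomJS process itself crashed rather than a JavaScript error being thrown. A small sketch of the conversion, where toNtStatusHex is a hypothetical helper name and not part of any library:

```javascript
// Hypothetical helper: reinterpret a signed 32-bit exit code as the
// unsigned hexadecimal form that Windows status codes are listed under.
function toNtStatusHex(exitCode) {
    // >>> 0 converts the value to an unsigned 32-bit integer
    var hex = (exitCode >>> 0).toString(16).toUpperCase();
    // left-pad to 8 hex digits
    return '0x' + ('00000000' + hex).slice(-8);
}

console.log(toNtStatusHex(-1073741819)); // prints 0xC0000005
```

Searching for the hexadecimal form tends to give more useful results than the decimal one.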

The break happens at random, sometimes after inserting over a thousand rows into PostgreSQL, sometimes after just a handful of rows.

I'm fairly sure that the error is occurring in the following function inside my code, based on a lot of different console.log calls that I have put throughout the code. Also, I think that assert.js:92 is from Chai:

function getNetworkTraffic(networkUrl,senderEmail) {
    phantom.create(function (ph) {
        ph.createPage(function (page) {
            page.set("onResourceRequested", function (req) {
                // Declare locals with var so overlapping calls don't share implicit globals
                var referrerValue = "";
                var referrerName = "";
                var linkRedirectUrl = "";
                console.log('Fetching network traffic...')
                // req.headers is an array of {name, value} objects
                for (var i = 0; i < req.headers.length; i++) {
                    var header = req.headers[i]
                    if (header.name == "Referer") {
                        referrerName = header.name
                        referrerValue = header.value
                    }
                }
                var linkUrl = req.url
                if (req.redirectURL) {
                    // was `redirectURL`, an undeclared variable that would throw a ReferenceError
                    linkRedirectUrl = req.redirectURL
                }
                var singleReq = {"referrerName":referrerName,"referrerValue":referrerValue,"requestUrl":linkUrl,"redirectURL":linkRedirectUrl, "parent_url":networkUrl, "source": "email", "senderEmail":senderEmail}
                // insertNetworkTrafficPg(singleReq)
            });
            page.set("onResourceReceived", function (res) {
                var linkRedirectUrl = "";
                var responseUrl = res.url
                if (res.redirectURL) {
                    linkRedirectUrl = res.redirectURL
                }
                var singleRes = {"responseUrl":responseUrl,"redirectURL":linkRedirectUrl,"parent_url":networkUrl,"source": "email", "senderEmail":senderEmail}
                // insertNetworkTrafficPg(singleRes)
            });
            try{
                page.open(networkUrl, function (status) {
                    if (status !== 'success') {
                        console.log('FAIL to load the address');
                    }
                    console.log('Opening web address...');
                    ph.exit();
                });
            } catch(err) {
                console.log(err)
            }
        });
    }, {
        dnodeOpts: {
            weak: false
        }
    });    
}

Answer 1:


It's possible to use PhantomJS with Node, but keep in mind that the phantom module is only a bridge to a separate PhantomJS process. It's also not really intended for heavy scraping, so perhaps you are opening too many threads at once and the real failure is something like a stack overflow that surfaces with a different message. You might want to try something like python-shell to run a Python script that does your scraping instead.
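
If the problem is indeed too many PhantomJS instances running at once, one option that stays within Node is to cap how many run concurrently. The following is only a minimal sketch under assumptions: createLimiter is a hypothetical helper (not part of the phantom module), and the stand-in task uses setTimeout in place of a real job. To use it with the question's code, getNetworkTraffic would need to accept a completion callback and call it after ph.exit(), which the original does not do.

```javascript
// Hypothetical helper: returns a schedule(task) function that lets at most
// `limit` tasks run at the same time. Each task receives a `done` callback
// and must call it when it has finished.
function createLimiter(limit) {
    var active = 0;   // tasks currently running
    var pending = []; // tasks waiting for a free slot

    function startNext() {
        if (active >= limit || pending.length === 0) return;
        var task = pending.shift();
        var finished = false;
        active++;
        task(function done() {
            if (finished) return; // ignore duplicate calls to done
            finished = true;
            active--;
            startNext(); // a slot is free, start the next waiting task
        });
    }

    return function schedule(task) {
        pending.push(task);
        startNext();
    };
}

// Stand-in for PhantomJS jobs: with a limit of 1 they run one after another.
var schedule = createLimiter(1);
['http://example.com/a', 'http://example.com/b'].forEach(function (url) {
    schedule(function (done) {
        console.log('start', url);
        setTimeout(function () {
            console.log('finish', url);
            done();
        }, 10);
    });
});
```

A limit of 1 serializes the jobs completely; a slightly higher limit trades some safety for throughput, and the right value would have to be found by experiment.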



Source: https://stackoverflow.com/questions/26468608/error-message-when-using-phantomjs-breaks-at-random-intervals
