How can I make PhantomJS skip downloading certain kinds of resources?

Happy的楠姐 2020-12-13 03:59

PhantomJS has a loadImages setting, but I want more: how can I make PhantomJS skip downloading other kinds of resources, such as CSS?

===

4 Answers
  • 2020-12-13 04:06

    Updated, working!

    Since PhantomJS 1.9, the existing answer no longer applies. Use this code instead:

    var webPage = require('webpage');
    var page = webPage.create();
    
    page.onResourceRequested = function(requestData, networkRequest) {
      // Escape the "." so it matches literally; test() needs no g flag.
      if (/wordfamily\.js/.test(requestData.url)) {
        console.log('Request (#' + requestData.id + '): ' + JSON.stringify(requestData));
        networkRequest.abort(); // skip downloading this resource
      }
    };
    

    Aborting a request with abort() triggers page.onResourceError for that request.

    See the PhantomJS documentation for page.onResourceRequested for details.
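    A slightly more general version of the same idea, as a sketch: installUrlBlocker is a hypothetical helper (not a PhantomJS API) that aborts every request whose URL matches one of a list of regular expressions, and remembers the aborted request ids so that the onResourceError calls those aborts trigger can be told apart from real failures. It only relies on requestData.id, requestData.url, networkRequest.abort() and the id, url and errorString fields of the resource error object; the patterns in the usage comment are examples.

```javascript
// Hypothetical helper, not a PhantomJS API: abort requests whose URL matches
// any of the given regular expressions and ignore the resulting resource errors.
// Pass patterns without the g flag, because test() on a /g regex is stateful.
function installUrlBlocker(page, patterns) {
  var abortedIds = {}; // request id -> true for every request we aborted

  page.onResourceRequested = function (requestData, networkRequest) {
    for (var i = 0; i < patterns.length; i++) {
      if (patterns[i].test(requestData.url)) {
        abortedIds[requestData.id] = true;
        networkRequest.abort(); // this also fires page.onResourceError
        return;
      }
    }
  };

  page.onResourceError = function (resourceError) {
    if (abortedIds[resourceError.id]) {
      return; // expected: we aborted this request ourselves
    }
    console.log('Resource error (#' + resourceError.id + ') ' +
      resourceError.url + ': ' + resourceError.errorString);
  };

  return abortedIds;
}

// In PhantomJS:
// var page = require('webpage').create();
// installUrlBlocker(page, [/wordfamily\.js/, /\.css(\?|$)/i]);
// page.open('http://example.com/', function (status) { phantom.exit(); });
```

    Keeping the ids in a plain object (rather than checking error codes) means the helper does not depend on which error code PhantomJS reports for an aborted request.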

  • 2020-12-13 04:19

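    Use page.onResourceRequested, as in PhantomJS's bundled example loadurlwithoutcss.js (adapted slightly here):

    page.onResourceRequested = function(requestData, request) {
        var url = requestData.url;
        // requestData.headers is an array of {name, value} objects, not a map.
        // Stylesheet requests usually send an Accept header starting with text/css.
        var acceptsCss = requestData.headers.some(function(header) {
            return header.name.toLowerCase() === 'accept' &&
                header.value.indexOf('text/css') === 0;
        });
        if (/^https?:\/\/.+?\.css/i.test(url) || acceptsCss) {
            console.log('The url of the request is matching. Aborting: ' + url);
            request.abort();
        }
    };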
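    The URL regex above only catches addresses containing ".css". As a hedged sketch, the same check can be broadened to other resource types by file extension. isUnwantedResource, fileExtension, headerValue and BLOCKED_EXTENSIONS are hypothetical names chosen for illustration, and the extension list is only an example. It also assumes that stylesheet requests usually carry an Accept header starting with text/css, which helps when a stylesheet URL has no .css extension.

```javascript
// Hypothetical helpers for a PhantomJS onResourceRequested callback.
// Assumption: the resource type can be guessed from the URL's file extension,
// or, for stylesheets, from an Accept header that starts with "text/css".
var BLOCKED_EXTENSIONS = ['css', 'woff', 'woff2', 'ttf', 'png', 'jpg', 'jpeg', 'gif'];

// requestData.headers is an array of {name, value} objects, so search it.
function headerValue(headers, name) {
  var wanted = name.toLowerCase();
  for (var i = 0; i < headers.length; i++) {
    if (headers[i].name.toLowerCase() === wanted) {
      return headers[i].value;
    }
  }
  return null;
}

// Lower-cased extension of the last path segment ('' if there is none).
function fileExtension(url) {
  // Drop the query string and fragment, then the scheme and host.
  var path = url.split(/[?#]/)[0].replace(/^[a-z][a-z0-9+.-]*:\/\/[^\/]*/i, '');
  var lastSegment = path.substring(path.lastIndexOf('/') + 1);
  var dot = lastSegment.lastIndexOf('.');
  return dot === -1 ? '' : lastSegment.substring(dot + 1).toLowerCase();
}

function isUnwantedResource(requestData) {
  if (BLOCKED_EXTENSIONS.indexOf(fileExtension(requestData.url)) !== -1) {
    return true;
  }
  var accept = headerValue(requestData.headers || [], 'Accept');
  return accept !== null && accept.indexOf('text/css') === 0;
}

// In PhantomJS:
// page.onResourceRequested = function (requestData, networkRequest) {
//   if (isUnwantedResource(requestData)) { networkRequest.abort(); }
// };
```

    The code sticks to ES5 (var, plain functions, indexOf) because PhantomJS's JavaScript engine predates arrow functions.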
    
  • 2020-12-13 04:24

    There is no way to do this for now (PhantomJS 1.7); it does NOT support it.

    A hacky workaround is to use an HTTP proxy, so you can screen out the requests you don't need.
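    A minimal sketch of that proxy idea, written for Node.js rather than PhantomJS: createFilteringProxy is a hypothetical helper that answers blocked URLs with 403 and forwards everything else. It only handles plain http:// requests; HTTPS requests arrive as CONNECT tunnels, which this sketch does not implement, so they will fail. PhantomJS can be pointed at it with its --proxy option, e.g. phantomjs --proxy=127.0.0.1:8080 script.js (port 8080 is an arbitrary choice).

```javascript
// Minimal filtering forward proxy for Node.js (plain HTTP only, no CONNECT/HTTPS).
const http = require('http');

// isBlocked(url) -> true to reject the request instead of forwarding it.
function createFilteringProxy(isBlocked) {
  return http.createServer((clientReq, clientRes) => {
    // A client using this server as its proxy sends the absolute URL.
    const target = clientReq.url;
    if (!/^http:\/\//i.test(target)) {
      clientRes.writeHead(400, { 'Content-Type': 'text/plain' });
      clientRes.end('Only absolute http:// URLs are proxied\n');
      return;
    }
    if (isBlocked(target)) {
      clientRes.writeHead(403);
      clientRes.end();
      return;
    }
    const upstream = http.request(
      target,
      { method: clientReq.method, headers: clientReq.headers },
      (upstreamRes) => {
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
      }
    );
    upstream.on('error', () => {
      if (!clientRes.headersSent) {
        clientRes.writeHead(502);
      }
      clientRes.end();
    });
    clientReq.pipe(upstream);
  });
}

// Example: refuse stylesheets, then run `phantomjs --proxy=127.0.0.1:8080 script.js`.
// createFilteringProxy((url) => /\.css(\?|$)/i.test(url)).listen(8080, '127.0.0.1');
```

    Returning 403 for blocked URLs means PhantomJS sees an ordinary failed response rather than a hung connection, so page loading carries on.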

  • 2020-12-13 04:27

    Finally, you could try http://github.com/eugenehp/node-crawler instead.

    Otherwise, you can still try one of the following approaches with PhantomJS.

    The easy way is: load the page -> parse the page -> exclude the unwanted resources -> load the result into PhantomJS.

    Another way is simply to block the hosts in the firewall.

    Optionally, you can use a proxy to block requests to certain URLs.

    One more option is to load the page and then remove the unwanted resources, but I don't think that is the right approach here.
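    The "exclude unwanted resources" step can be sketched as below. stripStylesheetLinks is a hypothetical helper that removes link tags with rel="stylesheet" using a regular expression, which only copes with ordinary, well-formed markup (an HTML parser would be more robust). How the raw HTML is fetched is left open; the result can then be handed to PhantomJS with page.setContent(html, url), where passing the original URL lets relative links resolve.

```javascript
// Hypothetical helper for the "exclude unwanted resources" step: removes
// <link ... rel="stylesheet" ...> tags so their CSS is never requested.
// Regex-based, so it only copes with ordinary, well-formed markup.
function stripStylesheetLinks(html) {
  return html.replace(/<link\b[^>]*\brel\s*=\s*["']?stylesheet(?=["'\s\/>])[^>]*>/gi, '');
}

// In PhantomJS, after fetching the raw HTML of pageUrl by some other means:
// var page = require('webpage').create();
// page.setContent(stripStylesheetLinks(rawHtml), pageUrl);
```

    Other link types (icons, preloads) are left alone, so only the stylesheet downloads are skipped.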
