I want to capture the traffic to the sites I'm browsing using Selenium with Python. Since the traffic will be HTTPS, using a simple proxy won't get me far.
My idea was to run PhantomJS with Selenium and have PhantomJS execute a script (not on the page via webdriver.execute_script(), but in PhantomJS itself). I was thinking of the netlog.js script (from here: https://github.com/ariya/phantomjs/blob/master/examples/netlog.js).
Is there a similar way to do this with Selenium?
Thanks in advance
Update:
Solved it with browsermob-proxy.
pip3 install browsermob-proxy
Python 3 code:
from selenium import webdriver
from browsermobproxy import Server

server = Server(<path to browsermob-proxy>)
server.start()
proxy = server.create_proxy({'captureHeaders': True, 'captureContent': True, 'captureBinaryContent': True})

service_args = ["--proxy=%s" % proxy.proxy, '--ignore-ssl-errors=yes']
driver = webdriver.PhantomJS(service_args=service_args)

proxy.new_har()
driver.get('https://google.com')

print(proxy.har)  # this is the archive
# for example:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]
Answer 1:
I am using a proxy for this:
from selenium import webdriver
from browsermobproxy import Server

server = Server(environment.b_mob_proxy_path)
server.start()
proxy = server.create_proxy()

service_args = ["--proxy=%s" % proxy.proxy]
driver = webdriver.PhantomJS(service_args=service_args)

proxy.new_har()
driver.get('url_to_open')

print(proxy.har)  # this is the archive
# for example:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]
The 'har' (HTTP Archive format) has a lot of other information about the requests and responses; it's very useful to me.
Installing on Linux:
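For instance, each HAR entry also carries the response status, headers, and content metadata, so you can filter the capture for failures or specific content types. The snippet below works on a small hand-built dict in HAR shape (illustrative sample data, not from a real capture), independent of the proxy:

```python
# Minimal hand-built sample in HAR shape (illustrative, not a real capture)
har = {
    'log': {
        'entries': [
            {'request': {'url': 'https://example.com/', 'method': 'GET'},
             'response': {'status': 200, 'content': {'mimeType': 'text/html'}}},
            {'request': {'url': 'https://example.com/missing.js', 'method': 'GET'},
             'response': {'status': 404, 'content': {'mimeType': 'text/plain'}}},
        ]
    }
}

# Collect every URL that came back with an error status
failed = [entry['request']['url']
          for entry in har['log']['entries']
          if entry['response']['status'] >= 400]
print(failed)  # ['https://example.com/missing.js']
```

The same pattern works on the `proxy.har` dict returned by browsermob-proxy, since it follows the HAR structure.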
pip install browsermob-proxy
Answer 2:
I use a solution without a proxy server for this. I modified the Selenium source code according to the link below in order to add an executePhantomJS function.
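A rough sketch of how this is commonly wired up without patching Selenium itself: GhostDriver (the WebDriver server built into PhantomJS) exposes a non-standard `/session/$sessionId/phantom/execute` endpoint, and the Python bindings can be taught about it at runtime. This is a community workaround, not a documented Selenium API, and it requires a PhantomJS binary on the PATH:

```python
from selenium import webdriver

driver = webdriver.PhantomJS()

# Register GhostDriver's non-standard endpoint with the Python bindings
# (assumption: the PhantomJS build in use supports this command).
driver.command_executor._commands['executePhantomJS'] = (
    'POST', '/session/$sessionId/phantom/execute')

# The script runs inside PhantomJS itself, like netlog.js:
# `this` is bound to the current page object.
netlog = '''
var page = this;
page.onResourceRequested = function (requestData) {
    console.log('Requested: ' + requestData.url);
};
page.onResourceReceived = function (response) {
    console.log('Received: ' + response.url + ' (' + response.status + ')');
};
'''
driver.execute('executePhantomJS', {'script': netlog, 'args': []})

driver.get('https://example.com')  # requests are now logged by PhantomJS
driver.quit()
```

The logged output goes to PhantomJS's stdout (its ghostdriver log file by default), so it captures HTTPS request URLs without any man-in-the-middle proxy.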