I want to capture the traffic to the sites I'm browsing using Selenium with Python. Since the traffic will be HTTPS, using a simple proxy won't get me far.
My idea was to run PhantomJS with Selenium and have PhantomJS execute a script (not on the page via webdriver.execute_script(), but in PhantomJS itself). I was thinking of the netlog.js script (from here: https://github.com/ariya/phantomjs/blob/master/examples/netlog.js).
Is there a similar way to do this with Selenium?
Thanks in advance
Update:
Solved it with browsermob-proxy.
pip3 install browsermob-proxy
Python 3 code:
from selenium import webdriver
from browsermobproxy import Server

server = Server(<path to browsermob-proxy>)
server.start()
proxy = server.create_proxy({'captureHeaders': True, 'captureContent': True, 'captureBinaryContent': True})

service_args = ["--proxy=%s" % proxy.proxy, '--ignore-ssl-errors=yes']
driver = webdriver.PhantomJS(service_args=service_args)

proxy.new_har()
driver.get('https://google.com')

print(proxy.har)  # this is the archive
# for example:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]
Answer 1:
I am using a proxy for this:
from selenium import webdriver
from browsermobproxy import Server

server = Server(environment.b_mob_proxy_path)
server.start()
proxy = server.create_proxy()

service_args = ["--proxy=%s" % proxy.proxy]
driver = webdriver.PhantomJS(service_args=service_args)

proxy.new_har()
driver.get('url_to_open')

print(proxy.har)  # this is the archive
# for example:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]
The 'har' (HTTP Archive format) has a lot of other information about the requests and responses; it's very useful to me.
Installing on Linux:
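For instance, each HAR entry also carries the response status, headers, and content metadata, so you can filter the capture for failures or specific content types. The snippet below works on a small hand-built dict in HAR shape (illustrative sample data, not from a real capture), independent of the proxy:

```python
# Minimal hand-built sample in HAR shape (illustrative, not a real capture)
har = {
    'log': {
        'entries': [
            {'request': {'url': 'https://example.com/', 'method': 'GET'},
             'response': {'status': 200, 'content': {'mimeType': 'text/html'}}},
            {'request': {'url': 'https://example.com/missing.js', 'method': 'GET'},
             'response': {'status': 404, 'content': {'mimeType': 'text/plain'}}},
        ]
    }
}

# Collect every URL that came back with an error status
failed = [entry['request']['url']
          for entry in har['log']['entries']
          if entry['response']['status'] >= 400]
print(failed)  # ['https://example.com/missing.js']
```

The same pattern works on the `proxy.har` dict returned by browsermob-proxy, since it follows the HAR structure.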
pip install browsermob-proxy
Answer 2:
I use a solution without a proxy server for this. I modified the Selenium source code according to the link below in order to add an executePhantomJS function.
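A rough sketch of how this is commonly wired up without patching Selenium itself: GhostDriver (the WebDriver server built into PhantomJS) exposes a non-standard `/session/$sessionId/phantom/execute` endpoint, and the Python bindings can be taught about it at runtime. This is a community workaround, not a documented Selenium API, and it requires a PhantomJS binary on the PATH:

```python
from selenium import webdriver

driver = webdriver.PhantomJS()

# Register GhostDriver's non-standard endpoint with the Python bindings
# (assumption: the PhantomJS build in use supports this command).
driver.command_executor._commands['executePhantomJS'] = (
    'POST', '/session/$sessionId/phantom/execute')

# The script runs inside PhantomJS itself, like netlog.js:
# `this` is bound to the current page object.
netlog = '''
var page = this;
page.onResourceRequested = function (requestData) {
    console.log('Requested: ' + requestData.url);
};
page.onResourceReceived = function (response) {
    console.log('Received: ' + response.url + ' (' + response.status + ')');
};
'''
driver.execute('executePhantomJS', {'script': netlog, 'args': []})

driver.get('https://example.com')  # requests are now logged by PhantomJS
driver.quit()
```

The logged output goes to PhantomJS's stdout (its ghostdriver log file by default), so it captures HTTPS request URLs without any man-in-the-middle proxy.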