Prevent CSS/other resource download in PhantomJS/Selenium driven by Python

Submitted by 懵懂的女人 on 2019-11-28 06:31:45
Will McChesney

A bold young soul by the name of “watsonmw” recently added functionality to Ghostdriver (which PhantomJS uses to interface with Selenium) that allows access to PhantomJS API calls that require a page object, like the onResourceRequested callback you cited.

If you need a solution at all costs, consider building PhantomJS from source (which the developers note “takes roughly 30 minutes ... with 4 parallel compile jobs on a modern machine”) and integrating his patch.

Then this (untested) Python code should work as a proof of concept:

from selenium import webdriver
driver = webdriver.PhantomJS('phantomjs')

# hack while the Python interface lags: register the Ghostdriver command by hand
driver.command_executor._commands['executePhantomScript'] = ('POST', '/session/$sessionId/phantom/execute')

driver.execute('executePhantomScript', {'script': '''
page.onResourceRequested = function(requestData, request) {
    // ...
}
''', 'args': []})

Until then, you’ll just get a Can't find variable: page exception.

Good luck! There are a lot of great alternatives, like working in a JavaScript environment, driving Gecko, using a proxy, etc.

Will's answer got me on track. (Thanks Will!)

The current PhantomJS (1.9.8) includes Ghostdriver 1.1.0, which already contains watsonmw's patch.

Download the latest PhantomJS, then create the following symlinks (sudo may be required):

ln -s path/to/bin/phantomjs /usr/local/share/phantomjs
ln -s path/to/bin/phantomjs /usr/local/bin/phantomjs
ln -s path/to/bin/phantomjs /usr/bin/phantomjs

And then try this:

from selenium import webdriver
driver = webdriver.PhantomJS('phantomjs')

# hack while the Python interface lags: register the Ghostdriver command by hand
driver.command_executor._commands['executePhantomScript'] = ('POST', '/session/$sessionId/phantom/execute')

driver.execute('executePhantomScript', {'script': '''
    var page = this; // won't work otherwise
    page.onResourceRequested = function(requestData, request) {
        // ...
    };
''', 'args': []})
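
To make the elided callback body concrete, here is a minimal sketch of how the CSS blocking itself might look. It assumes the PhantomJS build supports calling abort() on the second callback argument (documented for PhantomJS 1.9 and later). The helper names build_block_script and install_blocker, and the matching on file extension, are my own for illustration and not part of Selenium or Ghostdriver. I have not run it against a live PhantomJS session.

```python
import json

# Ghostdriver endpoint used in the answers above.
PHANTOM_EXECUTE = ('POST', '/session/$sessionId/phantom/execute')


def build_block_script(extensions):
    """Build PhantomJS-side JavaScript that aborts requests by file extension.

    Hypothetical helper: the extension list is embedded as a JSON array so
    the JavaScript can compare it against the end of each request URL.
    """
    blocked = json.dumps([ext.lower() for ext in extensions])
    return '''
    var page = this; // inside phantom/execute, `this` is the current page
    var blocked = %s;
    page.onResourceRequested = function(requestData, networkRequest) {
        // Ignore the query string so "style.css?v=2" still matches ".css".
        var path = requestData.url.split('?')[0].toLowerCase();
        for (var i = 0; i < blocked.length; i++) {
            if (path.slice(-blocked[i].length) === blocked[i]) {
                networkRequest.abort(); // assumed available (PhantomJS >= 1.9)
                return;
            }
        }
    };
    ''' % blocked


def install_blocker(driver, extensions=('.css',)):
    """Register the executePhantomScript command and install the callback.

    Hypothetical helper: `driver` is expected to be a PhantomJS webdriver
    instance like the one created in the answers above.
    """
    driver.command_executor._commands['executePhantomScript'] = PHANTOM_EXECUTE
    payload = {'script': build_block_script(extensions), 'args': []}
    return driver.execute('executePhantomScript', payload)
```

With a driver created as above, calling install_blocker(driver, ['.css', '.png']) before driver.get(...) should register the callback for that session. Matching on the URL suffix is a deliberately crude choice; stylesheets served from URLs without a .css extension would slip through.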

The proposed solutions didn't work for me, but this one does (it uses driver.execute_script):

driver.command_executor._commands['executePhantomScript'] = ('POST', '/session/$sessionId/phantom/execute')

driver.execute_script('''
    this.onResourceRequested = function(request, net) {
        console.log('REQUEST ' + request.url);
    };
''')