Using Selenium with Python and PhantomJS to download file to filesystem

Submitted by 帅比萌擦擦* on 2019-12-28 05:21:26

Question


I've been grappling with using PhantomJS/Selenium/python-selenium to download a file to the filesystem. I can easily navigate through the DOM and click, hover, etc. Downloading a file, however, is proving to be quite troublesome. I've tried a headless approach with Firefox and pyvirtualdisplay, but that didn't work well either and was unbelievably slow. I know that CasperJS allows file downloads. Does anyone know how to integrate CasperJS with Python, or how to use PhantomJS to download files? Any help is much appreciated.


Answer 1:


Although this question is quite old, downloading files through PhantomJS is still a problem. We can, however, use PhantomJS to get the download link and all the cookies the request needs (session cookies, CSRF tokens and so on), and then do the actual download with requests:

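import requests
from selenium import webdriver

# webdriver.PhantomJS only exists in Selenium 3.x (it was removed in Selenium 4)
driver = webdriver.PhantomJS()
driver.get('page_with_download_link')

# find_element_by_id returns a WebElement; the URL itself is its href attribute
download_link = driver.find_element_by_id('download_link')
download_url = download_link.get_attribute('href')

# Copy the browser's cookies into a requests session so the server sees the same session
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

response = session.get(download_url)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page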

response.content should now hold the actual file content, which we can write to disk with open() or process however we like.
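The same cookie hand-off can also be done with only the standard library, in case requests is not available. The following is a minimal sketch, assuming the cookie dicts have the shape returned by Selenium's get_cookies() (at least 'name' and 'value', usually 'domain'). The helper names cookie_header_for, build_download_request and save_stream are hypothetical, and the domain check is a simplified version of real browser cookie matching (it ignores path, secure and expiry).

```python
import shutil
import urllib.request
from urllib.parse import urlsplit


def cookie_header_for(url, selenium_cookies):
    """Build a Cookie header value from driver.get_cookies() output.

    Only cookies whose domain matches the URL's host are kept
    (simplified domain matching; path/secure/expiry are ignored).
    """
    host = urlsplit(url).hostname or ""
    pairs = []
    for cookie in selenium_cookies:
        # A missing domain is treated as "this host"
        domain = cookie.get("domain", host).lstrip(".")
        if host == domain or host.endswith("." + domain):
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


def build_download_request(url, selenium_cookies):
    """Create a urllib Request that carries the browser session's cookies."""
    headers = {}
    header = cookie_header_for(url, selenium_cookies)
    if header:
        headers["Cookie"] = header
    return urllib.request.Request(url, headers=headers)


def save_stream(response, path, chunk_size=64 * 1024):
    """Copy a file-like response body to disk in chunks (no full read into memory)."""
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
```

In use, this would look like `with urllib.request.urlopen(build_download_request(download_url, driver.get_cookies())) as resp: save_stream(resp, 'downloaded_file')`, where 'downloaded_file' is a placeholder name. Streaming in chunks keeps memory flat for large files, unlike reading the whole body at once.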




Answer 2:


PhantomJS doesn't currently support file downloads. Relevant issues with workarounds:

  • File download
  • How to handle file save dialog box using Selenium webdriver and PhantomJS?

As far as I understand, you have at least 3 options:

  • switch to CasperJS (which means leaving Python behind)
  • try running a regular browser headlessly on Xvfb
  • switch to a normal, non-headless browser
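For the second and third options, the browser still has to be told to save files without showing a "save as" dialog. Below is a minimal sketch of preference dictionaries commonly used for this. The preference names are browser settings rather than anything defined in this thread, they can change between browser versions and should be checked against the browser in use, and the helper names firefox_download_prefs and chrome_download_prefs are hypothetical.

```python
def firefox_download_prefs(download_dir, mime_types):
    """Firefox preferences that save the given MIME types without a dialog."""
    return {
        "browser.download.folderList": 2,  # 2 = use browser.download.dir
        "browser.download.dir": download_dir,  # should be an absolute path
        "browser.download.manager.showWhenStarting": False,
        # Comma-separated MIME types that are saved straight to disk
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def chrome_download_prefs(download_dir):
    """Chrome profile preferences that save downloads without a prompt."""
    return {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
    }


# How these might be applied (needs Selenium and a real browser, so not run here):
#   options = webdriver.FirefoxOptions()
#   options.add_argument("-headless")
#   for name, value in firefox_download_prefs("/tmp/downloads", ["application/pdf"]).items():
#       options.set_preference(name, value)
#   driver = webdriver.Firefox(options=options)
#
#   options = webdriver.ChromeOptions()
#   options.add_experimental_option("prefs", chrome_download_prefs("/tmp/downloads"))
#   driver = webdriver.Chrome(options=options)
```

Keeping the preferences in plain dictionaries makes them easy to inspect and reuse, whether the browser runs under Xvfb or on a normal display.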

Here are some more links that might help:

  • Selenium Headless Automated Testing in Ubuntu
  • XWindows for Headless Selenium (with further links inside)
  • How to run browsers(chrome, IE and firefox) in headless mode?
  • Tutorial: How to use Headless Firefox for Scraping in Linux



Answer 3:


My use case required a form submission to retrieve the file. I was able to accomplish this using the driver's execute_async_script() function.

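# `driver` is the PhantomJS WebDriver; `target` is the id of the file that was clicked
js = '''
    var target = arguments[0];                       // passed in from Python
    var callback = arguments[arguments.length - 1];  // Selenium's async callback
    var theForm = document.forms['theFormId'];
    var data = new FormData();
    data.append('eventTarget', target); // this is the id of the file clicked
    data.append('otherFormField', theForm.otherFormField.value);

    var xhr = new XMLHttpRequest();
    xhr.open('POST', theForm.action, true);
    // A same-origin XHR sends the page's cookies automatically, so they do not
    // need to be copied from driver.get_cookies(); cookies are not ordinary
    // request headers, and the Cookie header cannot be set from script anyway.
    xhr.onload = function () {
        callback(this.responseText);  // reliable for text files (CSV, XML, ...)
    };
    xhr.send(data);
'''

driver.set_script_timeout(30)  # seconds to wait for callback() to be called
file_content = driver.execute_async_script(js, target)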
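responseText is only dependable for text responses; binary files (ZIP, PDF, images) can come back corrupted. A hypothetical binary-safe variant of the script above asks for an ArrayBuffer and returns it to Python as base64. This sketch assumes the browser supports `xhr.responseType = 'arraybuffer'` and `btoa`, which should be checked for the PhantomJS version in use; 'theFormId' and the field names are the same placeholders as in the answer, and BINARY_XHR_JS, decode_payload and save_payload are illustrative names.

```python
import base64

# Hypothetical binary-safe variant: request an ArrayBuffer and hand it back
# to Python as base64, since responseText can corrupt non-text bytes.
BINARY_XHR_JS = '''
    var target = arguments[0];
    var callback = arguments[arguments.length - 1];
    var theForm = document.forms['theFormId'];
    var data = new FormData();
    data.append('eventTarget', target);
    data.append('otherFormField', theForm.otherFormField.value);

    var xhr = new XMLHttpRequest();
    xhr.open('POST', theForm.action, true);
    xhr.responseType = 'arraybuffer';
    xhr.onload = function () {
        // Turn the bytes into a binary string, then base64-encode it
        var bytes = new Uint8Array(this.response);
        var binary = '';
        for (var i = 0; i < bytes.length; i++) {
            binary += String.fromCharCode(bytes[i]);
        }
        callback(btoa(binary));
    };
    xhr.send(data);
'''


def decode_payload(payload):
    """Turn the base64 string returned by the script back into raw bytes."""
    return base64.b64decode(payload)


def save_payload(payload, path):
    """Decode the payload and write the raw bytes to path."""
    with open(path, "wb") as out:
        out.write(decode_payload(payload))
```

Usage would be `save_payload(driver.execute_async_script(BINARY_XHR_JS, target), 'downloaded_file')`. Building the binary string byte by byte is slow for very large files; it is kept simple here for readability.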



Answer 4:


It isn't possible that way. You can use other tools to download files instead, such as wget or curl.

Use Firefox to find the right request, Selenium to get the values that request needs, and finally a tool outside the browser to download the file:

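import subprocess

# The full request (headers, cookies) is copied from the browser; the -H part is elided here
curlCall = " curl 'http://www_sitex_org/descarga.jsf' -H '...allCurlRequest....' > file.xml"
subprocess.call(curlCall, shell=True)  # shell=True is needed for the "> file.xml" redirect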
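Instead of one shell string, the curl command can be built as an argument list, which avoids shell quoting problems with cookie values and lets Selenium's cookies be passed straight to curl. A minimal sketch, assuming cookie dicts shaped like driver.get_cookies() output; build_curl_command and download_with_curl are hypothetical helpers, while --fail, --location, --output, --cookie and --header are standard curl options.

```python
import subprocess


def build_curl_command(url, selenium_cookies, output_path, headers=()):
    """Build a curl argv list (no shell) that reuses the browser's cookies."""
    cookie_str = "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)
    # --fail: non-zero exit on HTTP errors; --location: follow redirects
    cmd = ["curl", "--fail", "--location", "--output", output_path]
    if cookie_str:
        cmd += ["--cookie", cookie_str]  # "name=value; ..." is sent as cookie data
    for header in headers:
        cmd += ["--header", header]
    cmd.append(url)
    return cmd


def download_with_curl(url, selenium_cookies, output_path, headers=()):
    """Run curl; raises CalledProcessError if curl exits non-zero."""
    cmd = build_curl_command(url, selenium_cookies, output_path, headers)
    subprocess.run(cmd, check=True)
```

For example, `download_with_curl(url, driver.get_cookies(), 'file.xml')` would replace the shell redirect with curl's own --output option.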


Source: https://stackoverflow.com/questions/25755713/using-selenium-with-python-and-phantomjs-to-download-file-to-filesystem
