Downloading a PDF using Selenium, Chrome and Python

Question

I tried following previous posts on this topic such as these (post 1, post 2), but I'm still stuck.

My script has to log into a site with a set of credentials, then navigate through some drop-down menus to select a report. Once the report is selected, a new window pops up where parameters must be adjusted to generate the report. Once the parameters are set, the same pop-up window refreshes and displays the generated report as a PDF in Chrome's built-in PDF viewer. I was under the impression that passing certain options to the webdriver would disable this PDF viewer and simply download the file, but the PDF viewer is still displayed and nothing is downloaded automatically. Surely I'm missing something or wrote something incorrectly. Here's the gist of my code:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_experimental_option('prefs',  {
    "download.default_directory": download_dir,
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "plugins.plugins_disabled": ["Chrome PDF Viewer"]
    }
)

# Create a single driver and pass it the options defined above
driver = webdriver.Chrome(options=chrome_options)
driver.get(url)

# In between are a bunch of steps that navigate through the drop-down menus

# This step may not be necessary, but I included it to handle the pop-up window refreshing and displaying the report as a PDF in Chrome's PDF viewer
driver.switch_to.window(driver.window_handles[1])

So, at this point, Chrome still displays the PDF viewer even though I disabled it earlier. Nothing is downloaded, so I'm wondering if I need to provide another line of code or perhaps something else.

Using Selenium version 3.141.0, Python 3.6.4, Chrome webdriver 2.45 on Windows 10.

Answer 1:

You need to replace "plugins.plugins_disabled": ["Chrome PDF Viewer"]

With:

"plugins.always_open_pdf_externally": True

Hope this helps you!
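
For reference, here is a minimal sketch of how the full set of preferences might look with that replacement applied. The helper names build_chrome_prefs and make_driver are hypothetical and not part of Selenium. Converting the download directory to an absolute path is an assumption on my part, made because it is the safer choice for Chrome.

```python
import os


def build_chrome_prefs(download_dir):
    """Return Chrome prefs that save PDFs to disk instead of opening the viewer."""
    return {
        # An absolute path is the safer choice here (assumption, see lead-in).
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        # Used in place of the "plugins.plugins_disabled" entry.
        "plugins.always_open_pdf_externally": True,
    }


def make_driver(download_dir):
    """Create a Chrome driver configured with the prefs above (hypothetical helper)."""
    # Imported here so build_chrome_prefs() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    chrome_options = Options()
    chrome_options.add_experimental_option("prefs", build_chrome_prefs(download_dir))
    # The options only take effect on the driver instance they are passed to.
    return webdriver.Chrome(options=chrome_options)
```

Calling make_driver(download_dir) and then using the returned driver for every later step keeps the preferences attached to the browser window that ends up loading the PDF.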

Answer 2:

I had a similar problem, which I solved with the Firefox driver in Java. Here is my code:

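// Assumes org.openqa.selenium.firefox.FirefoxProfile is imported
FirefoxProfile ffprofile = new FirefoxProfile();
// Save PDFs to disk without asking
ffprofile.setPreference("browser.helperApps.neverAsk.saveToDisk", "application/pdf");
// 2 = use the directory given in browser.download.dir
ffprofile.setPreference("browser.download.folderList", 2);
ffprofile.setPreference("browser.download.manager.showWhenStarting", false);
ffprofile.setPreference("browser.download.dir", "path/to/directory");
ffprofile.setPreference("plugin.scan.plid.all", false);
ffprofile.setPreference("plugin.scan.Acrobat", "99.0");
// Disable Firefox's built-in PDF viewer
ffprofile.setPreference("pdfjs.disabled", true);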

If switching to Firefox is an option for you, translating this from Java to Python should be simple.
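
A rough Python translation of the same preferences could look like the sketch below. It sets them through Selenium's Firefox Options.set_preference rather than a profile object, which is my substitution and not what the Java code does. The helper names build_firefox_prefs and make_firefox_driver are hypothetical, and making the download path absolute is an assumption.

```python
import os


def build_firefox_prefs(download_dir):
    """Return the Firefox prefs from the Java snippet as a Python dict."""
    return {
        # Save PDFs to disk without asking.
        "browser.helperApps.neverAsk.saveToDisk": "application/pdf",
        # 2 = use the directory given in browser.download.dir.
        "browser.download.folderList": 2,
        "browser.download.manager.showWhenStarting": False,
        # Absolute path is an assumption, see lead-in.
        "browser.download.dir": os.path.abspath(download_dir),
        "plugin.scan.plid.all": False,
        "plugin.scan.Acrobat": "99.0",
        # Disable Firefox's built-in PDF viewer.
        "pdfjs.disabled": True,
    }


def make_firefox_driver(download_dir):
    """Create a Firefox driver with the prefs above (hypothetical helper)."""
    # Imported here so build_firefox_prefs() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in build_firefox_prefs(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```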

Source: https://stackoverflow.com/questions/53998690/downloading-a-pdf-using-selenium-chrome-and-python

Tags

python

selenium

selenium-chromedriver