Question
I am trying to automatically save, as a PDF, an HTML page created with pdf2htmlEX (https://github.com/coolwanglu/pdf2htmlEX), using the Selenium (Chrome) webdriver.
It almost works, except that figure captions, and sometimes even parts of the figures, are missing.
Manually saved:
Automatically saved using selenium & chrome webdriver:
Here is my code (you need the chromium webdriver (http://chromedriver.chromium.org/downloads) in the same folder as this script):
import json
from selenium import webdriver
# print settings: save as pdf, 'letter' formatting
appState = """{
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "mediaSize": {
        "height_microns": 279400,
        "name": "NA_LETTER",
        "width_microns": 215900,
        "custom_display_name": "Letter"
    },
    "selectedDestinationId": "Save as PDF",
    "version": 2
}"""
appState = json.loads(appState)
profile = {"printing.print_preview_sticky_settings.appState": json.dumps(appState)}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
# Enable automatically pressing the print button in print preview
# https://peter.sh/experiments/chromium-command-line-switches/
chrome_options.add_argument('--kiosk-printing')
driver = webdriver.Chrome('./chromedriver', options=chrome_options)
driver.get('http://www.deeplearningbook.org/contents/intro.html')
driver.execute_script('window.print();')
driver.quit()
This sometimes happens when I print manually, too. But if I then change any of the printing options, the preview reloads and the figure captions reappear; they then stay there no matter which options I enable or disable afterwards.
What I tried so far:
- different Chrome webdriver versions (71, 72, 73) from this site: http://chromedriver.chromium.org/downloads
- enable background graphics by adding '"isCssBackgroundEnabled": true' to the appState
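A minimal sketch of what the second attempt amounts to, shown on a trimmed-down appState. The helper name with_background_graphics is made up for illustration; only the "isCssBackgroundEnabled" key and the preference name come from the code above.

```python
import json


def with_background_graphics(app_state):
    """Return a copy of app_state with CSS background printing switched on."""
    updated = dict(app_state)
    updated["isCssBackgroundEnabled"] = True
    return updated


# Trimmed-down appState; the full one above also sets mediaSize etc.
app_state = {"selectedDestinationId": "Save as PDF", "version": 2}

profile = {
    "printing.print_preview_sticky_settings.appState": json.dumps(
        with_background_graphics(app_state)
    )
}
```

This did not bring the captions back.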
Answer 1:
So, through fiddling around, I stumbled on the solution by accident. I don't really understand why, but enabling the 'PrintBrowser mode' ("Enables PrintBrowser mode, in which everything renders as though printed.") solves the issue. This may or may not have to do with CSS loading properly.
I just need to add chrome_options.add_argument('--enable-print-browser') and all elements are there!
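Put together with the script from the question, the fix might look roughly like the sketch below. The function names build_print_config and save_page_as_pdf are made up for illustration, and the sketch assumes that chromedriver can be located without passing an explicit path (for example, because it is on PATH). The two switches and the preference key are the ones already used above.

```python
import json


def build_print_config():
    """Return (prefs, args) for silent 'Save as PDF' printing in PrintBrowser mode."""
    app_state = {
        "recentDestinations": [{"id": "Save as PDF", "origin": "local"}],
        "mediaSize": {
            "height_microns": 279400,
            "name": "NA_LETTER",
            "width_microns": 215900,
            "custom_display_name": "Letter",
        },
        "selectedDestinationId": "Save as PDF",
        "version": 2,
    }
    prefs = {
        "printing.print_preview_sticky_settings.appState": json.dumps(app_state)
    }
    args = [
        "--kiosk-printing",        # press the print button in the preview automatically
        "--enable-print-browser",  # render everything as though printed
    ]
    return prefs, args


def save_page_as_pdf(url):
    """Open url in Chrome and trigger window.print() with the config above."""
    # Imported here so build_print_config() can be used without selenium installed.
    from selenium import webdriver

    prefs, args = build_print_config()
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_experimental_option("prefs", prefs)
    for arg in args:
        chrome_options.add_argument(arg)

    # Assumption: chromedriver is discoverable without an explicit path.
    driver = webdriver.Chrome(options=chrome_options)
    try:
        driver.get(url)
        driver.execute_script("window.print();")
    finally:
        driver.quit()
```

Calling save_page_as_pdf('http://www.deeplearningbook.org/contents/intro.html') would then run the same steps as the original script, with the extra switch applied.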
Source: https://stackoverflow.com/questions/54943980/missing-elements-when-using-selenium-chrome-driver-to-automatically-save-as-pdf