Python Download PDF Embedded in a Page

轮回少年 2021-01-01 07:43

I have this link:

  • http://www.equibase.com/premium/chartEmb.cfm?track=ALB&raceDate=06/17/2002&cy=USA&rn=1

I want to download the emb

1 answer
  • 2021-01-01 07:56

    Using Selenium with a specific Chrome profile, you can download embedded PDFs with the following code:

    Code:

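    from time import sleep

    from selenium import webdriver


    def download_pdf(lnk):
        options = webdriver.ChromeOptions()

        download_folder = "C:\\"

        # Disable the built-in PDF viewer and save files to download_folder.
        profile = {"plugins.plugins_list": [{"enabled": False,
                                             "name": "Chrome PDF Viewer"}],
                   "download.default_directory": download_folder,
                   "download.extensions_to_open": ""}

        options.add_experimental_option("prefs", profile)

        print("Downloading file from link: {}".format(lnk))

        # chrome_options is deprecated in newer Selenium releases;
        # options is the current keyword.
        driver = webdriver.Chrome(options=options)
        driver.get(lnk)

        # Script name without ".cfm", e.g. "eqbPDFChartPlus".
        filename = lnk.split("/")[4].split(".cfm")[0]
        print("File: {}".format(filename))

        # Crude fixed wait so the download can finish before the browser closes.
        sleep(5)

        print("Status: Download Complete.")
        print("Folder: {}".format(download_folder))

        # quit() ends the whole browser session, not just the current window.
        driver.quit()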
    

    And when I call this function:

    download_pdf("http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=1&BorP=P&TID=ALB&CTRY=USA&DT=06/17/2002&DAY=D&STYLE=EQB")
    

    That's the output:

    >>> Downloading file from link: http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=1&BorP=P&TID=ALB&CTRY=USA&DT=06/17/2002&DAY=D&STYLE=EQB
    >>> File: eqbPDFChartPlus
    >>> Status: Download Complete.
    >>> Folder: C:\
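
The `File:` line above comes from splitting the link on "/" and ".cfm", which depends on the exact shape of the URL. As a minimal sketch of a less position-dependent approach, the standard library's `urllib.parse` can separate the path from the query string. The helper name `describe_link` is hypothetical and not part of the original answer.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlsplit


def describe_link(lnk):
    """Return (script name without extension, query parameters) for a URL."""
    parts = urlsplit(lnk)
    # The path is "/premium/eqbPDFChartPlus.cfm"; stem drops the ".cfm" suffix.
    name = PurePosixPath(parts.path).stem
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return name, params


name, params = describe_link(
    "http://www.equibase.com/premium/eqbPDFChartPlus.cfm"
    "?RACE=1&BorP=P&TID=ALB&CTRY=USA&DT=06/17/2002&DAY=D&STYLE=EQB"
)
print(name)          # eqbPDFChartPlus
print(params["DT"])  # 06/17/2002
```

Because the date in `DT` contains slashes, parsing the query separately also avoids the date being cut apart by a plain `split("/")`.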
    


    Take a look at the specific profile:

    profile = {"plugins.plugins_list": [{"enabled": False,
                                         "name": "Chrome PDF Viewer"}],
               "download.default_directory": download_folder,
               "download.extensions_to_open": ""}
    

    It disables the Chrome PDF Viewer plugin (which embeds the PDF in the web page), sets the default download folder to the value of the download_folder variable, and tells Chrome not to open any file types automatically after downloading.
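
The profile above targets the Chrome version in use when this answer was written. As far as I know, more recent Chrome builds may ignore `plugins.plugins_list`, and the preference `plugins.always_open_pdf_externally` is commonly used instead for the same purpose. The sketch below is an assumption-laden variant, not the original author's code; the helper names `build_pdf_download_prefs` and `make_options` are hypothetical.

```python
def build_pdf_download_prefs(download_folder):
    """Build Chrome preferences that save PDFs instead of displaying them."""
    return {
        # Assumption: newer Chrome builds honor this flag rather than
        # "plugins.plugins_list" to bypass the built-in PDF viewer.
        "plugins.always_open_pdf_externally": True,
        "download.default_directory": download_folder,
        # Do not ask where to save each file.
        "download.prompt_for_download": False,
    }


def make_options(download_folder):
    """Attach the preferences to a ChromeOptions object (needs Selenium)."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option(
        "prefs", build_pdf_download_prefs(download_folder)
    )
    return options
```

The result of `make_options("C:\\")` could then be passed to `webdriver.Chrome(options=...)` in place of the options built in download_pdf.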

    After that, when you open the so-called "internal link" (the eqbPDFChartPlus.cfm URL passed to download_pdf above), the webdriver automatically downloads the .pdf file to download_folder.
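
The download runs in the background, so the browser should stay open until the file has actually arrived. Here is a rough sketch of one way to check, assuming Chrome's usual behavior of writing partial downloads with a `.crdownload` suffix. The function name `wait_for_download` and its parameters are hypothetical.

```python
import time
from pathlib import Path


def wait_for_download(folder, known=(), timeout=30.0, poll=0.5):
    """Return the newest finished .pdf in `folder`, or None on timeout.

    `known` holds file names that were already present before the download
    started, so that older PDFs are not mistaken for the new one.
    """
    folder_path = Path(folder)
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: Chrome marks unfinished downloads with ".crdownload".
        in_progress = list(folder_path.glob("*.crdownload"))
        finished = [p for p in folder_path.glob("*.pdf") if p.name not in known]
        if finished and not in_progress:
            return max(finished, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

A caller might record the existing file names before `driver.get(lnk)`, then call `wait_for_download(download_folder, known=...)` and only close the browser once it returns a path.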
