Selenium (Python) - waiting for a download process to complete using Chrome web driver


I'm using Selenium and Python with ChromeDriver (Windows) to automate downloading a large number of files from different pages. My code works, but the so

9 Answers
  • 2020-12-04 18:53
    import os
    import time
    import unittest
    from selenium import webdriver
    from selenium.webdriver.support.wait import WebDriverWait
    
    class MySeleniumTests(unittest.TestCase):
    
        selenium = None
    
        @classmethod
        def setUpClass(cls):
            cls.selenium = webdriver.Firefox(...)
    
        ...
    
        def test_download(self):
            os.chdir(self.download_path) # default download directory
    
            # click the button
            self.selenium.get(...)
            # find_element_by_xpath was removed in Selenium 4; "xpath" is the value of By.XPATH
            self.selenium.find_element("xpath", ...).click()
    
            # wait for the server to finish its internal work, i.e. for the first file to appear
            def download_begin(driver):
                if len(os.listdir()) == 0:
                    time.sleep(0.5)
                    return False
                else:
                    return True
            WebDriverWait(self.selenium, 120).until(download_begin) # the max waiting time is 120s
    
            # wait for the server to finish sending:
            # keep waiting while the total size of the directory is still changing
            def download_complete(driver):
                sum_before=-1
                sum_after=sum([os.stat(file).st_size for file in os.listdir()])
                while sum_before != sum_after:
                    time.sleep(0.2)
                    sum_before = sum_after
                    sum_after = sum([os.stat(file).st_size for file in os.listdir()])
                return True
            WebDriverWait(self.selenium, 120).until(download_complete)  # the max waiting time is 120s
    

    You must wait for two things:

    1. Wait for the server to finish its internal work (for example, a database query).
    2. Wait for the server to finish sending the files.

    (My English is not very good.)
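
    The two waits can also be sketched without a browser, by polling the download directory directly. This is only a minimal sketch, not the answer's exact code: the names `dir_size` and `wait_for_download`, their parameters, and the "size unchanged for `settle` seconds" heuristic are assumptions chosen for illustration.

```python
import os
import time


def dir_size(path):
    """Total size in bytes of the regular files directly inside path."""
    total = 0
    for entry in os.scandir(path):
        try:
            if entry.is_file():
                total += entry.stat().st_size
        except OSError:
            # The file was renamed or removed while we were looking at it.
            continue
    return total


def wait_for_download(path, timeout=120.0, poll=0.2, settle=1.0):
    """Return True once path holds files whose total size stopped changing.

    Hypothetical helper: step 1 waits for any file to appear, step 2 waits
    until the directory size has been stable for `settle` seconds.
    Returns False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    stable_since = None
    while time.monotonic() < deadline:
        if os.listdir(path):  # step 1: the download has begun
            size = dir_size(path)
            if size != last_size:
                # Size changed: restart the stability clock.
                last_size = size
                stable_since = time.monotonic()
            elif time.monotonic() - stable_since >= settle:
                return True  # step 2: size has stopped changing
        time.sleep(poll)
    return False
```

    A stable size is only a heuristic: a stalled connection looks the same as a finished download, so a larger `settle` value trades speed for safety.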

  • 2020-12-04 18:55

    You can get the status of each download by visiting chrome://downloads/ with the driver.

    To wait for all the downloads to finish and to list all the paths:

    def every_downloads_chrome(driver):
        if not driver.current_url.startswith("chrome://downloads"):
            driver.get("chrome://downloads/")
        return driver.execute_script("""
            var items = document.querySelector('downloads-manager')
                .shadowRoot.getElementById('downloadsList').items;
            if (items.every(e => e.state === "COMPLETE"))
                return items.map(e => e.fileUrl || e.file_url);
            """)
    
    
    # waits for all the files to be completed and returns the paths
    paths = WebDriverWait(driver, 120, 1).until(every_downloads_chrome)
    print(paths)
    

    Updated to support changes up to Chrome version 81.
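
    The script returns `fileUrl` values, which I would expect to be `file://` URLs rather than plain paths. If local paths are needed, the standard library can convert them. A small sketch; the helper name `file_url_to_path` is made up for this example.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


def file_url_to_path(url):
    """Convert a file:// URL to a local path; return other strings unchanged."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        return url
    # url2pathname decodes percent-escapes and applies platform path rules.
    return url2pathname(parsed.path)
```

    With the `paths` list from above, this could be applied as `[file_url_to_path(p) for p in paths]`.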

  • 2020-12-04 18:57

    To get results for more than one download, I had to change @thdox's answer to the code below:

    def every_downloads_chrome(driver):
        if not driver.current_url.startswith("chrome://downloads"):
            driver.get("chrome://downloads/")
        return driver.execute_script("""
            var elements = document.querySelector('downloads-manager')
                .shadowRoot.querySelector('#downloadsList')
                .items;
            if (elements.every(e => e.state === 'COMPLETE'))
                return elements.map(e => e.filePath || e.file_path || e.fileUrl || e.file_url);
            """)
    
  • 2020-12-04 19:03

    With Chrome 80, I had to change @florent-b's answer to the code below:

    def every_downloads_chrome(driver):
        if not driver.current_url.startswith("chrome://downloads"):
            driver.get("chrome://downloads/")
        return driver.execute_script("""
            return document.querySelector('downloads-manager')
            .shadowRoot.querySelector('#downloadsList')
            .items.filter(e => e.state === 'COMPLETE')
            .map(e => e.filePath || e.file_path || e.fileUrl || e.file_url);
            """)
    

    I believe this is backward compatible, i.e. it should also work with older versions of Chrome.

  • 2020-12-04 19:07

    I had the same problem and found a solution. You can check whether or not a .crdownload file is in your download folder. If there are no files with the .crdownload extension in the download folder, then all your downloads have completed. This only works for Chrome and Chromium, I think.

    import os
    import time

    def downloads_done():
        for filename in os.listdir("/downloads"):
            if ".crdownload" in filename:
                # A partial download still exists: wait, then check again.
                time.sleep(0.5)
                return downloads_done()
    

    Whenever you call downloads_done(), it keeps calling itself until all downloads have completed. If you are downloading massive files, like 80 gigabytes, I don't recommend this, because the function can reach the maximum recursion depth.

    2020 edit:

    def wait_for_downloads():
        print("Waiting for downloads", end="")
        while any([filename.endswith(".crdownload") for filename in 
                   os.listdir("/downloads")]):
            time.sleep(2)
            print(".", end="")
        print("done!")
    

    The "end" keyword argument of print() defaults to a newline, but we override it. While any filename in the /downloads folder ends with .crdownload, the loop sleeps for 2 seconds and prints one dot to the console without a newline.
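
    The loop above never gives up if a download stalls. A hedged sketch of the same check with a deadline; the function name `wait_for_crdownloads`, the `download_dir`, `timeout` and `poll` parameters, and the boolean return value are assumptions added for illustration, not part of the original answer.

```python
import os
import time


def wait_for_crdownloads(download_dir, timeout=300.0, poll=2.0):
    """Return True when no *.crdownload file remains, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        pending = [name for name in os.listdir(download_dir)
                   if name.endswith(".crdownload")]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

    Note that this returns True immediately if Chrome has not created the .crdownload file yet, so it should be called only after the download has visibly started.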

    I don't really recommend using Selenium anymore now that I know about requests, but if the site is heavily guarded by Cloudflare, captchas, etc., you might have to resort to Selenium.

  • 2020-12-04 19:08

    There are issues with opening chrome://downloads/ when running Chrome in headless mode.

    The following function uses a composite approach that works whether the mode is headless or not, choosing the better approach available in each mode.

    It assumes that the caller clears all files downloaded at file_download_path after each call to this function.

    import os
    import time
    import logging
    from selenium.webdriver.support.ui import WebDriverWait
    
    def wait_for_downloads(driver, file_download_path, headless=False, num_files=1):
        max_delay = 60
        interval_delay = 0.5
        if headless:
            total_delay = 0
            done = False
            while not done and total_delay < max_delay:
                files = os.listdir(file_download_path)
                # Remove system files if present: Mac adds the .DS_Store file
                if '.DS_Store' in files:
                    files.remove('.DS_Store')
                if len(files) == num_files and not [f for f in files if f.endswith('.crdownload')]:
                    done = True
                else:
                    total_delay += interval_delay
                    time.sleep(interval_delay)
            if not done:
                logging.error("File(s) couldn't be downloaded")
        else:
            def all_downloads_completed(driver, num_files):
                return driver.execute_script("""
                    var items = document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList').items;
                    var i;
                    var done = false;
                    var count = 0;
                    for (i = 0; i < items.length; i++) {
                        if (items[i].state === 'COMPLETE') {count++;}
                    }
                    if (count === %d) {done = true;}
                    return done;
                    """ % (num_files))
    
            driver.execute_script("window.open();")
            driver.switch_to.window(driver.window_handles[1])
            driver.get('chrome://downloads/')
            # Wait for downloads to complete
            WebDriverWait(driver, max_delay, interval_delay).until(lambda d: all_downloads_completed(d, num_files))
            # Clear all downloads from chrome://downloads/
            driver.execute_script("""
                document.querySelector('downloads-manager').shadowRoot
                .querySelector('#toolbar').shadowRoot
                .querySelector('#moreActionsMenu')
                .querySelector('button.clear-all').click()
                """)
            driver.close()
            driver.switch_to.window(driver.window_handles[0])
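
    Since the function assumes an empty download directory before each call, the caller needs some cleanup step. One possible sketch; `clear_download_dir` is a hypothetical helper, and it only removes regular files directly inside the directory.

```python
import os


def clear_download_dir(file_download_path):
    """Delete the regular files directly inside file_download_path.

    Returns the number of files removed. Subdirectories are left alone.
    """
    # Snapshot the entries first so we do not delete while iterating.
    with os.scandir(file_download_path) as it:
        entries = list(it)
    removed = 0
    for entry in entries:
        if entry.is_file():
            os.remove(entry.path)
            removed += 1
    return removed
```

    Calling it right after each `wait_for_downloads(...)` call, once the files have been processed or moved, keeps the `num_files` count meaningful for the next download.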
    