How to handle lazy-loaded images in Selenium?

爱⌒轻易说出口 submitted on 2021-02-05 10:46:11

Question


Before marking this as a duplicate, please consider that I have already looked through many related Stack Overflow posts, as well as websites and articles. I have not found a solution yet.

This question is a follow-up to an earlier question, "Selenium Webdriver not finding XPATH despite seemingly identical strings". By rewriting the code in a more elegant way, I determined that the problem did not in fact come from the XPath lookup:

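    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    for item in feed:
        img_div = item.find_element(By.CLASS_NAME, 'listing-cover-photo')
        img = WebDriverWait(img_div, 10).until(
                EC.visibility_of_element_located((By.TAG_NAME, 'img')))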

This works for roughly the first five elements, but after that it times out. By printing the inner HTML of img_div, I found that for the elements that time out there is a div with class "lazyload-placeholder" in place of the image I want. That led me to search for how to scrape lazy-loaded elements, but I could not find an answer. As you can see, I am using a WebDriverWait to give the image time to load; I also tried a site-wide wait call and a time.sleep call. Waiting does not seem to fix it. I am looking for the easiest way to handle these lazy-loaded images, preferably in Selenium, but if there are other libraries or products I can use in tandem with my existing Selenium code, that would be great. Any help is appreciated.


Answer 1:


Your images will only load when they're scrolled into view. This is such a common requirement that the Selenium Python docs cover it in their FAQ. Adapted from this answer, the script below scrolls down the page before scraping the images.

    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: any WebDriver instance works here
    driver.get("https://www.grailed.com/categories/footwear")

    SCROLL_PAUSE_TIME = 0.5
    i = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        # Scroll to the bottom so the page loads more listings
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(SCROLL_PAUSE_TIME)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the page stopped growing
        last_height = new_height
        i += 1
        if i == 5:
            break  # stop after 5 scrolls

    driver.implicitly_wait(10)
    shoe_images = driver.find_elements(By.CSS_SELECTOR, 'div.listing-cover-photo img')

    print(len(shoe_images))

So as not to scroll through shoes (seemingly) forever, I added a break after 5 iterations. You're free to remove the i variable, in which case the script will scroll down for as long as it can.
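The scroll loop can also be wrapped in a function so that the cap becomes an argument. This is only a sketch: the name scroll_to_bottom and the pause and max_scrolls parameters are hypothetical, and driver is assumed to be any object with Selenium's execute_script method.

```python
import time


def scroll_to_bottom(driver, pause=0.5, max_scrolls=None):
    """Scroll down until the page height stops growing.

    max_scrolls=None means no cap; an integer stops after that many
    scrolls that made the page taller (like the i == 5 check).
    Returns the number of scrolls that made the page taller.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while max_scrolls is None or scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to append new listings
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so this is the bottom
        last_height = new_height
        scrolls += 1
    return scrolls
```

Calling scroll_to_bottom(driver, max_scrolls=5) should behave like the loop in the script, while scroll_to_bottom(driver) keeps going until the height stops changing.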

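The implicit wait is there to allow catch-up for any images that are still loading in. Note that it only makes find_elements wait up to 10 seconds for at least one match to appear, so it does not guarantee that every image has finished loading.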
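An implicit wait only affects how long element lookups block, so one possible alternative is to poll until the number of matched images stops changing. The helper below is a hedged sketch of that idea, not part of Selenium: wait_for_stable_count and its timeout, settle and poll parameters are hypothetical names, and count_fn is assumed to be any zero-argument callable returning the current count.

```python
import time


def wait_for_stable_count(count_fn, timeout=10.0, settle=1.0, poll=0.25):
    """Poll count_fn() until its value stays unchanged for `settle` seconds.

    Returns the last observed count, whether or not it stabilised
    before `timeout` seconds elapsed.
    """
    deadline = time.monotonic() + timeout
    last = count_fn()
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        if time.monotonic() - stable_since >= settle:
            break  # value has been unchanged for long enough
        time.sleep(poll)
        current = count_fn()
        if current != last:
            last = current
            stable_since = time.monotonic()  # restart the settle timer
    return last
```

With the script above, count_fn could be lambda: len(driver.find_elements(By.CSS_SELECTOR, 'div.listing-cover-photo img')). The settle window is a trade-off: a longer one is more likely to catch slow images but adds that much delay to every run.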

A test run yielded 82 images. I confirmed that it had scraped every image on the page using the selector search in Chrome's DevTools, which highlighted 82 matches. You'll see a different number depending on how many images you allow to load.
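If the per-item loop from the question is preferred over scrolling the whole page, each listing could instead be scrolled into view before looking for its image. The sketch below assumes that scrolling an element into view is enough to trigger the site's lazy loader, which has not been verified here; load_lazy_image is a hypothetical helper, and driver and item are assumed to be a Selenium WebDriver and WebElement.

```python
import time


def load_lazy_image(driver, item, timeout=10.0, poll=0.25):
    """Scroll one listing into view and return its <img>, or None on timeout."""
    # Bring the listing into the viewport so its image is requested.
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", item)
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        images = item.find_elements("css selector", "img")
        if images:
            return images[0]
        if time.monotonic() >= deadline:
            return None  # still a placeholder after the timeout
        time.sleep(poll)
```

Inside the original loop this would be used as img = load_lazy_image(driver, item), with a None result meaning the placeholder was never replaced.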



Source: https://stackoverflow.com/questions/62600288/how-to-handle-lazy-loaded-images-in-selenium
