Selenium Webdriver not finding XPATH despite seemingly identical strings

Submitted by 隐身守侯 on 2020-06-29 03:38:49

Question


This question is related to my previous two: Inducing WebDriverWait for specific elements and Inconsistency in scraping through <div>'s in Selenium.

I am scraping all of the Air Jordan sneakers from https://www.grailed.com/. The feed is an infinitely scrolling list of sneakers, and I am using Selenium WebDriver to scrape the data. My problem is that the images for the shoes seem to take a while to load, which causes a lot of errors. I have found a pattern in the XPaths of the images. The XPath of the first image is /html/body/div[3]/div[6]/div[3]/div[3]/div[2]/div[2]/div[1]/a/div[2]/img, the second is /html/body/div[3]/div[6]/div[3]/div[3]/div[2]/div[2]/div[2]/a/div[2]/img, and so on. The sequence is linear: the index of the second-to-last div increases by one each time. To handle this, I put the following in my loop (only the relevant code is included).

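    # Imports needed by the excerpt below
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # driver, sneakers, sneaker_count and last_height are defined elsewhere
    i = 1
    while len(sneakers) < sneaker_count:
        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Get sneakers currently on page and add to sneakers list
        feed = driver.find_elements(By.CLASS_NAME, 'feed-item')
        for item in feed:
            # Only the second-to-last div index changes from one image to the next
            xpath = "/html/body/div[3]/div[6]/div[3]/div[3]/div[2]/div[2]/div[" + str(i) + "]/a/div[2]/img"
            img = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, xpath)))
            i += 1
        # Stop once scrolling no longer increases the page height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height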

The issue is that after about the fifth pair of shoes, the wait times out; it seems that the XPath passed in after that pair is not recognized. I used the copy XPath feature in Firefox Developer to check, and the result appears identical to the XPath I pass in when I print it. I use ChromeDriver with Selenium, but I don't think that's relevant. Does anyone know why the XPaths stop being recognized even though they seem identical?

UPDATE: Using an XPath checker add-on for Chrome, the XPaths for items 1-4 are detected, but detection often stops after item 6. When I inspect the XPath (in both Chrome and Firefox Developer mode), it still looks identical, but the "CSS and XPath checker" add-on does not find the element. This is a huge mystery to me.


Answer 1:


I found the problem. The XPath was fine, but after the first 4-5 elements, the images are lazy-loaded. It's not that they take too long to load; it's that the HTML initially contains only placeholders for them. A different approach is therefore needed to scrape these images.
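The answer does not say what that different approach looks like. One common way to deal with lazy-loaded images is to scroll each image into view and then poll until its src attribute holds a real URL. The following is only a minimal sketch under stated assumptions: wait_for_real_src is a hypothetical helper name, and it assumes the placeholder is an img element whose src is empty or a data: URI, which has not been verified against Grailed's markup. It relies only on the driver's execute_script and the element's get_attribute.

```python
import time


def wait_for_real_src(driver, img, timeout=20.0, poll=0.25,
                      placeholder_prefixes=("data:",)):
    """Scroll img into view and poll until its src looks like a real URL.

    Hypothetical helper. Assumes a placeholder has an empty src or one
    starting with one of placeholder_prefixes (a tuple of strings).
    Returns the src string, or None if the timeout expires first.
    """
    # Bringing the element into the viewport is what usually triggers a lazy loader.
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", img)
    deadline = time.monotonic() + timeout
    while True:
        src = img.get_attribute("src")
        if src and not src.startswith(placeholder_prefixes):
            return src
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

In the loop from the question, this would be called once per feed item on its image element instead of waiting for visibility. If the placeholder turns out not to be an img element at all, the image would have to be located again after scrolling, so the sketch would not apply as written.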



Source: https://stackoverflow.com/questions/62581403/selenium-webdriver-not-finding-xpath-despite-seemingly-identical-strings
