问题
I'm working on scraping all of the Air Jordan Data off of grailed.com (https://www.grailed.com/designers/jordan-brand/hi-top-sneakers). I am storing the size, model, url, and image url in an object. I currently have a program that scrolls through the entire feed and fetches all of this. Everything works except finding the image url. I have tried many things and the issue seems to be that for some elements in the feed Selenium doesn't detect the div or url containing the image. I have gone through and manually checked these cases, and they do indeed have images in the same structure. My current code looks like this:
feed = driver.find_elements_by_class_name('feed-item')
for item in feed:
# Find the div containing the image
img_div = item.find_element_by_class_name("listing-cover-photo ")
img = img_div.find_element_by_tag_name('img')
I have tried a couple other things as well. The issue is that sometimes it says it can't find elements with the "listing-cover-photo", even though I can check the items for which this is the case and I can still find the elements. How should I debug/fix this, or can anyone help?
回答1:
To get the image src value you need to scroll the page first.
Induce WebDriverWait
() and wait for visibility_of_all_elements_located
() and following css selector.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://www.grailed.com/designers/jordan-brand/hi-top-sneakers")
driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
images=WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".feed-item .listing-cover-photo>img")))
for image in images:
print(image.get_attribute("src"))
Output:
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/AYHhwtgRxSkdTtZ2fMoi
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/yPm24xb1QeyNJvmlKriU
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/0PmW3y2SOmvy9iDHr44q
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/0huJrabvQyei6H8xVZWS
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/23Bx5rr8SR2Pv53lO9Hb
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/dsdGACdNRse93DpTN9Sl
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/KQ3z8G9DQFWTjNkO6Obp
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/mF8nkq8LTzi2fTuCfAAS
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/X9tLf5KzSreO1QW2QX4w
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/gNnXP7ToTnl9hjSEiRrz
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/LMFdqBosRI2NLDCkR9Ze
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/htBeZs05SNyflHqpd7pC
来源:https://stackoverflow.com/questions/62540289/inconsistency-in-scraping-through-divs-in-selenium