Selenium not going to next page in scraper

Submitted by 江枫思渺然 on 2021-02-20 03:51:45

Question


I'm writing my first real scraper and although in general it's been going well, I've hit a wall using Selenium. I can't get it to go to the next page.

Below is the head of my code. The rest of it just prints the scraped data to the terminal for now, and that part works fine. The problem is that scraping stops at the end of page 1 and I get my terminal prompt back; it never starts on page 2. I'd be very grateful for any suggestions. I've tried selecting the button at the bottom of the page using both the relative and the full XPath (the full one is shown here), but neither works. The button I'm trying to click is the right-arrow button.

I built in my own error message to show whether the driver found the element by XPath. The message fires when I run my code, so I assume it isn't finding the element, but I can't understand why.

# Importing libraries
import requests
import csv
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Import selenium 
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException
import time

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
driver = webdriver.Chrome("/path/to/driver", options=options)
# Yes, I do have the actual path to my driver in the original code

driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
time.sleep(5)
while True:
    try:
        driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div/form/div[3]/div/div/ul[1]/li[4]/a').click()
    except (TimeoutException, WebDriverException) as e:
        print("A timeout or webdriver exception occurred.")
        break
driver.quit()

Answer 1:


One approach is to set up Selenium expected conditions (visibility_of_element_located, element_to_be_clickable) and use a relative XPath to select the next-page element, all inside a loop whose range is the number of pages you need to go through.
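The expected conditions mentioned above are evaluated by WebDriverWait, which, roughly speaking, keeps calling the condition (every 0.5 seconds by default) until it returns something truthy, and raises TimeoutException once the timeout runs out. The stdlib-only sketch below imitates that polling loop so the behaviour can be tried without a browser; wait_until and the local TimeoutException class are hypothetical stand-ins, not Selenium's own implementation.

```python
import time


class TimeoutException(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.TimeoutException."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or the timeout expires.

    Hypothetical helper that mimics the polling done by WebDriverWait.until().
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # like until(), hand back the truthy value itself
        if time.monotonic() >= deadline:
            raise TimeoutException(f"condition not met within {timeout} s")
        time.sleep(poll)


# Usage: a condition that only becomes true on its third call,
# much like a link that appears once the page has finished loading.
calls = {"n": 0}

def link_is_ready():
    calls["n"] += 1
    return "next-link" if calls["n"] >= 3 else None

print(wait_until(link_is_ready, timeout=2, poll=0.01))  # next-link
```

The point of an explicit wait, compared with a fixed time.sleep(5), is that it returns as soon as the condition holds and fails loudly when it never does.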

XPath for the next-page link:

//div[@class='pagination ctm-pagination']/ul[1]/li[last()-1]/a
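Before wiring the XPath above into Selenium, it can help to check which elements it actually picks. The sketch below runs the same path against a small, made-up pagination snippet using Python's built-in xml.etree.ElementTree, which supports [@class='...'], [1], [last()] and [last()-1] predicates. The markup (link texts, the data-current-page values) is an assumption about the page's structure, not copied from the live site, and ElementTree needs the path to start with .// instead of //.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup, loosely shaped like the page in the question.
# The real site's HTML may differ; inspect it in the browser's dev tools.
SAMPLE = """
<html><body>
  <div class="pagination ctm-pagination">
    <ul>
      <li><a href="#" data-current-page="1">1</a></li>
      <li><a href="#" data-current-page="2">2</a></li>
      <li><a href="#" data-current-page="3">3</a></li>
      <li><a href="#" class="next">&gt;</a></li>
      <li><a href="#" data-current-page="12">Last</a></li>
    </ul>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())
base = ".//div[@class='pagination ctm-pagination']/ul[1]"

next_link = root.find(base + "/li[last()-1]/a")  # second-to-last <li>: the arrow
last_link = root.find(base + "/li[last()]/a")    # last <li>: the "Last" link

print(next_link.text)                      # >
print(last_link.get("data-current-page"))  # 12
```

Because the path is anchored on the pagination div's class rather than on /html/body/div[1]/..., small layout changes elsewhere on the page should not break it, though a change to the pagination markup itself still would.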

The code could look like this:

## imports

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # configure options/driver path as in the question
driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")

## count the number of pages you have
## (read from the data-current-page attribute of the last pagination link)

els = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='pagination ctm-pagination']/ul[1]/li[last()]/a"))).get_attribute("data-current-page")
n_pages = int(els)

## loop: scrape the current page, then click through to the following page

for i in range(n_pages):
    # scrape what you want here
    if i < n_pages - 1:  # no "next" click needed after the last page
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='pagination ctm-pagination']/ul[1]/li[last()-1]/a"))).click()
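The loop in this answer visits each page once and moves forward after scraping it. On the final page there may be nothing left to click, in which case the wait for the next link would time out, so it is worth only clicking between pages. The browser-free sketch below shows that structure; scrape_page and go_next are hypothetical callables standing in for your scraping code and for the WebDriverWait(...).until(EC.element_to_be_clickable(...)).click() call.

```python
def scrape_all_pages(n_pages, scrape_page, go_next):
    """Scrape n_pages pages in order, moving forward between them.

    scrape_page() returns a list of items for the page currently shown;
    go_next() advances to the following page (both hypothetical callables).
    """
    results = []
    for i in range(n_pages):
        results.extend(scrape_page())
        if i < n_pages - 1:  # no "next" click after the final page
            go_next()
    return results


# Browser-free usage: a fake "browser" that is just a page counter.
state = {"page": 1, "clicks": 0}

def fake_scrape():
    return [f"tender on page {state['page']}"]

def fake_next():
    state["page"] += 1
    state["clicks"] += 1

rows = scrape_all_pages(3, fake_scrape, fake_next)
print(rows)            # ['tender on page 1', 'tender on page 2', 'tender on page 3']
print(state["clicks"]) # 2
```

Passing the page-changing step in as a callable keeps the loop logic testable without Chrome; in the real scraper go_next would wrap the explicit wait and click.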



Answer 2:


You were pretty close with your while True and try/except logic. To go to the next page with Selenium and Python, use WebDriverWait with element_to_be_clickable(), for example with the following locator strategy:

  • Code Block:

    driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
    while True:
        try:
            WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(@class, 'state-active')]//following::li[1]/a[@href]"))).click()
            print("Clicked for next page")
            WebDriverWait(driver, 10).until(EC.staleness_of(driver.find_element_by_xpath("//a[contains(@class, 'state-active')]//following::li[1]/a[@href]")))
        except (TimeoutException):
            print("No more pages")
            break
    driver.quit()
    
  • Console Output:

    Clicked for next page
    No more pages
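EC.staleness_of only tells you something useful when it is given an element reference taken before the click. In the code above the link is looked up again after the click, so depending on timing that lookup may already return the new page's link, which never goes stale; the wait then times out and the loop prints "No more pages". That is one plausible reason for the single click in the output shown. The sketch below illustrates the ordering issue with a tiny simulated page instead of a real browser; FakePage, FakeElement and wait_for are hypothetical, and the simulated page swap is instantaneous, which is the worst case for the late lookup. In real Selenium code the same idea would be: link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable(locator)), then link.click(), then WebDriverWait(driver, 10).until(EC.staleness_of(link)), where locator is the (By.XPATH, ...) tuple from the answer.

```python
import time


class FakeElement:
    """Stands in for a WebElement; becomes 'stale' when its page is replaced."""

    def __init__(self, page_no):
        self.page_no = page_no
        self.stale = False


class FakePage:
    """Hypothetical browser: clicking 'next' swaps in a fresh link element."""

    def __init__(self):
        self.page_no = 1
        self.next_link = FakeElement(self.page_no)

    def find_next_link(self):
        return self.next_link

    def click(self, element):
        element.stale = True                        # old DOM node is detached
        self.page_no += 1
        self.next_link = FakeElement(self.page_no)  # new page, new node


def wait_for(condition, timeout=0.2, poll=0.01):
    """Poll condition() until truthy; return False on timeout (hypothetical)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll)
    return False


page = FakePage()

# Wrong order: look the link up again *after* the click.
page.click(page.find_next_link())
late_ref = page.find_next_link()
late_result = wait_for(lambda: late_ref.stale)
print(late_result)   # False -> looks like "no more pages"

# Right order: keep the reference from *before* the click.
early_ref = page.find_next_link()
page.click(early_ref)
early_result = wait_for(lambda: early_ref.stale)
print(early_result)  # True
```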
    


Source: https://stackoverflow.com/questions/63175102/selenium-not-going-to-next-page-in-scraper
