问题
This code scrapes the HTML table from https://www.asx.com.au/asx/statistics/prevBusDayAnns.do and downloads PDF files for specific ASX Codes and Headlines. When the for loop iterates over the ASX Codes found in 'data', it iterates over the first ASX Code five times which creates five duplicate of the same PDF. For example, in the code below there would be five copies of TWD. The amount of times the for loop iterates over the first ASX code is equal to the amount of ASX Codes in 'data'. For example, if there were ten codes, I would end up with ten copies of PDF files for TWD. This only happens to the first ASX Code, everything else is fine. Any reason why this is happening?
Relevant code:
driver.get("https://www.asx.com.au/asx/statistics/prevBusDayAnns.do")
data = ['TWD', 'GEM', 'AT1','TKF','GDF']
asxcodes = []
for d in data:
try:
asxcode = driver.find_element_by_xpath("//table//tr//td[text()='{}']/following-sibling::td[3]/a[contains(.,'{}')]".format(d,"Becoming a substantial holder")).get_attribute("href")
asxcodes.append(asxcode)
except:
pass
Entire code:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
import time
chromeOptions = webdriver.ChromeOptions()
prefs = {"plugins.always_open_pdf_externally": True,"download.default_directory" : r"C:\Users\Harrison Pollock\Desktop\The Smarts\Becoming a Substantial Holder"}
chromeOptions.add_experimental_option("prefs",prefs)
chromedriver = r"C:\Users\Harrison Pollock\Downloads\Python\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(executable_path=r"C:\Users\Harrison Pollock\Downloads\Python\chromedriver_win32\chromedriver.exe",chrome_options=chromeOptions)
driver.get("https://www.asx.com.au/asx/statistics/prevBusDayAnns.do")
data = ['TWD', 'GEM', 'AT1','TKF','GDF'
asxcodes = []
for d in data:
try:
asxcode = driver.find_element_by_xpath("//table//tr//td[text()='{}']/following-sibling::td[3]/a[contains(.,'{}')]".format(d,"Becoming a substantial holder")).get_attribute("href")
asxcodes.append(asxcode)
except:
pass
for asxcode in asxcodes:
driver.get(asxcode)
WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, "//input[@value='Agree and proceed']"))).click()
time.sleep(10)
回答1:
Instead of getting all href value and then iterate could you try something like click on each link and then click for download.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
import time
chromeOptions = webdriver.ChromeOptions()
prefs = {"plugins.always_open_pdf_externally": True,"download.default_directory" : r"C:\Users\Harrison Pollock\Desktop\The Smarts\Becoming a Substantial Holder"}
chromeOptions.add_experimental_option("prefs",prefs)
chromedriver = r"C:\Users\Harrison Pollock\Downloads\Python\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(executable_path=r"C:\Users\Harrison Pollock\Downloads\Python\chromedriver_win32\chromedriver.exe",chrome_options=chromeOptions)
driver.get("https://www.asx.com.au/asx/statistics/prevBusDayAnns.do")
data = ['TWD', 'GEM', 'AT1','TKF','GDF']
asxcodes = []
for d in data:
try:
driver.find_element_by_xpath("//table//tr//td[text()='{}']/following-sibling::td[3]/a[contains(.,'{}')]".format(d,"Becoming a substantial holder")).click()
WebDriverWait(driver,5).until(EC.number_of_windows_to_be(2))
driver.switch_to.window(driver.window_handles[-1])
WebDriverWait(driver,5).until(EC.element_to_be_clickable((By.XPATH, "//input[@value='Agree and proceed']"))).click()
time.sleep(10)
driver.close()
driver.switch_to.window(driver.window_handles[-1])
except:
driver.switch_to.window(driver.window_handles[-1])
continue
Hope this logic helps.
来源:https://stackoverflow.com/questions/61320949/for-loop-keeps-iterating-over-first-piece-of-data-in-list