Question

Good afternoon,

I'm somewhat new to Python and web scraping, so any help would be greatly appreciated! First:

The Code

from selenium import webdriver
import time 

chrome_path = r"/Users/ENTER/Desktop/chromedriver"

driver = webdriver.Chrome(chrome_path)

site_url = 'https://www.home-school.com/groups/'

driver.get(site_url)

# get state links from sidebar and store to list
area = driver.find_element_by_xpath("""/html/body/center/table/tbody/tr/td/table[3]/tbody/tr/td[2]/div""")
items = area.find_elements_by_tag_name('a')

# remove unneeded links
del items[:22]
del items[-1:]

# visit each state link, scrape its groups, and save them to a file
for links in items:
    # print(links.text)
    print(links.get_attribute("href"))
    # add link related logic here
    links.click()
    # you have to wait for the next element to display
    time.sleep(4)
    # assign html container with desired data to variable
    element = driver.find_element_by_xpath("""/html/body/center/table/tbody/tr/td/table[3]/tbody/tr/td[4]/div""")
    # Store container text in variable. We skip the first 5 lines of text as they 
    #  are unnecessary.
    orgdata = element.text.split("\n",5)[5]
    orgdata = orgdata.replace(' Edit Remove More', '').replace(' Edit Remove', '')
    # Write data to text file
    filepath = '/Users/ENTER/Documents/STEMBoard/Tiger Team/Lingo/' + links.text + '.txt'
    file_object = open(filepath, 'a')
    file_object.write(orgdata)

The Problem

I am using Selenium in an attempt to save the names and information of homeschool groups from http://home-school.com/groups/ to individual text files per state.

To do this, I have saved a list of links and would like to iterate through it: click each link, scrape the desired data, clean up the text, and write it to a separate text file for each state.
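The last step of that plan, writing one text file per state, could look like the sketch below. It assumes pathlib and UTF-8 output. The helper name write_state_file and the character filter on the state name are illustrative choices, not part of the question's code, which simply builds the path with links.text + '.txt'.

```python
from pathlib import Path


def write_state_file(base_dir: str, state_name: str, text: str) -> Path:
    """Append `text` to <base_dir>/<state_name>.txt and return that path.

    Hypothetical helper for illustration only.
    """
    # Keep letters, digits, spaces, hyphens and underscores so that link text
    # cannot smuggle path separators into the filename.
    safe_name = "".join(ch for ch in state_name if ch.isalnum() or ch in " -_").strip()
    if not safe_name:
        raise ValueError(f"no usable filename in {state_name!r}")
    path = Path(base_dir) / f"{safe_name}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    # 'with' closes the file even if write() raises.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(text)
    return path
```

Opening in append mode ('a') matches the question's code, so running the scraper twice adds the same groups to each file a second time; switch to 'w' if each run should start fresh.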

I am getting "StaleElementReferenceException: stale element reference: element is not attached to the page document" when running the for loop.

I believe it is giving the error when it gets to element = driver.find_element_by_xpath("""/html/body/center/table/tbody/tr/td/table[3]/tbody/tr/td[2]/div"""). As far as I can tell, this xpath does not change. I assumed I needed to make the webdriver wait for the page to load, hence time.sleep(4).
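Two notes on the waiting. First, a fixed time.sleep() is fragile: it is too short on a slow load and wastes time on a fast one. Selenium ships WebDriverWait (in selenium.webdriver.support.ui), which repeatedly checks a condition such as expected_conditions.presence_of_element_located until it holds or a timeout expires. Second, waiting does not cure this particular exception: the elements in items belong to the page that was loaded when they were found, and they stay stale after navigation no matter how long you wait. The plain-Python sketch below only illustrates the polling idea behind WebDriverWait; wait_until is a hypothetical helper, not part of Selenium.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0,
               interval: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value or `timeout` expires.

    Hypothetical helper illustrating the kind of polling WebDriverWait does.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        # Sleep briefly between checks instead of one long fixed pause.
        time.sleep(interval)
```

In real Selenium code the equivalent is WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, xpath))), which returns the element once it is present.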

I'm sure this is a simple fix that will make sense when I see it, but at the moment I am stumped. Any help you all can offer would be awesome! Thank you!

Answer 1:

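Try this. The elements in items go stale as soon as the first click loads a new page, so read each link's text and href into plain Python lists first, then open each URL with driver.get() instead of clicking: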

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

chrome_path = r"/Users/ENTER/Desktop/chromedriver"

# Selenium 4 passes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(chrome_path))

site_url = 'https://www.home-school.com/groups/'

driver.get(site_url)

# get state links from sidebar and store to list
area = driver.find_element(By.XPATH, "/html/body/center/table/tbody/tr/td/table[3]/tbody/tr/td[2]/div")
items = area.find_elements(By.TAG_NAME, 'a')

# remove unneeded links
del items[:22]
del items[-1:]

# Copy the link text and URLs into plain strings while still on this page;
# the WebElements themselves go stale once the browser navigates away.
text_list = [i.text for i in items]
hrefs = [i.get_attribute("href") for i in items]

for state_name, url in zip(text_list, hrefs):
    driver.get(url)
    # wait up to 10 s for the container with the desired data to appear
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located(
            (By.XPATH, "/html/body/center/table/tbody/tr/td/table[3]/tbody/tr/td[2]/div")))
    # Store container text in variable. We skip the first 5 lines of text as they
    # are unnecessary.
    orgdata = element.text.split("\n", 5)[5]
    orgdata = orgdata.replace(' Edit Remove More', '').replace(' Edit Remove', '')
    # Write data to text file; 'with' closes it even if write() fails
    filepath = '/Users/ENTER/Documents/STEMBoard/Tiger Team/Lingo/' + state_name + '.txt'
    with open(filepath, 'a') as file_object:
        file_object.write(orgdata)

driver.quit()
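The two data-handling steps in that loop can also be pulled out into plain functions and tested without a browser. This is a sketch under the answer's own assumptions (22 leading and 1 trailing sidebar link to drop, 5 header lines to skip); the names snapshot_links and clean_org_data are hypothetical. Unlike split("\n", 5)[5], clean_org_data returns an empty string rather than raising IndexError when a page has fewer lines than expected.

```python
from typing import Any, List, Sequence, Tuple


def snapshot_links(elements: Sequence[Any], skip_first: int = 22,
                   skip_last: int = 1) -> List[Tuple[str, str]]:
    """Return (link text, href) pairs for the links worth visiting.

    `elements` can be Selenium WebElements or anything else with a .text
    attribute and a .get_attribute(name) method. Hypothetical helper.
    """
    # Same trimming as `del items[:22]` and `del items[-1:]` in the answer.
    kept = elements[skip_first:len(elements) - skip_last]
    # Plain strings stay valid after the browser navigates; WebElements do not.
    return [(el.text, el.get_attribute("href")) for el in kept]


def clean_org_data(text: str, skip_lines: int = 5) -> str:
    """Drop the first `skip_lines` lines and the ' Edit Remove' link labels.

    Hypothetical helper.
    """
    parts = text.split("\n", skip_lines)
    # split() yields fewer than skip_lines + 1 parts when the text is shorter
    # than expected; return "" instead of raising IndexError.
    if len(parts) <= skip_lines:
        return ""
    body = parts[skip_lines]
    return body.replace(" Edit Remove More", "").replace(" Edit Remove", "")
```

In the answer's script you would call pairs = snapshot_links(items) right after find_elements() and before any navigation, then loop over pairs and pass element.text to clean_org_data().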

Source: https://stackoverflow.com/questions/62181398/python-selenium-iterate-through-list-of-webelements-error-staleelementrefere

Tags

python