How to extract text from divs in Selenium using Python when new divs are added every approx 1 second?

邮差的信 submitted on 2019-12-13 01:24:18

Question


I am trying to extract the content from divs on a web page using Selenium. The web page is dynamically generated and every second or so there is a new div inserted into the HTML on the web page.

So far I have the following code:

from selenium import webdriver

chrome_path = r"C:\scrape\chromedriver.exe"

driver = webdriver.Chrome(chrome_path)

driver.get("https://website.com/")

messages = []
for message in driver.find_elements_by_class_name('div_i_am_targeting'):
    messages.append(message.text)

for x in messages:
    print(x)

This works, but it only prints the text of the divs that are on the page at the moment it runs. I want to continuously extract the text from div_i_am_targeting, and new divs appear on the page every second or so.

I found this question: "Handling dynamic div's in selenium". It was the closest related question I could find, but it doesn't match my problem and it has no answers.

How can I update the code above so that it continuously prints the contents of my chosen divs (in this example div_i_am_targeting), including new divs that are added to the page after the program has started?


Answer 1:


You can use the code below to continuously print the content of the required divs:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait as wait

chrome_path = r"C:\scrape\chromedriver.exe"

driver = webdriver.Chrome(chrome_path)
driver.get("https://website.com/")
# Get the divs currently on the page
messages = driver.find_elements_by_class_name('div_i_am_targeting')
# Print all current messages
for message in messages:
    print(message.text)

while True:
    try:
        # Wait up to a minute for the list of matching divs to change
        wait(driver, 60).until(lambda driver: driver.find_elements_by_class_name('div_i_am_targeting') != messages)
    except TimeoutException:
        # Stop if no new message appeared within a minute
        print('No new messages')
        break
    # Fetch the list once so printing and updating use the same snapshot
    current = driver.find_elements_by_class_name('div_i_am_targeting')
    # Print only the messages that were not seen before
    for message in current:
        if message not in messages:
            print(message.text)
    # Update the list of messages
    messages = current
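
A variation on the same idea, as a rough sketch rather than a drop-in replacement: instead of comparing whole lists, remember the id of every element already printed and poll at a fixed interval. The names new_texts, follow_divs, poll_seconds and idle_timeout are made up for this example and are not part of Selenium. It uses find_elements(By.CLASS_NAME, ...), which also works on Selenium releases where find_elements_by_class_name is no longer available, and it assumes divs are only added to the page, never removed.

```python
import time


def new_texts(elements, seen_ids):
    """Return the text of each element not seen before and record its id.

    `elements` is any iterable of objects with `.id` and `.text`
    attributes, which Selenium's WebElement provides.
    """
    fresh = []
    for element in elements:
        if element.id not in seen_ids:
            seen_ids.add(element.id)
            fresh.append(element.text)
    return fresh


def follow_divs(driver, class_name, poll_seconds=0.5, idle_timeout=60):
    """Print new div text until nothing new shows up for idle_timeout seconds."""
    # Imported here so new_texts can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    seen_ids = set()
    last_new = time.monotonic()
    while time.monotonic() - last_new < idle_timeout:
        elements = driver.find_elements(By.CLASS_NAME, class_name)
        for text in new_texts(elements, seen_ids):
            print(text)
            last_new = time.monotonic()
        time.sleep(poll_seconds)
    print('No new messages')
```

With a driver that has already loaded the page, this would be called as follow_divs(driver, 'div_i_am_targeting'). Keeping the ids in a set makes each "already seen?" check cheap even after many divs have accumulated, whereas the list comparison above rescans the old list for every element.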


Source: https://stackoverflow.com/questions/53458309/how-to-extract-text-from-divs-in-selenium-using-python-when-new-divs-are-added-e
