Selenium printing same information repeatedly

Question

Hello, I am trying to scrape some data from a website that keeps it in <dl> tags. Here is what the page structure looks like:

<div class="record-overview col-md-5">
  <h2><span itemprop="name">Donald Duck</span></h2>
  <dl class="row">
    <dt class="col-md-4">Email</dt>
    <dd class="col-md-8">myemail.com</dd>
  </dl>
</div>
<div class="record-overview col-md-5">
  <h2><span itemprop="name">Mickey mouse</span></h2>
  <dl class="row">
    <dt class="col-md-4">Email</dt>
    <dd class="col-md-8">youremail.com</dd>
  </dl>
</div>
... the data goes on like this, with different values

To scrape this I am using Selenium. Here is my scraping code:

import time

from colorama import Fore, Style
from selenium.webdriver.common.by import By

# driver is an already-initialised WebDriver that has loaded the results page
for element in driver.find_elements(By.CLASS_NAME, 'ThatsThem-record-overview'):  # here I'm scraping the name
    print(Fore.RED + element.text + Style.RESET_ALL)
    time.sleep(1)
    dl = driver.find_element(By.TAG_NAME, 'dl')  # scraping data under the dl tag
    print(dl.text)
    print('-----------------------')  # separator

Whenever I run the program, it prints the same <dl> data under every name, like this:

donald duck
Email
myemail.com
-------------
mickey mouse
Email
myemail.com

I have already tried looping over the <dl> elements the same way I loop over the names, but then it prints other things as well that I don't want.

What can I do?

Answer 1:

Calling find_element on driver searches the whole page, so it always returns the first matching <dl>, no matter which record the loop is currently on. Call it on element instead, so the search starts from the current record:

from selenium.webdriver.common.by import By

for element in driver.find_elements(By.CLASS_NAME, 'ThatsThem-record-overview'):
    # Search relative to the current record, not the whole page
    dl = element.find_element(By.TAG_NAME, 'dl')
    print(dl.text)
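The difference between a page-wide search and a record-scoped search can be reproduced without a browser. This is a minimal sketch, assuming a well-formed version of the markup from the question, and it uses Python's standard xml.etree.ElementTree as a stand-in for Selenium (the PAGE string and the email_of helper are hypothetical, made up for illustration). Calling find() on the document root inside the loop behaves like driver.find_element: it returns the first <dl> every time.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed version of the question's markup.
# xml.etree stands in for Selenium here; no browser is involved.
PAGE = """<body>
  <div class="record-overview col-md-5">
    <h2><span itemprop="name">Donald Duck</span></h2>
    <dl class="row">
      <dt class="col-md-4">Email</dt>
      <dd class="col-md-8">myemail.com</dd>
    </dl>
  </div>
  <div class="record-overview col-md-5">
    <h2><span itemprop="name">Mickey mouse</span></h2>
    <dl class="row">
      <dt class="col-md-4">Email</dt>
      <dd class="col-md-8">youremail.com</dd>
    </dl>
  </div>
</body>"""

root = ET.fromstring(PAGE)
records = root.findall("div")  # one <div> per record


def email_of(dl):
    """Hypothetical helper: return the text of the <dd> inside a <dl>."""
    return dl.find("dd").text


# Buggy pattern: the search starts at the root, so it finds the first <dl> every time
buggy = [email_of(root.find(".//dl")) for record in records]

# Fixed pattern: the search starts at the current record
fixed = [email_of(record.find("dl")) for record in records]

print(buggy)  # ['myemail.com', 'myemail.com']
print(fixed)  # ['myemail.com', 'youremail.com']
```

The only change between the two comprehensions is the object the search is called on, which mirrors swapping driver for element in the loop above.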

Or locate the <dl> elements directly with a single CSS selector:

from selenium.webdriver.common.by import By

for dl in driver.find_elements(By.CSS_SELECTOR, '.ThatsThem-record-overview dl'):
    print(dl.text)
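The descendant selector works because a single query returns every <dl> nested inside a record, already in page order. The sketch below shows the same idea with Python's standard xml.etree.ElementTree instead of Selenium, on hypothetical markup modelled on the question. One assumption to note: a CSS class selector matches any one of an element's space-separated classes, whereas ElementTree only supports exact attribute matches, so the predicate spells out the full class string.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the question
PAGE = """<body>
  <div class="record-overview col-md-5">
    <h2><span itemprop="name">Donald Duck</span></h2>
    <dl class="row">
      <dt class="col-md-4">Email</dt>
      <dd class="col-md-8">myemail.com</dd>
    </dl>
  </div>
  <div class="record-overview col-md-5">
    <h2><span itemprop="name">Mickey mouse</span></h2>
    <dl class="row">
      <dt class="col-md-4">Email</dt>
      <dd class="col-md-8">youremail.com</dd>
    </dl>
  </div>
</body>"""

root = ET.fromstring(PAGE)

# One query for every <dl> that is a child of a record <div>
dls = root.findall(".//div[@class='record-overview col-md-5']/dl")

# itertext() yields the text of <dt> and <dd>; blank whitespace chunks are dropped,
# which gives roughly what Selenium's .text shows for each <dl>
texts = [" ".join(t.strip() for t in dl.itertext() if t.strip()) for dl in dls]

print(texts)  # ['Email myemail.com', 'Email youremail.com']
```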

Answer 2:

It seems you were close. Using the class record-overview should have fetched all the required data. However, it is better to target the individual name and email by traversing to the child tags. Additionally, inducing WebDriverWait instead of a fixed time.sleep() makes the script wait only until the elements are visible, rather than always pausing for a set time.
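Roughly, WebDriverWait(driver, 10).until(condition) keeps re-checking the condition and returns as soon as it gives a truthy result, failing only after the timeout. The sketch below is a simplified, pure-Python illustration of that polling idea, not Selenium's implementation: wait_until and elements_visible are hypothetical names, and the real until() passes the driver to the condition, polls every 0.5 s by default, and raises Selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Simplified stand-in for WebDriverWait(driver, timeout).until(condition).

    Calls condition() every `poll` seconds and returns its first truthy result,
    or raises TimeoutError once `timeout` seconds have passed.
    """
    end = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= end:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Hypothetical condition: the "elements" only appear on the third check
calls = {"n": 0}


def elements_visible():
    calls["n"] += 1
    return ["Donald Duck", "Mickey mouse"] if calls["n"] >= 3 else []


start = time.monotonic()
found = wait_until(elements_visible, timeout=10, poll=0.1)
elapsed = time.monotonic() - start

print(found)  # ['Donald Duck', 'Mickey mouse']
```

Here the wait returns after two short polls instead of sleeping for the full timeout, which is the advantage over a fixed delay.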

So, ideally you should induce WebDriverWait for visibility_of_all_elements_located() and use either of the following locator strategies:

Using CSS_SELECTOR:

# innerHTML of every name <span> and every email <dd>, waiting up to 10 s for them to be visible
names = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.record-overview>h2>span")))]
emails = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.record-overview dl.row dd")))]
# zip() pairs the two lists by position
for name, email in zip(names, emails):
    print("{} Email is {}".format(name, email))

Using XPATH:

# Same lookups expressed as XPath; contains() matches record-overview among the other classes
names = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class, 'record-overview')]/h2/span")))]
emails = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class, 'record-overview')]//dl[@class='row']//dd")))]
for name, email in zip(names, emails):
    print("{} Email is {}".format(name, email))

Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
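Both variants pair names and emails with zip(), which matches the two lists purely by position. That holds while every record has exactly one <dd>; if a record ever listed an extra field, a flat list of every <dd> on the page would drift out of step with the names. The sketch below uses hypothetical markup (the Phone entry and number are invented for illustration) and Python's standard xml.etree.ElementTree instead of Selenium, so it runs without a browser. It contrasts the flat zip() with collecting each record's fields from inside that record, as in Answer 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the second record has an extra Phone entry
PAGE = """<body>
  <div class="record-overview col-md-5">
    <h2><span itemprop="name">Donald Duck</span></h2>
    <dl class="row">
      <dt class="col-md-4">Email</dt>
      <dd class="col-md-8">myemail.com</dd>
    </dl>
  </div>
  <div class="record-overview col-md-5">
    <h2><span itemprop="name">Mickey mouse</span></h2>
    <dl class="row">
      <dt class="col-md-4">Phone</dt>
      <dd class="col-md-8">555-0100</dd>
      <dt class="col-md-4">Email</dt>
      <dd class="col-md-8">youremail.com</dd>
    </dl>
  </div>
</body>"""

root = ET.fromstring(PAGE)

# Flat lists, as in the zip() approach: pairing is purely positional
names = [span.text for span in root.findall("div/h2/span")]
all_dds = [dd.text for dd in root.findall("div/dl/dd")]
print(list(zip(names, all_dds)))
# [('Donald Duck', 'myemail.com'), ('Mickey mouse', '555-0100')]

# Per-record lookup: each name is paired only with fields from its own <dl>
records = {}
for record in root.findall("div"):
    name = record.find("h2/span").text
    dl = record.find("dl")
    # Assumes <dt> and <dd> alternate one-to-one inside each <dl>
    records[name] = {dt.text: dd.text for dt, dd in zip(dl.findall("dt"), dl.findall("dd"))}

print(records["Mickey mouse"]["Email"])  # youremail.com
```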

Source: https://stackoverflow.com/questions/59751428/selenium-printing-same-information-repeatedly

Tags

python

python-3.x

selenium