Question
I am using Python/Selenium to extract some text from a website to further sort it in Google Sheets.
There are 15 headers for which I need to extract text. The text is found under each header in tag h5.
Here's one extract of a header:
<tr class="dayHeader">
<td colspan="7" style="padding:10px 0;">
<hr>
<h5> Tuesday - 02 February 2021</h5>
</td>
</tr>
What I have done is the following:
headers = driver.find_elements_by_tag_name('h5')
results = []
for header in headers:
    result = header.text
    results.append(result)
I'd prefer fetching the text from h5 going by the class above this tag, like so:
headers = driver.find_element(By.XPATH,"//tr[@class='dayHeader']/h5")
and add it to the mentioned for loop, but I can't seem to get this line to work. How can I do this?
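Answer 1:
You were almost there. In XPath, a single / selects a direct child, but the <h5> isn't a direct child of //tr[@class='dayHeader']: it sits inside the <td>. Also note that find_element returns only a single element, so you need find_elements to loop over all the headers.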
Solution
Replace the single forward slash (/) with a double forward slash (//), which selects any descendant rather than only direct children. Your effective line of code will be:
print([my_elem.text for my_elem in driver.find_elements(By.XPATH, "//tr[@class='dayHeader']//h5")])
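The difference between / and // can be checked without a browser. Here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree as a stand-in for the browser's XPath engine. It supports only a subset of XPath, but the child and descendant steps behave the same way. The markup is the header row from the question, wrapped in a hypothetical <table> and with <hr> written as <hr/> so that it parses as XML; both changes are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Header row from the question, made well-formed for the XML parser.
# The <table> wrapper and the self-closing <hr/> are assumptions.
SNIPPET = """
<table>
  <tr class="dayHeader">
    <td colspan="7" style="padding:10px 0;">
      <hr/>
      <h5> Tuesday - 02 February 2021</h5>
    </td>
  </tr>
</table>
""".strip()

root = ET.fromstring(SNIPPET)

# Single slash: only direct children of <tr>. The <h5> is inside <td>, so no match.
direct_children = root.findall(".//tr[@class='dayHeader']/h5")

# Double slash: any descendant of <tr>, at any depth.
descendants = root.findall(".//tr[@class='dayHeader']//h5")

# Spelling out the intermediate <td> step also works.
explicit_path = root.findall(".//tr[@class='dayHeader']/td/h5")

print(len(direct_children))                        # 0
print([h.text.strip() for h in descendants])       # ['Tuesday - 02 February 2021']
print([h.text.strip() for h in explicit_path])     # ['Tuesday - 02 February 2021']
```

The leading "." in each path is required by ElementTree for searches relative to the root element. In Selenium the expressions start with // as shown in the line above.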
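Ideally, wait until the elements are visible by using WebDriverWait with visibility_of_all_elements_located() and the same locator. This requires WebDriverWait from selenium.webdriver.support.ui and expected_conditions imported as EC from selenium.webdriver.support: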
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//tr[@class='dayHeader']//h5")))])
Answer 2:
Try this approach:
headers = [h.text for h in driver.find_elements(By.XPATH,"//tr[@class='dayHeader']/td/h5")]
This one-liner finds the matching elements and collects their text values into a list. The path spells out the intermediate <td> step explicitly.
Source: https://stackoverflow.com/questions/65942390/python-selenium-to-extract-elements-with-xpath-and-for-loop