How to scrape the first element of each parent from The Wall Street Journal market-data quotes using Selenium and Python?


独厮守ぢ 2021-01-27 08:27

Here is the HTML that I'm trying to scrape:

I am trying to get the first instance of 'td' under each 'tr' using Selenium (BeautifulSoup won't work for this
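To make the goal concrete, here is a minimal sketch of "first `td` under each `tr`" using only the standard library. It assumes the rendered markup is already available as a string (for example from `driver.page_source`). The sample HTML, the class `FirstCellParser`, and the function `first_cells` are hypothetical names invented for illustration; only the `cr_dataTable` class name comes from the selectors used in this thread. Nested tables are not handled.

```python
from html.parser import HTMLParser


class FirstCellParser(HTMLParser):
    """Collect the text of the first <td> inside every <tr>."""

    def __init__(self):
        super().__init__()
        self.first_cells = []    # one entry per row that contains a <td>
        self._seen_td = False    # has the current row already produced a cell?
        self._capturing = False  # are we inside the first <td> of a row?
        self._buffer = []        # text fragments of the cell being captured

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row starts: its first <td> has not been seen yet.
            self._seen_td = False
        elif tag == "td" and not self._seen_td:
            self._seen_td = True
            self._capturing = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._capturing:
            self.first_cells.append("".join(self._buffer).strip())
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)


def first_cells(html):
    """Return the text of the first <td> of each <tr> in the given HTML string."""
    parser = FirstCellParser()
    parser.feed(html)
    parser.close()
    return parser.first_cells


# Hypothetical markup standing in for the real page, which is not shown here.
SAMPLE_HTML = """
<table class="cr_dataTable">
  <thead><tr><th>Item</th><th>2020</th></tr></thead>
  <tbody>
    <tr><td class="rowTitle">Row A</td><td>1</td><td>2</td></tr>
    <tr><td class="rowTitle">Row B</td><td>3</td><td>4</td></tr>
  </tbody>
</table>
"""

print(first_cells(SAMPLE_HTML))
```

With a live Selenium session, the CSS selector `tr > td:first-of-type` passed to `find_elements(By.CSS_SELECTOR, ...)` should express the same idea, provided the site actually serves the table to the driver.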

3 Answers

栀梦 (original poster)

2021-01-27 09:22
I took your code, simplified its structure, and ran the test with a minimal number of lines, as follows:
```
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Drop the enable-automation switch and the automation extension that ChromeDriver adds by default.
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
# Note: executable_path is the Selenium 3 way to locate chromedriver; Selenium 4 takes a Service object instead.
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.wsj.com/market-data/quotes/MET/financials/annual/income-statement')
# Dump the HTML that was actually served, to see what the driver received.
print(driver.page_source)
# Wait up to 20 seconds for the classed cells in each row, first via CSS, then via XPath.
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table.cr_dataTable tbody tr>td[class]")))])
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='cr_dataTable']//tbody//tr/td[@class]")))])
```
As you observed, I hit the same roadblock: my tests didn't yield any results.

While inspecting the page source of the webpage (titled "MET | MetLife Inc. Annual Income Statement - WSJ"), I observed that it contains an EventListener.

Conclusion

This is a clear indication that the website is protected by vigorous bot-management techniques: navigation by a Selenium-driven, WebDriver-initiated browsing context gets detected and subsequently blocked.
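As a rough illustration of how a page can tell that it is being automated, the sketch below reads `navigator.webdriver`, a flag that the W3C WebDriver specification says is true under automation. Whether WSJ relies on this particular signal is not established here; it is only one commonly cited check. The helper name `reports_webdriver` is hypothetical, and the function expects an already-running Selenium driver.

```python
def reports_webdriver(driver):
    """Return True if JavaScript in the current page sees navigator.webdriver as true.

    `driver` is assumed to be a live Selenium WebDriver instance; only its
    execute_script method is used.
    """
    # execute_script runs the JavaScript in the current page and returns its result.
    return bool(driver.execute_script("return navigator.webdriver"))
```

Calling `reports_webdriver(driver)` right after `driver.get(...)` shows what the page's own scripts would see for this one flag; a `True` result means the session is trivially identifiable as automated.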

Reference

You can find a relevant discussion in:
- Can a website detect when you are using selenium with chromedriver?