Question
I am new to Selenium and I want to scrape data from https://www.nasdaq.com/market-activity/stocks/aapl. I am particularly interested in the data in the Summary Data section.
As an example, I want to scrape the following data:
- Exchange: NASDAQ-GS
- Sector: Technology
- Industry: Computer Manufacturing
Here is the part of HTML code from the table that I want to extract:
<table class="summary-data__table" role="table">
<thead class="visually-hidden" role="rowgroup">
<tr role="row">
<th role="columnheader" scope="col">Label</th>
<th role="columnheader" scope="col">Value</th>
</tr>
</thead>
<tbody class="summary-data__table-body" role="rowgroup"><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
<td role="cell" class="summary-data__cellheading">Exchange</td><td role="cell" class="summary-data__cell">NASDAQ-GS</td>
</tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
<td role="cell" class="summary-data__cellheading">Sector</td><td role="cell" class="summary-data__cell">Technology</td>
</tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
<td role="cell" class="summary-data__cellheading">Industry</td><td role="cell" class="summary-data__cell">Computer Manufacturing</td>
</tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
<td role="cell" class="summary-data__cellheading">1 Year Target</td><td role="cell" class="summary-data__cell">$275.00</td>
</tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
<td role="cell" class="summary-data__cellheading">Today's High/Low</td><td role="cell" class="summary-data__cell">$271.00/$267.30</td>
</tr><tr class="summary-data__row" role="row" data-first-ten="true">
<td role="cell" class="summary-data__cellheading">Share Volume</td><td role="cell" class="summary-data__cell">26,547,493</td>
</tr></tbody>
</table>
This is the Python code that I have so far:
import time
from selenium import webdriver

driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get('https://www.nasdaq.com/market-activity/stocks/aapl')
time.sleep(20)  # fixed pause to let the page finish loading
elements = driver.find_element_by_class_name("summary-data__table")
I am stuck because I can't iterate through the table using the code above.
Answer 1:
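import requests

# Query the JSON API directly instead of rendering the page in a browser.
r = requests.get(
    'https://api.nasdaq.com/api/quote/AAPL/summary?assetclass=stocks').json()

# Each entry in summaryData holds its display text under the 'value' key.
for key, value in r['data']['summaryData'].items():
    print("{:<20} {}".format(key, value['value']))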
Output:
Exchange             NASDAQ-GS
Sector               Technology
Industry             Computer Manufacturing
OneYrTarget          $275.00
TodayHighLow         $271.00/$267.30
ShareVolume          26,547,493
AverageVolume        24,634,815
PreviousClose        $265.58
FiftTwoWeekHighLow   $268.25/$142.00
MarketCap            1,202,836,268,150
PERatio              22.84
ForwardPE1Yr         20.15
EarningsPerShare     $11.85
AnnualizedDividend   $3.08
ExDividendDate       Nov 7, 2019
DividendPaymentDate  Nov 14, 2019
Yield                1.17669%
Beta                 1.02
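If you want the values in a dictionary rather than printed, the loop above can be wrapped in a small helper. This is only a sketch: summary_to_dict is a hypothetical name, the payload shape is inferred from the code above (data -> summaryData -> key -> value), and the guards for a missing or null "data" field are an assumption about how the API might respond on errors.

```python
def summary_to_dict(payload):
    """Flatten an API payload into {key: display value}."""
    # Assumption: "data" or "summaryData" may be missing or null on errors.
    data = payload.get("data") or {}
    summary = data.get("summaryData") or {}
    return {key: entry.get("value") for key, entry in summary.items()}


# Sample payload shaped like the response used above
# (values copied from the printed output, not fetched live).
sample = {
    "data": {
        "summaryData": {
            "Exchange": {"value": "NASDAQ-GS"},
            "Sector": {"value": "Technology"},
            "Industry": {"value": "Computer Manufacturing"},
        }
    }
}

result = summary_to_dict(sample)
print(result["Exchange"])
```

With a live response you would pass the parsed JSON instead of the sample, for example summary_to_dict(r) with r from the code above.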
Answer 2:
Your code uses find_element_by_class_name, which returns only one element and accepts only a single class name. Use find_elements_by_css_selector instead: it returns every matching element and lets you write a more specific CSS query.
Change your code to this:
elements = driver.find_elements_by_css_selector(".summary-data__table .summary-data__row")
This selects every row in the summary data table.
From there, you can loop through the rows and run a subquery on each one to get its key and value.
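A minimal sketch of that loop, under a few assumptions: rows_to_dict is a hypothetical helper name, each row is expected to hold a label cell followed by a value cell (as in the HTML from the question), and the locator is passed through the generic find_elements(by, value) form, where "css selector" is the string value of By.CSS_SELECTOR. The generic form is used because newer Selenium 4 releases no longer ship the find_elements_by_* helpers.

```python
def rows_to_dict(rows):
    """Map each row's label cell text to its value cell text."""
    summary = {}
    for row in rows:
        # Subquery on the row itself: fetch its <td> cells.
        cells = row.find_elements("css selector", "td")
        if len(cells) >= 2:
            summary[cells[0].text.strip()] = cells[1].text.strip()
    return summary


# Usage with a live driver (not executed here):
# rows = driver.find_elements(
#     "css selector", ".summary-data__table .summary-data__row")
# summary = rows_to_dict(rows)
# print(summary["Exchange"], summary["Sector"], summary["Industry"])
```

Keeping the helper separate from the driver setup means it works with whatever list of row elements the selector returns, and rows with fewer than two cells are skipped rather than raising an error.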
Answer 3:
To scrape the NASDAQ-GS, Technology and Computer Manufacturing fields, you need to scrollIntoView() the desired elements and then induce WebDriverWait for visibility_of_element_located(). You can use either of the following Locator Strategies:
Using CSS_SELECTOR:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Assumption: the original snippet never creates the driver; chromedriver is taken to be on PATH.
driver = webdriver.Chrome()
driver.get("https://www.nasdaq.com/market-activity/stocks/aapl")
# Scroll the Summary Data header into view before reading the table.
header = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.summary-data__header>h2.module-header")))
driver.execute_script("return arguments[0].scrollIntoView(true);", header)
# The value is the second cell of each of the first three rows.
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr td:nth-child(2)"))).text)
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr:nth-child(2) td:nth-child(2)"))).text)
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr:nth-child(3) td:nth-child(2)"))).text)
driver.quit()
Using XPATH:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Assumption: the original snippet never creates the driver; chromedriver is taken to be on PATH.
driver = webdriver.Chrome()
driver.get("https://www.nasdaq.com/market-activity/stocks/aapl")
# Scroll the Summary Data header into view before reading the table.
header = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.summary-data__header>h2.module-header")))
driver.execute_script("return arguments[0].scrollIntoView(true);", header)
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']/tr//following-sibling::td[2]"))).get_attribute("innerHTML"))
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']//following-sibling::tr[1]//following-sibling::td[2]"))).get_attribute("innerHTML"))
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']//following-sibling::tr[2]//following-sibling::td[2]"))).get_attribute("innerHTML"))
driver.quit()
Console Output:
NASDAQ-GS
Technology
Computer Manufacturing
Source: https://stackoverflow.com/questions/59241101/python-selenium-get-data-from-table