python selenium get data from table

Question

I am new to Selenium and I want to scrape data from https://www.nasdaq.com/market-activity/stocks/aapl. I am particularly interested in the data in the Summary Data section.

As an example, I want to scrape the following data:

Exchange: NASDAQ-GS
Sector: Technology
Industry: Computer Manufacturing

Here is the part of HTML code from the table that I want to extract:

<table class="summary-data__table" role="table">
  <thead class="visually-hidden" role="rowgroup">
    <tr role="row">
      <th role="columnheader" scope="col">Label</th>
      <th role="columnheader" scope="col">Value</th>
    </tr>
  </thead>
  <tbody class="summary-data__table-body" role="rowgroup"><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Exchange</td><td role="cell" class="summary-data__cell">NASDAQ-GS</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Sector</td><td role="cell" class="summary-data__cell">Technology</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Industry</td><td role="cell" class="summary-data__cell">Computer Manufacturing</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">1 Year Target</td><td role="cell" class="summary-data__cell">$275.00</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Today's High/Low</td><td role="cell" class="summary-data__cell">$271.00/$267.30</td>
    </tr><tr class="summary-data__row" role="row" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Share Volume</td><td role="cell" class="summary-data__cell">26,547,493</td>
    </tr></tbody>
</table>

This is the Python code that I have so far:

import time

from selenium import webdriver

driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get('https://www.nasdaq.com/market-activity/stocks/aapl')
time.sleep(20)  # fixed wait for the page to finish loading

elements = driver.find_element_by_class_name("summary-data__table")

I am stuck as I can't iterate through the table using the code above.

Answer 1:

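import requests

# Fetch the summary as JSON from the NASDAQ API instead of rendering the page.
r = requests.get(
    'https://api.nasdaq.com/api/quote/AAPL/summary?assetclass=stocks').json()

# Each summaryData entry is a dict whose 'value' field holds the displayed text.
for key, value in r['data']['summaryData'].items():
    print("{:<20} {}".format(key, value['value']))

Output:
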
Exchange             NASDAQ-GS
Sector               Technology
Industry             Computer Manufacturing
OneYrTarget          $275.00
TodayHighLow         $271.00/$267.30
ShareVolume          26,547,493
AverageVolume        24,634,815
PreviousClose        $265.58
FiftTwoWeekHighLow   $268.25/$142.00
MarketCap            1,202,836,268,150
PERatio              22.84
ForwardPE1Yr         20.15
EarningsPerShare     $11.85
AnnualizedDividend   $3.08
ExDividendDate       Nov 7, 2019
DividendPaymentDate  Nov 14, 2019
Yield                1.17669%
Beta                 1.02
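
If you want the values as a dictionary rather than printed lines, the same traversal can be wrapped in a small helper. This is a minimal sketch: flatten_summary is a hypothetical name, and the sample payload only mirrors the data / summaryData / value nesting used in the loop above, with values copied from the printed output. The live response may contain more fields.

```python
def flatten_summary(payload):
    """Return {key: displayed value} from a summary payload.

    Assumes the nesting payload['data']['summaryData'][key]['value'].
    """
    summary = payload["data"]["summaryData"]
    return {key: entry["value"] for key, entry in summary.items()}


# Hand-written sample that mimics the response shape (not a real API response).
sample = {
    "data": {
        "summaryData": {
            "Exchange": {"value": "NASDAQ-GS"},
            "Sector": {"value": "Technology"},
            "Industry": {"value": "Computer Manufacturing"},
        }
    }
}

result = flatten_summary(sample)
for key in ("Exchange", "Sector", "Industry"):
    print("{:<20} {}".format(key, result[key]))
```

With a live response, pass the parsed JSON (the r variable in the answer) in place of sample.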

Answer 2:

Your code uses find_element_by_class_name, which returns only a single element (the first match) and accepts only a single class name. Use find_elements_by_css_selector instead: it returns every matching element and lets you write a more specific CSS query.

Change your code to this:

elements = driver.find_elements_by_css_selector(".summary-data__table .summary-data__row")

This selects every row inside the summary data table.

From there, you can loop over the rows and run a sub-query on each one to get its key and value cells.
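
The per-row loop could look like the following. It is a minimal sketch, not the answerer's code: summary_rows_to_dict is a hypothetical helper, and it uses the generic find_elements(by, value) call with the locator string "css selector" (the value of Selenium's By.CSS_SELECTOR) instead of find_elements_by_css_selector, because the find_elements_by_* shortcuts were removed in later Selenium 4 releases. It assumes each row holds a label cell followed by a value cell, as in the HTML from the question.

```python
CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR

ROW_SELECTOR = ".summary-data__table .summary-data__row"


def summary_rows_to_dict(driver):
    """Map each row's label cell text to its value cell text."""
    data = {}
    for row in driver.find_elements(CSS, ROW_SELECTOR):
        cells = row.find_elements(CSS, "td")
        if len(cells) < 2:
            continue  # skip rows that lack a label/value pair
        data[cells[0].text.strip()] = cells[1].text.strip()
    return data
```

Call it as summary_rows_to_dict(driver) once the page has loaded. Returning a dictionary keeps the label next to its value, so a change in row order on the page does not affect lookups by label.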

Answer 3:

To scrape the NASDAQ-GS, Technology and Computer Manufacturing values, first scroll the Summary Data section into view with scrollIntoView(), then use WebDriverWait with visibility_of_element_located() to wait for each cell. You can use either of the following locator strategies:

Using CSS_SELECTOR:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes chromedriver is available on PATH
wait = WebDriverWait(driver, 10)
driver.get("https://www.nasdaq.com/market-activity/stocks/aapl")
# Scroll the Summary Data header into view before reading the table.
header = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.summary-data__header>h2.module-header")))
driver.execute_script("return arguments[0].scrollIntoView(true);", header)
# The second cell of rows 1, 2 and 3 holds Exchange, Sector and Industry.
print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr td:nth-child(2)"))).text)
print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr:nth-child(2) td:nth-child(2)"))).text)
print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr:nth-child(3) td:nth-child(2)"))).text)
driver.quit()

Using XPATH:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes chromedriver is available on PATH
wait = WebDriverWait(driver, 10)
driver.get("https://www.nasdaq.com/market-activity/stocks/aapl")
# Scroll the Summary Data header into view before reading the table.
header = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.summary-data__header>h2.module-header")))
driver.execute_script("return arguments[0].scrollIntoView(true);", header)
# tr[n]/td[2] addresses the value cell of the n-th row directly.
print(wait.until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']/tr[1]/td[2]"))).get_attribute("innerHTML"))
print(wait.until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']/tr[2]/td[2]"))).get_attribute("innerHTML"))
print(wait.until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']/tr[3]/td[2]"))).get_attribute("innerHTML"))
driver.quit()

Console Output:

NASDAQ-GS
Technology
Computer Manufacturing
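
The three CSS lookups above differ only in the row number, so the selectors can be generated instead of written out. The sketch below is an illustration under assumptions, not part of the original answer: value_cell_selector and read_values are hypothetical helpers, and get_text stands for any callable that resolves a CSS selector to the element's text, for example a wrapper around WebDriverWait and visibility_of_element_located.

```python
def value_cell_selector(row_number):
    """CSS selector for the value (second) cell of the given 1-based row."""
    return (
        "tbody.summary-data__table-body>"
        f"tr:nth-child({row_number}) td:nth-child(2)"
    )


def read_values(get_text, row_numbers=(1, 2, 3)):
    """Return the value-cell text for each requested row, in order."""
    return [get_text(value_cell_selector(n)) for n in row_numbers]


# With Selenium, get_text could be defined like this (assumes an open driver
# and the imports used in the answer):
#   wait = WebDriverWait(driver, 10)
#   get_text = lambda css: wait.until(
#       EC.visibility_of_element_located((By.CSS_SELECTOR, css))).text
#   print(read_values(get_text))
```

Passing the lookup in as a callable keeps the selector logic independent of the browser session, which makes it easy to check without loading the page.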

Source: https://stackoverflow.com/questions/59241101/python-selenium-get-data-from-table

Tags: python, selenium