python selenium get data from table

Submitted by 为君一笑 on 2021-02-04 08:13:31

Question


I am new to Selenium and I want to scrape data from https://www.nasdaq.com/market-activity/stocks/aapl. I am particularly interested in the data in the Summary Data section.

As an example, I want to scrape the following data:

  1. Exchange: NASDAQ-GS
  2. Sector: Technology
  3. Industry: Computer Manufacturing

Here is the part of the HTML for the table that I want to extract:

<table class="summary-data__table" role="table">
  <thead class="visually-hidden" role="rowgroup">
    <tr role="row">
      <th role="columnheader" scope="col">Label</th>
      <th role="columnheader" scope="col">Value</th>
    </tr>
  </thead>
  <tbody class="summary-data__table-body" role="rowgroup"><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Exchange</td><td role="cell" class="summary-data__cell">NASDAQ-GS</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Sector</td><td role="cell" class="summary-data__cell">Technology</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Industry</td><td role="cell" class="summary-data__cell">Computer Manufacturing</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">1 Year Target</td><td role="cell" class="summary-data__cell">$275.00</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Today's High/Low</td><td role="cell" class="summary-data__cell">$271.00/$267.30</td>
    </tr><tr class="summary-data__row" role="row" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Share Volume</td><td role="cell" class="summary-data__cell">26,547,493</td>
    </tr></tbody>
</table>

This is the Python code that I have so far:

import time

from selenium import webdriver

driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get('https://www.nasdaq.com/market-activity/stocks/aapl')
time.sleep(20)  # crude fixed wait for the page to finish loading

elements = driver.find_element_by_class_name("summary-data__table")

I am stuck as I can't iterate through the table using the code above.


Answer 1:


import requests


r = requests.get(
    'https://api.nasdaq.com/api/quote/AAPL/summary?assetclass=stocks').json()

for key, value in r['data']['summaryData'].items():
    print("{:<20} {}".format(key, value['value']))

Output:

Exchange             NASDAQ-GS
Sector               Technology
Industry             Computer Manufacturing
OneYrTarget          $275.00
TodayHighLow         $271.00/$267.30
ShareVolume          26,547,493
AverageVolume        24,634,815
PreviousClose        $265.58
FiftTwoWeekHighLow   $268.25/$142.00
MarketCap            1,202,836,268,150
PERatio              22.84
ForwardPE1Yr         20.15
EarningsPerShare     $11.85
AnnualizedDividend   $3.08
ExDividendDate       Nov 7, 2019
DividendPaymentDate  Nov 14, 2019
Yield                1.17669%
Beta                 1.02
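
The loop above assumes that every entry under `data` / `summaryData` exists and carries a `value` field. A minimal sketch of a more defensive lookup follows. `summary_values` and `sample` are hypothetical names, and the sample only mimics the shape implied by the answer's code; the real response may carry more fields.

```python
def summary_values(payload):
    """Return {key: value} from a parsed summary response.

    Missing or null sections yield an empty dict instead of raising.
    """
    data = payload.get("data") or {}
    summary = data.get("summaryData") or {}
    return {
        key: field.get("value")
        for key, field in summary.items()
        if isinstance(field, dict)
    }


# Hypothetical sample shaped like r['data']['summaryData'] in the answer;
# the two values are copied from the output listed above.
sample = {
    "data": {
        "summaryData": {
            "Exchange": {"value": "NASDAQ-GS"},
            "Sector": {"value": "Technology"},
        }
    }
}

for key, value in summary_values(sample).items():
    print("{:<20} {}".format(key, value))
```

With the live response, `summary_values(r)` could stand in for the direct `r['data']['summaryData']` lookup.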



Answer 2:


Your code uses find_element_by_class_name, which returns only a single element and accepts only one class name. Use find_elements_by_css_selector instead: it returns every matching element and lets you write a more specific CSS query.

Change your code to this: elements = driver.find_elements_by_css_selector(".summary-data__table .summary-data__row")

This selects every row in the summary data table.

From there, you can loop through the rows and run a sub-query on each one to get its key and value.
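
The loop and per-row sub-query described here could look roughly like the sketch below. It assumes each row holds a label cell followed by a value cell, as in the HTML from the question. `summary_from_rows` and `scrape_summary` are hypothetical helper names, and the sketch uses the `find_elements(By.CSS_SELECTOR, ...)` spelling rather than `find_elements_by_css_selector`.

```python
ROW_SELECTOR = ".summary-data__table .summary-data__row"


def summary_from_rows(rows, find_cells):
    """Build {label: value} from table rows.

    rows       -- iterable of row elements
    find_cells -- callable returning the <td> elements of one row
    """
    summary = {}
    for row in rows:
        cells = find_cells(row)
        if len(cells) < 2:
            continue  # skip rows without a label/value pair
        summary[cells[0].text.strip()] = cells[1].text.strip()
    return summary


def scrape_summary(driver):
    """Collect the Summary Data table from an already loaded page."""
    # Imported here so summary_from_rows can be used without Selenium.
    from selenium.webdriver.common.by import By

    rows = driver.find_elements(By.CSS_SELECTOR, ROW_SELECTOR)
    return summary_from_rows(
        rows, lambda row: row.find_elements(By.TAG_NAME, "td")
    )
```

Once the page has loaded and the table has rendered, calling `scrape_summary(driver)` on the question's `driver` should give a dict keyed by the label column, for example mapping "Exchange" to "NASDAQ-GS" for the HTML shown in the question.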




Answer 3:


To scrape the NASDAQ-GS, Technology and Computer Manufacturing values, first scroll the Summary Data section into view with scrollIntoView(), then use WebDriverWait with visibility_of_element_located() to wait for each cell. Either of the following locator strategies works:

  • Using CSS_SELECTOR:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Chrome()  # assumes chromedriver is available on PATH
    driver.get("https://www.nasdaq.com/market-activity/stocks/aapl")
    driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.summary-data__header>h2.module-header"))))
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr td:nth-child(2)"))).text)
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr:nth-child(2) td:nth-child(2)"))).text)
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr:nth-child(3) td:nth-child(2)"))).text)
    driver.quit()
    
  • Using XPATH:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Chrome()  # assumes chromedriver is available on PATH
    driver.get("https://www.nasdaq.com/market-activity/stocks/aapl")
    driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.summary-data__header>h2.module-header"))))
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']/tr//following-sibling::td[2]"))).get_attribute("innerHTML"))
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']//following-sibling::tr[1]//following-sibling::td[2]"))).get_attribute("innerHTML"))
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']//following-sibling::tr[2]//following-sibling::td[2]"))).get_attribute("innerHTML"))
    driver.quit()
    
  • Console Output:

    NASDAQ-GS
    Technology
    Computer Manufacturing
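
The three near-identical wait-and-print lines in the CSS_SELECTOR variant can be folded into a loop. The sketch below is one way to do that. `value_cell_selector` and `read_values` are hypothetical names, and the first row is addressed as `tr:nth-child(1)`, which should select the same cell as the answer's plain `tr` selector.

```python
def value_cell_selector(row_number):
    """CSS selector for the value cell (second <td>) of a 1-based row."""
    template = "tbody.summary-data__table-body>tr:nth-child({}) td:nth-child(2)"
    return template.format(row_number)


def read_values(driver, row_count=3, timeout=10):
    """Wait for the first row_count value cells and return their text."""
    # Imported here so value_cell_selector works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    values = []
    for row_number in range(1, row_count + 1):
        locator = (By.CSS_SELECTOR, value_cell_selector(row_number))
        cell = wait.until(EC.visibility_of_element_located(locator))
        values.append(cell.text)
    return values
```

After the page has been loaded and scrolled as in the answer, `read_values(driver)` would be called in place of the three `print(...)` lines, and the returned list can be printed or stored.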
    


Source: https://stackoverflow.com/questions/59241101/python-selenium-get-data-from-table
