How to scrape the first element of each parent from The Wall Street Journal market-data quotes using Selenium and Python?


独厮守ぢ 2021-01-27 08:27

Here is the HTML that I'm trying to scrape:

I am trying to get the first instance of 'td' under each 'tr' using Selenium (BeautifulSoup won't work for this

3 answers

梦谈多话 (original poster)

2021-01-27 09:23

You can try getting the table with pandas; see "Trying to scrape table using Pandas from Selenium's result".

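from io import StringIO
import os

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service


# define path to chrome driver
chrome_driver = os.path.abspath('C:/Users/USER/Desktop/chromedriver.exe')
# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service(chrome_driver))
browser.get("https://www.wsj.com/market-data/quotes/MET/financials/annual/income-statement")

# get table: read_html returns a list of DataFrames, one per <table>;
# wrap the HTML in StringIO, since recent pandas deprecates literal HTML strings
df = pd.read_html(StringIO(browser.page_source))[0]
browser.quit()

# get values: keep only the string entries of this column (skips NaN cells)
val = [i for i in df["Fiscal year is January-December. All values USD Millions."].values if isinstance(i, str)]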
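
If you would rather not depend on the exact column header text, a rough alternative is to pull the first 'td' of each 'tr' straight out of browser.page_source. The sketch below is only an illustration: it swaps pandas for Python's standard-library html.parser, the names FirstCellParser and first_cells are made up for this example, and the sample table uses placeholder values rather than real WSJ data. It assumes flat tables (no table nested inside a cell).

```python
from html.parser import HTMLParser


class FirstCellParser(HTMLParser):
    """Collect the text of the first <td> inside every <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.first_cells = []     # one entry per row that contains a <td>
        self._cells_seen = 0      # number of <td> tags seen in the current row
        self._capturing = False   # True while inside a row's first <td>
        self._buffer = []         # text fragments of the cell being captured

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells_seen = 0
        elif tag == "td":
            self._cells_seen += 1
            if self._cells_seen == 1:
                self._capturing = True
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._capturing:
            self.first_cells.append("".join(self._buffer).strip())
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)


def first_cells(page_source):
    """Return the text of the first <td> of each <tr> in an HTML string."""
    parser = FirstCellParser()
    parser.feed(page_source)
    parser.close()
    return parser.first_cells


# Placeholder markup standing in for browser.page_source
sample = (
    "<table>"
    "<tr><th>Label</th><th>Value</th></tr>"
    "<tr><td>Row A</td><td>1</td></tr>"
    "<tr><td>Row B</td><td>2</td></tr>"
    "</table>"
)
print(first_cells(sample))
```

With a live session you would call first_cells(browser.page_source) instead of passing the sample string. Rows that contain only 'th' cells, such as the header row, contribute nothing to the result.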
