How to scrape the first element of each parent from The Wall Street Journal market-data quotes using Selenium and Python?

独厮守ぢ 2021-01-27 08:27

Here is the HTML that I'm trying to scrape:

I am trying to get the first instance of 'td' under each 'tr' using Selenium (BeautifulSoup won't work for this

3 Answers
  •  梦谈多话
    2021-01-27 09:23

    You can try reading the table with pandas, as described in "Trying to scrape table using Pandas from Selenium's result":

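    import os
    from io import StringIO

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # define path to chrome driver
    chrome_driver = os.path.abspath('C:/Users/USER/Desktop/chromedriver.exe')
    # Selenium 4 takes the driver path through a Service object
    browser = webdriver.Chrome(service=Service(executable_path=chrome_driver))
    browser.get("https://www.wsj.com/market-data/quotes/MET/financials/annual/income-statement")

    # get the first table from the rendered page
    # (wrapping in StringIO avoids the literal-HTML deprecation warning in recent pandas)
    df = pd.read_html(StringIO(browser.page_source))[0]
    browser.quit()

    # get values: keep only the string entries of the label column
    val = [i for i in df["Fiscal year is January-December. All values USD Millions."].values if isinstance(i, str)]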
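
If pandas is not available, the same idea (take the first 'td' of each 'tr') can be sketched with only the standard library once Selenium has rendered the page. This is a minimal sketch, not the answerer's method: `FirstCellParser`, `first_cells`, and `sample_html` are hypothetical names, and the sample markup is made up to stand in for `browser.page_source`. It assumes the table is not nested inside another table.

```python
from html.parser import HTMLParser


class FirstCellParser(HTMLParser):
    """Collect the text of the first <td> in every <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.first_cells = []    # one entry per row that contains a <td>
        self._cells_seen = 0     # number of <td> tags seen in the current row
        self._capturing = False  # True while inside a row's first <td>
        self._buffer = []        # text fragments of the cell being captured

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells_seen = 0
        elif tag == "td":
            self._cells_seen += 1
            if self._cells_seen == 1:
                self._capturing = True
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._capturing:
            self._capturing = False
            self.first_cells.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)


def first_cells(html):
    """Return the text of the first <td> of each <tr> in an HTML string."""
    parser = FirstCellParser()
    parser.feed(html)
    parser.close()
    return parser.first_cells


# Made-up markup standing in for browser.page_source (assumption, not WSJ data)
sample_html = """
<table>
  <tr><th>Item</th><th>Value</th></tr>
  <tr><td>Row A</td><td>1</td></tr>
  <tr><td>Row B</td><td>2</td></tr>
</table>
"""
print(first_cells(sample_html))
```

With a live browser session, calling `first_cells(browser.page_source)` would apply the same logic to the rendered page. Staying inside Selenium is also possible with `browser.find_elements(By.CSS_SELECTOR, "tr > td:first-child")`, though whether that selector matches the page's actual markup is an assumption, since the question's HTML is not shown.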
    
