Python Parsing HTML Table Generated by JavaScript

逝去的感伤 2020-12-20 17:33

I'm trying to scrape a table from the NYSE website (http://www1.nyse.com/about/listed/IPO_Index.html) into a pandas DataFrame. To do so, I have a setup like this:

1 Answer
  •  温柔的废话
    2020-12-20 18:29

    The table is built by JavaScript after the page loads, so you need something that runs that JavaScript for you, such as a real browser.

    One option here would be to use selenium:

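    from io import StringIO

    from pandas import read_html
    from selenium import webdriver
    from selenium.webdriver.common.by import By


    driver = webdriver.Firefox()
    try:
        driver.get('http://www1.nyse.com/about/listed/IPO_Index.html')

        # Grab the element that wraps the JavaScript-rendered table.
        table = driver.find_element(By.XPATH, '//div[@class="sp5"]/table//table/..')
        table_html = table.get_attribute('innerHTML')
    finally:
        # quit() ends the browser session even if the lookup above fails.
        driver.quit()

    # read_html returns a list of DataFrames; newer pandas expects a file-like object.
    df = read_html(StringIO(table_html))[0]
    print(df)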
    

    This prints:

                                                        0        1          2   3
    0                                                Name   Symbol        NaT NaN
    1                       Performance Sports Group Ltd.      PSG 2014-06-20 NaN
    2                           Century Communities, Inc.      CCS 2014-06-18 NaN
    3                        Foresight Energy Partners LP     FELP 2014-06-18 NaN
    ...
    79  EGShares TCW EM Long Term Investment Grade Bon...     LEMF 2014-01-08 NaN
    80  EGShares TCW EM Short Term Investment Grade Bo...     SEMF 2014-01-08 NaN
    
    [81 rows x 4 columns]
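
    In the output above, the real header ("Name", "Symbol") ends up in row 0 and column 3 is entirely NaN. A minimal sketch of one way to tidy that up is shown here. The helper name promote_first_row_to_header and the column_N placeholder labels are assumptions made for illustration, not part of the original answer, and the small frame below only stands in for the scraped table using rows copied from the output.

```python
import pandas as pd


def promote_first_row_to_header(df):
    """Use row 0 as column labels and drop columns that are entirely empty."""
    # Drop columns with no data at all (column 3 in the output above is all NaN).
    df = df.dropna(axis=1, how="all")
    header = df.iloc[0]
    body = df.iloc[1:].reset_index(drop=True)
    # Keep a header cell when it holds a value; otherwise fall back to a
    # placeholder label (the date column's header cell was parsed as NaT).
    body.columns = [
        str(label) if pd.notna(label) else f"column_{i}"
        for i, label in enumerate(header)
    ]
    return body


# Stand-in for the parsed table: a few rows copied from the output above.
raw = pd.DataFrame([
    ["Name", "Symbol", pd.NaT, float("nan")],
    ["Performance Sports Group Ltd.", "PSG", pd.Timestamp("2014-06-20"), float("nan")],
    ["Century Communities, Inc.", "CCS", pd.Timestamp("2014-06-18"), float("nan")],
])
clean = promote_first_row_to_header(raw)
print(clean)
```

    With the real data, the same call would be applied to df from the Selenium snippet. The placeholder label for the date column can then be renamed to whatever suits the analysis.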
    
