How to import a table with headings to a data frame using pandas module

后悔当初 2021-01-20 23:53

I'm trying to get information from a table on the internet, as shown below. I'm using Jupyter Notebook with Python 2.7. I want to use this information with Python's pandas module.

2 answers
  •  青春惊慌失措
    2021-01-21 00:21

    Consider using an HTML parser such as Python's lxml module (its etree.HTML() function) to scrape the HTML table data, then load the results into a pandas DataFrame. While there are automated options like pandas.read_html(), this approach gives you more control over quirks in the HTML content, such as the Feb 4 row whose cell spans several columns. The code below uses an XPath expression that selects each column by its position in the table row, using brackets, []:

    import requests
    import pandas as pd
    from lxml import etree
    
    # READ IN AND PARSE WEB DATA
    url = "https://finance.yahoo.com/q/hp?s=AAPL+Historical+Prices"    
    rq = requests.get(url)
    htmlpage = etree.HTML(rq.content)
    
    # INITIALIZE LISTS
    dates = []  
    openstock = []
    highstock = []
    lowstock = []
    closestock = []
    volume = []
    adjclose = []
    
    # ITERATE THROUGH SEVEN COLUMNS OF TABLE
    for i in range(1,8):
        htmltable = htmlpage.xpath("//tr[td/@class='yfnc_tabledata1']/td[{}]".format(i))
    
        # APPEND COLUMN DATA TO CORRESPONDING LIST
        for row in htmltable:
            if i == 1: dates.append(row.text)
            if i == 2: openstock.append(row.text)
            if i == 3: highstock.append(row.text)
            if i == 4: lowstock.append(row.text)
            if i == 5: closestock.append(row.text)
            if i == 6: volume.append(row.text)
            if i == 7: adjclose.append(row.text)
    
    # CLEAN UP COLSPAN VALUE (AT FEB. 4)
    dates = [d for d in dates if len(d.strip()) > 3]
    del dates[7]
    del openstock[7]
    
    # MIGRATE LISTS TO DATA FRAME
    df = pd.DataFrame({'Dates':dates,
                       'Open':openstock,
                       'High':highstock,
                       'Low':lowstock,                   
                       'Close':closestock,
                       'Volume':volume,
                       'AdjClose':adjclose})
    
    #   AdjClose   Close         Dates    High     Low    Open       Volume
    #0     93.99   93.99  Feb 12, 2016   94.50   93.01   94.19   40,121,700
    #1     93.70   93.70  Feb 11, 2016   94.72   92.59   93.79   49,686,200
    #2     94.27   94.27  Feb 10, 2016   96.35   94.10   95.92   42,245,000
    #3     94.99   94.99   Feb 9, 2016   95.94   93.93   94.29   44,331,200
    #4     95.01   95.01   Feb 8, 2016   95.70   93.04   93.13   54,021,400
    #5     94.02   94.02   Feb 5, 2016   96.92   93.69   96.52   46,418,100
    #...
    #61   111.73  112.34  Nov 13, 2015  115.57  112.27  115.20   45,812,400
    #62   115.10  115.72  Nov 12, 2015  116.82  115.65  116.26   32,525,600
    #63   115.48  116.11  Nov 11, 2015  117.42  115.21  116.37   45,218,000
    #64   116.14  116.77  Nov 10, 2015  118.07  116.06  116.90   59,127,900
    #65   119.92  120.57   Nov 9, 2015  121.81  120.05  120.96   33,871,400
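
Every column in the data frame above holds strings (note the thousands separators in Volume), so a conversion step is usually needed before doing arithmetic. Here is a minimal sketch, assuming the date format and comma separators seen in the sample output; the helper name `convert_types` is hypothetical, and two rows copied from the output stand in for the scraped lists.

```python
import pandas as pd

# Two sample rows copied from the output above, standing in for the scraped lists
df = pd.DataFrame({'Dates': ['Feb 12, 2016', 'Feb 9, 2016'],
                   'Open': ['94.19', '94.29'],
                   'High': ['94.50', '95.94'],
                   'Low': ['93.01', '93.93'],
                   'Close': ['93.99', '94.99'],
                   'Volume': ['40,121,700', '44,331,200'],
                   'AdjClose': ['93.99', '94.99']})


def convert_types(frame):
    """Return a copy with parsed dates and numeric price/volume columns.

    Hypothetical helper, not part of pandas or the original answer.
    """
    out = frame.copy()
    # Dates look like 'Feb 12, 2016': abbreviated month, day, four-digit year
    out['Dates'] = pd.to_datetime(out['Dates'].str.strip(), format='%b %d, %Y')
    for col in ['Open', 'High', 'Low', 'Close', 'AdjClose']:
        out[col] = pd.to_numeric(out[col])
    # Remove thousands separators before converting volume to integers
    out['Volume'] = out['Volume'].str.replace(',', '', regex=False).astype('int64')
    return out


typed = convert_types(df)
print(typed.dtypes)
```

The `%b` directive depends on the locale's month abbreviations, so this assumes English month names as in the scraped page.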
    

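For comparison, the pandas.read_html() route mentioned in the answer can be sketched on a small inline table. The HTML here is an invented stand-in for the Yahoo page: two rows taken from the sample output plus an illustrative dividend row whose cell spans six columns. `SAMPLE_HTML` is a hypothetical name, and the sketch assumes Python 3 with an HTML parser such as lxml installed. Rather than relying on how the spanned cell is laid out, it drops any row whose Open value is not numeric.

```python
from io import StringIO

import pandas as pd

# Invented stand-in for the page: two data rows from the sample output
# plus one row whose second cell spans the remaining six columns
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th><th>Volume</th><th>Adj Close</th></tr>
  <tr><td>Feb 8, 2016</td><td>93.13</td><td>95.70</td><td>93.04</td><td>95.01</td><td>54,021,400</td><td>95.01</td></tr>
  <tr><td>Feb 5, 2016</td><td>96.52</td><td>96.92</td><td>93.69</td><td>94.02</td><td>46,418,100</td><td>94.02</td></tr>
  <tr><td>Feb 4, 2016</td><td colspan="6">Dividend</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found
tables = pd.read_html(StringIO(SAMPLE_HTML))
raw = tables[0]

# Keep only rows whose Open value parses as a number
is_price_row = pd.to_numeric(raw['Open'], errors='coerce').notna()
prices = raw[is_price_row].reset_index(drop=True)

# Strip thousands separators, then convert every non-date column to numbers
for col in prices.columns[1:]:
    cleaned = prices[col].astype(str).str.replace(',', '', regex=False)
    prices[col] = pd.to_numeric(cleaned)

print(prices)
```

This is shorter than the XPath version, at the cost of the finer control over individual cells that the answer describes.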