I'm trying to get information from a table on the internet, as shown below. I'm using Jupyter Notebook with Python 2.7, and I want to use this information with Python's pandas module.
Consider scraping the HTML table with Python's lxml module (its etree.HTML() parser plus XPath queries) and then loading the result into a pandas DataFrame. pandas.read_html() can automate this, but parsing the table yourself gives more control over quirks in the HTML content, such as the colspan cell in the Feb 4 row. The code below selects each column by its position within the table row, using an XPath positional predicate in brackets, []:
import requests
import pandas as pd
from lxml import etree
# READ IN AND PARSE WEB DATA
url = "https://finance.yahoo.com/q/hp?s=AAPL+Historical+Prices"
rq = requests.get(url)
htmlpage = etree.HTML(rq.content)
# INITIALIZE LISTS
dates = []
openstock = []
highstock = []
lowstock = []
closestock = []
volume = []
adjclose = []
# ITERATE THROUGH SEVEN COLUMNS OF TABLE
for i in range(1,8):
    htmltable = htmlpage.xpath("//tr[td/@class='yfnc_tabledata1']/td[{}]".format(i))
    # APPEND COLUMN DATA TO CORRESPONDING LIST
    for row in htmltable:
        if i == 1: dates.append(row.text)
        if i == 2: openstock.append(row.text)
        if i == 3: highstock.append(row.text)
        if i == 4: lowstock.append(row.text)
        if i == 5: closestock.append(row.text)
        if i == 6: volume.append(row.text)
        if i == 7: adjclose.append(row.text)
# CLEAN UP COLSPAN VALUE (AT FEB. 4)
dates = [d for d in dates if len(d.strip()) > 3]
del dates[7]
del openstock[7]
# MIGRATE LISTS TO DATA FRAME
df = pd.DataFrame({'Dates':dates,
                   'Open':openstock,
                   'High':highstock,
                   'Low':lowstock,
                   'Close':closestock,
                   'Volume':volume,
                   'AdjClose':adjclose})
# AdjClose Close Dates High Low Open Volume
#0 93.99 93.99 Feb 12, 2016 94.50 93.01 94.19 40,121,700
#1 93.70 93.70 Feb 11, 2016 94.72 92.59 93.79 49,686,200
#2 94.27 94.27 Feb 10, 2016 96.35 94.10 95.92 42,245,000
#3 94.99 94.99 Feb 9, 2016 95.94 93.93 94.29 44,331,200
#4 95.01 95.01 Feb 8, 2016 95.70 93.04 93.13 54,021,400
#5 94.02 94.02 Feb 5, 2016 96.92 93.69 96.52 46,418,100
#...
#61 111.73 112.34 Nov 13, 2015 115.57 112.27 115.20 45,812,400
#62 115.10 115.72 Nov 12, 2015 116.82 115.65 116.26 32,525,600
#63 115.48 116.11 Nov 11, 2015 117.42 115.21 116.37 45,218,000
#64 116.14 116.77 Nov 10, 2015 118.07 116.06 116.90 59,127,900
#65 119.92 120.57 Nov 9, 2015 121.81 120.05 120.96 33,871,400
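Note that every value scraped this way is still a string (for example '40,121,700'), so you will probably want to convert the columns before doing any analysis in pandas. Below is a minimal sketch of one way to do that. It assumes the column names from the data frame above, and it uses a few sample rows copied from the printed output in place of the live scrape; the date format string and the comma stripping are assumptions based on how those values look.

```python
import pandas as pd

# Sample rows copied from the printed output above; in practice this
# would be the df built from the scraped lists.
df = pd.DataFrame({
    'Dates': ['Feb 12, 2016', 'Feb 11, 2016', 'Feb 10, 2016'],
    'Open': ['94.19', '93.79', '95.92'],
    'High': ['94.50', '94.72', '96.35'],
    'Low': ['93.01', '92.59', '94.10'],
    'Close': ['93.99', '93.70', '94.27'],
    'Volume': ['40,121,700', '49,686,200', '42,245,000'],
    'AdjClose': ['93.99', '93.70', '94.27'],
})

# Parse "Feb 12, 2016"-style strings into datetime values.
df['Dates'] = pd.to_datetime(df['Dates'], format='%b %d, %Y')

# Remove any thousands separators, then convert prices to floats.
price_cols = ['Open', 'High', 'Low', 'Close', 'AdjClose']
for col in price_cols:
    df[col] = pd.to_numeric(df[col].str.replace(',', ''))

# Volume has no decimal part, so store it as an integer.
df['Volume'] = df['Volume'].str.replace(',', '').astype('int64')

# Index by date, oldest first, which is convenient for time series work.
df = df.set_index('Dates').sort_index()
```

After this, df.dtypes should report numeric columns rather than object, and calls such as df['Close'].mean() or df.loc['2016-02-11'] work as expected.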