Python BeautifulSoup4: parsing Google Finance data

纵饮孤独 submitted on 2019-12-24 20:48:53

Question


I'm new to BeautifulSoup and to scraping in general, so I'm trying to get my feet wet, so to speak.

I'd like to get the first row of information for the Dow Jones Industrial Average from here: http://www.google.com/finance/historical?q=INDEXDJX%3A.DJI&ei=ZN_2UqD9NOTt6wHYrAE

I can download the page, and print(soup) outputs everything, but I can't seem to drill down far enough. How would I select the rows of the table I saved into table? And how would I get just the first row?

Thank you so much for your help!

import urllib.parse
import urllib.request
from bs4 import BeautifulSoup
import json
import sys
import os
import time
import csv
import errno

DJIA_URL = "http://www.google.com/finance/historical?q=INDEXDJX%3A.DJI&ei=ZN_2UqD9NOTt6wHYrAE"

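def downloadData(queryString):
    with urllib.request.urlopen(queryString) as url:
        # get_content_charset() returns None when the server declares no charset,
        # so fall back to UTF-8 instead of calling decode(None)
        encoding = url.headers.get_content_charset() or "utf-8"
        result = url.read().decode(encoding)
    return result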

raw_html = downloadData(DJIA_URL)
# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(raw_html, "html.parser")

#print(soup)

table = soup.findAll("table", {"class":"gf-table historical_price"})
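Note that findAll (spelled find_all in current bs4) returns a ResultSet, which is a list of matches, so table above is a list rather than the table element itself. A minimal sketch of going from that list to the rows, run against a simplified, hypothetical copy of the table markup (the SAMPLE_HTML string and its column order are assumptions; the real page may have differed):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on the historical-prices table.
SAMPLE_HTML = """
<table class="gf-table historical_price">
  <tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th><th>Volume</th></tr>
  <tr><td>Feb 7, 2014</td><td>15,630.64</td><td>15,798.51</td><td>15,625.53</td><td>15,794.08</td><td>105,782,495</td></tr>
  <tr><td>Feb 6, 2014</td><td>15,443.83</td><td>15,632.09</td><td>15,443.00</td><td>15,628.53</td><td>106,979,691</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() always returns a list-like ResultSet, even for a single match.
tables = soup.find_all("table", {"class": "gf-table historical_price"})
print(len(tables))  # 1

# Index into the ResultSet to get the actual <table> Tag, then collect its rows.
rows = tables[0].find_all("tr")
print(len(rows))  # 3: one header row plus two data rows

# rows[0] is the header row, so the first row of data is rows[1].
first_data_row = rows[1]
print(first_data_row.td.get_text())  # Feb 7, 2014
```

Using soup.find instead of findAll returns the first matching Tag directly, which avoids the [0] indexing when you expect exactly one table.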

Answer 1:


You want the second tr (table row) element, then:

prices = soup.find('table', class_='historical_price')
rows = prices.find_all('tr')
print(rows[1])  # rows[0] is the header row
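print(rows[1]) shows the raw tr markup. To get the values of that first data row as plain strings, one option is to call get_text(strip=True) on each td. A sketch against a simplified, hypothetical copy of the table (SAMPLE_HTML and its column order are assumptions, not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the historical-prices table.
SAMPLE_HTML = """
<table class="gf-table historical_price">
  <tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th><th>Volume</th></tr>
  <tr><td>Feb 7, 2014</td><td>15,630.64</td><td>15,798.51</td><td>15,625.53</td><td>15,794.08</td><td>105,782,495</td></tr>
  <tr><td>Feb 6, 2014</td><td>15,443.83</td><td>15,632.09</td><td>15,443.00</td><td>15,628.53</td><td>106,979,691</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ matches any one of an element's classes, so "historical_price" alone
# finds <table class="gf-table historical_price">.
prices = soup.find("table", class_="historical_price")
rows = prices.find_all("tr")

# Text of every cell in the first data row (rows[1]), whitespace trimmed.
first_row = [td.get_text(strip=True) for td in rows[1].find_all("td")]
print(first_row)
# ['Feb 7, 2014', '15,630.64', '15,798.51', '15,625.53', '15,794.08', '105,782,495']
```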

or, to list all rows with price info, skip the ones that contain any th elements:

for row in rows:
    if row.th:
        continue  # header row, no price data
    print(row)
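The same filter can be written as a list comprehension; row.th is None when a row has no th descendant, so this keeps only the data rows. A sketch on simplified, hypothetical sample markup (SAMPLE_HTML is an assumption standing in for the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the historical-prices table.
SAMPLE_HTML = """
<table class="gf-table historical_price">
  <tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th><th>Volume</th></tr>
  <tr><td>Feb 7, 2014</td><td>15,630.64</td><td>15,798.51</td><td>15,625.53</td><td>15,794.08</td><td>105,782,495</td></tr>
  <tr><td>Feb 6, 2014</td><td>15,443.83</td><td>15,632.09</td><td>15,443.00</td><td>15,628.53</td><td>106,979,691</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find("table", class_="historical_price").find_all("tr")

# Header rows contain <th> cells; data rows do not.
data_rows = [row for row in rows if row.th is None]

for row in data_rows:
    # Print the date cell (the first <td>) of each data row.
    print(row.td.get_text(strip=True))
# Feb 7, 2014
# Feb 6, 2014
```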

or use that first header row as the source for dictionary keys:

keys = [th.text.strip() for th in rows[0].find_all('th')]
for row in rows[1:]:
    # Pair each header with the matching cell in this row
    data = {key: td.text.strip() for key, td in zip(keys, row.find_all('td'))}
    print(data)

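which produces (this output was captured under Python 2, hence the u'' string prefixes and the arbitrary key order):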

{u'Volume': u'105,782,495', u'High': u'15,798.51', u'Low': u'15,625.53', u'Date': u'Feb 7, 2014', u'Close': u'15,794.08', u'Open': u'15,630.64'}
{u'Volume': u'106,979,691', u'High': u'15,632.09', u'Low': u'15,443.00', u'Date': u'Feb 6, 2014', u'Close': u'15,628.53', u'Open': u'15,443.83'}
{u'Volume': u'105,125,894', u'High': u'15,478.21', u'Low': u'15,340.69', u'Date': u'Feb 5, 2014', u'Close': u'15,440.23', u'Open': u'15,443.00'}
{u'Volume': u'124,106,548', u'High': u'15,481.85', u'Low': u'15,356.62', u'Date': u'Feb 4, 2014', u'Close': u'15,445.24', u'Open': u'15,372.93'}

etc.
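The values in these dictionaries are still strings with thousands separators. If you need numbers, one approach is to strip the commas and convert. A sketch using a hypothetical to_number helper and simplified, hypothetical sample markup, assuming Date is the only non-numeric column:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the historical-prices table.
SAMPLE_HTML = """
<table class="gf-table historical_price">
  <tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th><th>Volume</th></tr>
  <tr><td>Feb 7, 2014</td><td>15,630.64</td><td>15,798.51</td><td>15,625.53</td><td>15,794.08</td><td>105,782,495</td></tr>
  <tr><td>Feb 6, 2014</td><td>15,443.83</td><td>15,632.09</td><td>15,443.00</td><td>15,628.53</td><td>106,979,691</td></tr>
</table>
"""


def to_number(text):
    """Hypothetical helper: '15,630.64' -> 15630.64, '105,782,495' -> 105782495."""
    cleaned = text.replace(",", "")
    return float(cleaned) if "." in cleaned else int(cleaned)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find("table", class_="historical_price").find_all("tr")
keys = [th.get_text(strip=True) for th in rows[0].find_all("th")]

records = []
for row in rows[1:]:
    data = {key: td.get_text(strip=True) for key, td in zip(keys, row.find_all("td"))}
    # Convert every column except Date (assumed to be the only text column).
    records.append({k: (v if k == "Date" else to_number(v)) for k, v in data.items()})

print(records[0])
# {'Date': 'Feb 7, 2014', 'Open': 15630.64, 'High': 15798.51, 'Low': 15625.53, 'Close': 15794.08, 'Volume': 105782495}
```

Under Python 3.7+ the dictionaries keep the header order, because the keys are inserted in the order the th cells appear.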



Source: https://stackoverflow.com/questions/21654495/python-beautifulsoup4-parsing-google-finance-data
