Scrape Yahoo Finance Income Statement with Python

耗尽温柔, posted on 2019-12-01 18:55:34

This is a little more difficult because the "Net Income" label is enclosed in a <strong> tag, so bear with me, but I think this works:

import re, requests
from bs4 import BeautifulSoup

url = 'https://finance.yahoo.com/q/is?s=AAPL&annual'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
pattern = re.compile('Net Income')

title = soup.find('strong', text=pattern)
row = title.parent.parent # yes, yes, I know it's not the prettiest
cells = row.find_all('td')[1:] #exclude the <td> with 'Net Income'

values = [ c.text.strip() for c in cells ]

In this case, values will contain the three table cells from the "Net Income" row. They can easily be converted to ints; I left them as strings because I liked that they keep the ',' separators.

In [10]: values
Out[10]: [u'53,394,000', u'39,510,000', u'37,037,000']
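As a rough sketch of the int conversion mentioned above: assuming every cell is a plain comma-separated number like the ones shown, stripping the commas is enough. The helper name to_int is hypothetical, and the sketch does not handle blanks, dashes, or other formats the table might contain.

```python
# Sample values copied from the output above.
values = ['53,394,000', '39,510,000', '37,037,000']

def to_int(text):
    """Convert a string such as '53,394,000' to an int by dropping the commas."""
    return int(text.replace(',', ''))

numbers = [to_int(v) for v in values]
print(numbers)
```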

When I tested it on Alphabet (GOOG), it didn't work, I believe because Yahoo doesn't display an Income Statement for it (https://finance.yahoo.com/q/is?s=GOOG&annual). When I checked Facebook (FB), the values were returned correctly (https://finance.yahoo.com/q/is?s=FB&annual).
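When the row is missing, as it seemed to be for GOOG, soup.find returns None and title.parent raises an AttributeError. A minimal sketch of a guarded version follows. The function name find_row_values and the sample HTML are made up for illustration, and it swaps title.parent.parent for find_parent('tr'), which assumes the label sits somewhere inside a <tr>.

```python
import re
from bs4 import BeautifulSoup

def find_row_values(html, label):
    """Return the cell texts that follow `label` in its table row, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    # Look for a <strong> tag whose text contains the label.
    title = soup.find('strong', string=re.compile(re.escape(label)))
    if title is None:
        return None  # label not on the page, e.g. no income statement shown
    row = title.find_parent('tr')
    if row is None:
        return None  # label found, but not inside a table row
    # Skip the first <td>, which holds the label itself.
    return [c.get_text(strip=True) for c in row.find_all('td')[1:]]

# Hypothetical HTML that mimics the layout described in this answer.
sample = """
<table>
  <tr><td><strong>Net Income</strong></td><td>53,394,000</td><td>39,510,000</td></tr>
</table>
"""
print(find_row_values(sample, 'Net Income'))
print(find_row_values('<p>No data</p>', 'Net Income'))
```

Returning None lets the caller decide whether a missing row is an error or just a ticker to skip.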

If you wanted to create a more dynamic script, you could use string formatting to format the url with whatever stock symbol you want, like this:

ticker_symbol = 'AAPL' # or 'FB' or any other ticker symbol
url = 'https://finance.yahoo.com/q/is?s={}&annual'.format(ticker_symbol)
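The same string formatting can be wrapped in a small helper so that several symbols can be processed in a loop. This is only a sketch: the names BASE_URL and income_statement_url are hypothetical, and upper-casing the symbol is an assumption about what the site expects.

```python
# URL template taken from the answer above; {} is replaced by the ticker symbol.
BASE_URL = 'https://finance.yahoo.com/q/is?s={}&annual'

def income_statement_url(ticker_symbol):
    """Build the annual income statement URL for one ticker symbol."""
    # Assumption: symbols are normalized to upper case with no surrounding spaces.
    return BASE_URL.format(ticker_symbol.strip().upper())

for symbol in ['AAPL', 'FB']:
    print(income_statement_url(symbol))
```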