Python BeautifulSoup scrape Yahoo Finance value

走远了吗. Submitted on 2020-01-07 02:52:29

Question


I am attempting to scrape the 'Full Time Employees' value of 110,000 from the Yahoo Finance website.

The URL is: http://finance.yahoo.com/quote/AAPL/profile?p=AAPL

I have tried using Beautiful Soup, but I can't find the value on the page. When I look in the DOM explorer in IE, I can see it. It sits in a tag nested several parents deep, and the element that holds the actual value carries a custom data-react-id attribute.

Code I have tried:

import requests
from bs4 import BeautifulSoup as bs

url = "http://finance.yahoo.com/quote/AAPL/profile?p=AAPL"
r = requests.get(url).content
soup = bs(r, "html.parser")

I am not sure where to go from here.


Answer 1:


The problem is in the requests part: the page you download with requests is not the same as the one you see in the browser. The browser executes all of the JavaScript and makes the multiple asynchronous requests needed to load the page. This particular page is also quite dynamic, with a lot happening on the client side.
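One way to confirm this kind of mismatch is to check whether the label appears in the visible text of the raw HTML at all. The sketch below uses only the standard library; the two HTML strings are hypothetical stand-ins for a server response and a browser-rendered page, not actual Yahoo output, and `label_in_static_html` is a helper name invented for this illustration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def label_in_static_html(html, label):
    """Return True if `label` occurs in the visible text of `html`."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return label in " ".join(parser.chunks)


# Hypothetical markup: what a server might send vs. what a browser ends up with
static_html = "<html><body><div id='app'></div><script>render()</script></body></html>"
rendered_html = "<html><body><span>Full Time Employees</span><strong>110,000</strong></body></html>"

print(label_in_static_html(static_html, "Full Time Employees"))
print(label_in_static_html(rendered_html, "Full Time Employees"))
```

If the check fails on the body returned by requests but the label is visible in the browser, the content is most likely being added on the client side.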

What you can do is load this page in a real browser automated by Selenium. Working example:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("http://finance.yahoo.com/quote/AAPL/profile?p=AAPL")

# wait for the Full Time Employees to be visible
wait = WebDriverWait(driver, 10)
employees = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[. = 'Full Time Employees']/following-sibling::strong")))
print(employees.text)

driver.quit()

Prints 110,000.
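The value arrives as a string with a thousands separator. If a number is needed for further processing, a small conversion helper along these lines may be enough; `parse_count` is a name made up for this sketch, and it assumes a comma is the only separator used.

```python
def parse_count(text):
    """Convert a string such as '110,000' to an int, dropping commas and outer spaces."""
    return int(text.replace(",", "").strip())


print(parse_count("110,000"))  # 110000
```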




Answer 2:


There are many ways to download financial data, or any other kind of data, from the web. The script below downloads historical stock prices for several tickers and saves everything to a single CSV file.

from urllib.request import urlopen

listOfStocks = ["AAPL", "MSFT", "GOOG", "FB", "AMZN"]

# Build one historical-price CSV URL per ticker
urls = []
for company in listOfStocks:
    urls.append('http://real-chart.finance.yahoo.com/table.csv?s=' + company + '&d=6&e=28&f=2015&g=m&a=11&b=12&c=1980&ignore=.csv')

New_Format_Data = ''

for counter, url in enumerate(urls):
    # Download each CSV only once and decode the bytes to text
    Original_Data = urlopen(url).read().decode('utf-8')
    rows = Original_Data.splitlines(True)

    # Reuse the header row of the first file, with a Company column prepended
    if counter == 0:
        New_Format_Data = "Company," + rows[0]

    # Prefix every data row with its ticker
    for row in rows[1:]:
        New_Format_Data += listOfStocks[counter] + ',' + row

# newline='' keeps the line endings of the downloaded data unchanged
with open('C:/Users/your_path/Historical_Prices.csv', 'w', newline='') as Output_File:
    Output_File.write(New_Format_Data)
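The merging step in the script above can be tried without any network access. The following is a minimal sketch using the standard csv module instead of string concatenation; `merge_price_tables` is a hypothetical helper, and the sample tickers and prices are made up purely to show the shape of the output.

```python
import csv
import io


def merge_price_tables(tables):
    """Merge per-ticker CSV texts into one CSV with a leading Company column.

    `tables` maps a ticker symbol to the CSV text downloaded for it.
    The header row is taken from the first non-empty table only.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for ticker, text in tables.items():
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # skip empty downloads
        if not header_written:
            writer.writerow(["Company"] + rows[0])
            header_written = True
        for row in rows[1:]:
            writer.writerow([ticker] + row)
    return out.getvalue()


# Made-up sample data, not real prices
sample = {
    "AAA": "Date,Close\n2015-07-01,10.0\n2015-06-01,9.5\n",
    "BBB": "Date,Close\n2015-07-01,20.0\n",
}
merged = merge_price_tables(sample)
print(merged)
```

Going through csv.reader and csv.writer rather than joining strings means quoted fields that contain commas would still be handled correctly.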

The script below downloads data for multiple stock tickers into one folder, writing one text file per ticker.

import json
from urllib.request import urlopen

# Read one ticker symbol per line, skipping blank lines
with open("C:/Users/your_path/Desktop/symbols/tickers.txt") as f:
    symbolslist = f.read().split()

for symbol in symbolslist:
    # Fetch the chart data for this ticker as JSON
    response = urlopen("http://www.bloomberg.com/markets/chart/data/1D/" + symbol + ":US")
    data = json.load(response)
    datapoints = data["data_values"]

    # Write one "symbol,first value,second value" line per data point
    with open("C:/Users/your_path/Desktop/symbols/" + symbol + ".txt", "w") as myfile:
        for point in datapoints:
            myfile.write(symbol + "," + str(point[0]) + "," + str(point[1]) + "\n")
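The formatting step of the script above can be checked on its own with a canned payload. This sketch assumes only what the script itself relies on, namely a top-level "data_values" key holding a list of two-element entries; the payload contents and the `format_points` helper are invented for illustration.

```python
import json


def format_points(symbol, payload):
    """Turn a JSON payload with a 'data_values' list of pairs into CSV lines."""
    data = json.loads(payload)
    return [f"{symbol},{point[0]},{point[1]}" for point in data["data_values"]]


# Made-up payload mirroring only the structure the script expects
sample_payload = '{"data_values": [[1, 10.5], [2, 10.75]]}'
for line in format_points("XYZ", sample_payload):
    print(line)
```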

Finally, this script prints the price for each ticker in a list:

import re
from urllib.request import urlopen

# Read one ticker symbol per line, skipping blank lines
with open("C:/Users/your_path/Desktop/symbols/amex.txt") as symbolfile:
    newsymbolslist = symbolfile.read().split()

for symbol in newsymbolslist:
    url = "http://finance.yahoo.com/q?s=" + symbol + "&ql=1"
    htmltext = urlopen(url).read().decode("utf-8", errors="replace")

    # The price is the text of the span whose id is yfs_l84_<symbol>
    regex = '<span id="yfs_l84_' + re.escape(symbol) + '">(.+?)</span>'
    price = re.findall(regex, htmltext)

    if price:
        print("the price of", symbol, "is", price[0])
    else:
        print("no price found for", symbol)

# Make sure 'amex.txt' exists at the path passed to open() above
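The regular-expression step can be exercised against a small HTML string. The sketch below wraps it in a hypothetical `extract_price` helper that returns None instead of raising when the span is missing; the sample markup is made up in the shape the pattern expects and is not actual Yahoo output.

```python
import re


def extract_price(html, symbol):
    """Return the text of <span id="yfs_l84_SYMBOL">, or None if it is absent."""
    pattern = re.compile('<span id="yfs_l84_' + re.escape(symbol) + '">(.+?)</span>')
    match = pattern.search(html)
    return match.group(1) if match else None


# Made-up snippet in the shape the regex expects
sample_html = '<div><span id="yfs_l84_xyz">12.34</span></div>'
print(extract_price(sample_html, "xyz"))  # 12.34
print(extract_price(sample_html, "abc"))  # None
```

Matching HTML with a regular expression depends on the exact attribute order and quoting, so this approach tends to break whenever the markup changes.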

I wrote a book about these kinds of things, and lots of other stuff. You can find it using the URL below.

https://www.amazon.com/Automating-Business-Processes-Reducing-Increasing-ebook/dp/B01DJJKVZC/ref=sr_1_1?



Source: https://stackoverflow.com/questions/39197977/python-beautifulsoup-scrape-yahoo-finance-value
