Scraping financial data with R and rvest

Posted by 陌路散爱 on 2020-03-19 06:53:09

Question


I am trying to get financial data from morningstar.com; for example, I want MSFT's yearly revenue data.
The figures are in a row <div> of a main <div> table.
I followed some samples to get the main table:

library(rvest)  # provides read_html(), html_nodes(), html_table() and %>%

url <- "http://financials.morningstar.com/income-statement/is.html?t=MSFT&region=usa&culture=en-US"
table <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="sfcontent"]/div[3]/div[3]') %>%
  html_table()

but I get an empty list(). html_nodes itself returns a {xml_nodeset (0)} that I don't know how to handle.


Answer 1:


read.csv("http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:MSFT&region=usa&culture=en-US&cur=&reportType=is&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=865827&denominatorView=raw&number=3", skip=1)

   Fiscal.year.ends.in.June..USD.in.millions.except.per.share.data. X2011.06 X2012.06 X2013.06 X2014.06 X2015.06      TTM
1                                                           Revenue 69943.00 73723.00 77849.00 86833.00 93580.00 90758.00
2                                                   Cost of revenue 15577.00 17530.00 20249.00 26934.00 33038.00 31972.00
3                                                      Gross profit 54366.00 56193.00 57600.00 59899.00 60542.00 58786.00
4                                                Operating expenses       NA       NA       NA       NA       NA       NA
5                                          Research and development  9043.00  9811.00 10411.00 11381.00 12046.00 11943.00
6                                 Sales, General and administrative 18162.00 18426.00 20425.00 20632.00 20324.00 19862.00
7                             Restructuring, merger and acquisition       NA       NA       NA   127.00       NA       NA
8                                          Other operating expenses       NA  6193.00       NA       NA 10011.00  8871.00
9                                          Total operating expenses 27205.00 34430.00 30836.00 32140.00 42381.00 40676.00
10                                                 Operating income 27161.00 21763.00 26764.00 27759.00 18161.00 18110.00
11                                                 Interest Expense   295.00   380.00   429.00   597.00   781.00   869.00
12                                           Other income (expense)  1205.00   884.00   717.00   658.00  1127.00   883.00
13                                              Income before taxes 28071.00 22267.00 27052.00 27820.00 18507.00 18124.00
14                                       Provision for income taxes  4921.00  5289.00  5189.00  5746.00  6314.00  5851.00
15                            Net income from continuing operations 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
16                                                       Net income 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
17                      Net income available to common shareholders 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
18                                               Earnings per share       NA       NA       NA       NA       NA       NA
19                                                            Basic     2.73     2.02     2.61     2.66     1.49     1.51
20                                                          Diluted     2.69     2.00     2.58     2.63     1.48     1.50
21                              Weighted average shares outstanding       NA       NA       NA       NA       NA       NA
22                                                            Basic  8490.00  8396.00  8375.00  8299.00  8177.00  8114.00
23                                                          Diluted  8593.00  8506.00  8470.00  8399.00  8254.00  8183.00
24                                                           EBITDA 31132.00 25614.00 31236.00 33629.00 25245.00 24983.00
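
Since the question asks for yearly revenue specifically, here is a minimal sketch of pulling that one row out of a data frame shaped like the output above. The `income` object is a hand-typed stand-in holding the first three rows and two year columns from that output, and the column name `item` is an assumption made for brevity (the real first column has the long name printed in the header).

```r
# Stand-in for the data frame returned by read.csv() above
# (first three rows, two year columns; values copied from the printed output).
income <- data.frame(
  item     = c("Revenue", "Cost of revenue", "Gross profit"),
  X2011.06 = c(69943, 15577, 54366),
  X2012.06 = c(73723, 17530, 56193),
  stringsAsFactors = FALSE
)

# The first column holds the line-item labels, so select by position
# rather than by its (long) name, and drop it from the result.
revenue <- income[income[[1]] == "Revenue", -1]

# Flatten the one-row data frame into a named numeric vector.
revenue_vec <- unlist(revenue)
```

With the real data frame, the same two lines should work unchanged as long as the labels stay in the first column.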

It's super-helpful to make your browser's Developer Tools "Network" tab your BFF.

(That URL came from inspecting what the "Export" button does.)
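
To reuse that export URL for other tickers, one option is to assemble it from its parts. The sketch below is only an illustration: `morningstar_csv_url` is a hypothetical helper name, every query parameter is copied verbatim from the URL above, and the idea that only the `t` parameter needs to change per ticker is an assumption, not something the site documents.

```r
# Hypothetical helper: rebuilds the export URL from the answer for a given ticker.
# All parameter names and values are copied from that URL; only `t` varies.
morningstar_csv_url <- function(ticker, exchange = "XNAS") {
  base <- "http://financials.morningstar.com/ajax/ReportProcess4CSV.html"
  params <- c(
    t               = paste0(exchange, ":", ticker),
    region          = "usa",
    culture         = "en-US",
    cur             = "",
    reportType      = "is",
    period          = "12",
    dataType        = "A",
    order           = "asc",
    columnYear      = "5",
    curYearPart     = "1st5year",
    rounding        = "3",
    view            = "raw",
    r               = "865827",
    denominatorView = "raw",
    number          = "3"
  )
  # Join name=value pairs with "&" and append them to the base URL.
  query <- paste(names(params), params, sep = "=", collapse = "&")
  paste0(base, "?", query)
}

msft_url <- morningstar_csv_url("MSFT")
# msft_url could then be passed to read.csv(msft_url, skip = 1) as in the answer.
```

The exchange prefix defaults to `XNAS` because that is what the original URL uses for MSFT; tickers listed elsewhere would presumably need a different prefix.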




Answer 2:


Stefano, you will probably find this to be very useful.

require(quantmod)
setwd("C:/Users/your_path_here/")
stocks <- c("AXP","BA","CAT","CSCO","CVX","DD","DIS","GE","GS","HD","IBM","INTC","JNJ","JPM","KO","MCD","MMM","MRK","MSFT","NKE","PFE","PG","T","TRV","UNH","UTX","V","VZ","WMT","XOM")

# equityList <- read.csv("EquityList.csv", header = FALSE, stringsAsFactors = FALSE)
# names(equityList) <- c ("Ticker")

# For each ticker, download the statements and write one CSV per statement and period.
# $A holds the annual figures and $Q the quarterly ones.
for (i in seq_along(stocks)) {
        temp <- getFinancials(stocks[i], src="google", auto.assign=FALSE)
        write.csv(temp$IS$A, paste(stocks[i], "_Income_Statement(Annual).csv", sep=""))
        write.csv(temp$BS$A, paste(stocks[i], "_Balance_Sheet(Annual).csv", sep=""))
        write.csv(temp$CF$A, paste(stocks[i], "_Cash_Flow(Annual).csv", sep=""))
        write.csv(temp$IS$Q, paste(stocks[i], "_Income_Statement(Quarterly).csv", sep=""))
        write.csv(temp$BS$Q, paste(stocks[i], "_Balance_Sheet(Quarterly).csv", sep=""))
        write.csv(temp$CF$Q, paste(stocks[i], "_Cash_Flow(Quarterly).csv", sep=""))
}


Source: https://stackoverflow.com/questions/34981137/scraping-financial-data-with-r-and-rvest
