Web scraping of key stats in Yahoo! Finance with R

你的背包 2020-12-17 07:48

Is anyone experienced in scraping data from the Yahoo! Finance key statistics page with R? I am familiar with scraping data directly from HTML using read_html,

2 Answers
  • 2020-12-17 08:14

    I gave up on Excel a long time ago. R is definitely the way to go for things like this.

    library(XML)
    
    stocks <- c("AXP","BA","CAT","CSCO")
    
    for (s in stocks) {
          url <- paste0("http://finviz.com/quote.ashx?t=", s)
          webpage <- readLines(url)
          html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
          tableNodes <- getNodeSet(html, "//table")
    
          # ASSIGN TO STOCK NAMED DFS
          assign(s, readHTMLTable(tableNodes[[9]], 
                    header= c("data1", "data2", "data3", "data4", "data5", "data6",
                              "data7", "data8", "data9", "data10", "data11", "data12")))
    
          # ADD COLUMN TO IDENTIFY STOCK 
          df <- get(s)
          df['stock'] <- s
          assign(s, df)
    }
    
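    # COMBINE ALL STOCK DATA (mget returns a named list of the per-stock data frames)
    stockdatalist <- mget(stocks)
    stockdata <- do.call(rbind, stockdatalist)
    # MOVE STOCK ID TO FIRST COLUMN
    # (parentheses needed: 1:n-1 parses as (1:n)-1, i.e. 0:(n-1))
    stockdata <- stockdata[, c(ncol(stockdata), 1:(ncol(stockdata) - 1))]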
    
    # SAVE TO CSV
    write.table(stockdata, "C:/Users/your_path_here/Desktop/MyData.csv", sep=",", 
                row.names=FALSE, col.names=FALSE)
    
    # REMOVE TEMP OBJECTS
    rm(df, stockdatalist)
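
    The assign()/get() pattern above works, but the same result can probably be reached more simply by keeping each stock's data frame in a list. A minimal sketch, assuming made-up inline HTML in place of the downloaded pages; the `pages` list, the `parse_first_table` helper, the placeholder values and the two-column layout are all hypothetical and are not Finviz's actual markup:

    ```r
    library(XML)

    # Hypothetical stand-in pages (placeholder values), one per ticker,
    # so the sketch runs without network access.
    pages <- list(
      AXP = "<html><body><table><tr><td>P/E</td><td>10</td></tr><tr><td>EPS</td><td>1</td></tr></table></body></html>",
      BA  = "<html><body><table><tr><td>P/E</td><td>20</td></tr><tr><td>EPS</td><td>2</td></tr></table></body></html>"
    )

    # Parse the first <table> of one page and tag every row with its ticker.
    parse_first_table <- function(html_text, symbol) {
      doc <- htmlParse(html_text, asText = TRUE)
      tables <- getNodeSet(doc, "//table")
      df <- readHTMLTable(tables[[1]], header = FALSE, stringsAsFactors = FALSE)
      names(df) <- c("metric", "value")
      data.frame(stock = symbol, df, stringsAsFactors = FALSE)
    }

    # One data frame per ticker, held in a named list instead of global variables.
    per_stock <- Map(parse_first_table, pages, names(pages))

    # Stack them; the stock column is already first, so no reordering is needed.
    stockdata <- do.call(rbind, per_stock)
    rownames(stockdata) <- NULL
    print(stockdata)
    ```

    With real pages you would swap the inline strings for the downloaded text and pick whichever table index holds the data you want.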
    
  • 2020-12-17 08:15

    I know this is an older thread, but I used it to scrape the Yahoo analyst tables, so I figured I would share.

    # Yahoo webscrape Analysts
    library(XML)
    
    symbol <- "HD"
    # Build the URL from symbol instead of hard-coding "HD" in the path
    url <- paste0("https://finance.yahoo.com/quote/", symbol, "/analysts?p=", symbol)
    webpage <- readLines(url)
    html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
    tableNodes <- getNodeSet(html, "//table")
    
    earningEstimates <- readHTMLTable(tableNodes[[1]])
    revenueEstimates <- readHTMLTable(tableNodes[[2]])
    earningHistory <- readHTMLTable(tableNodes[[3]])
    epsTrend <- readHTMLTable(tableNodes[[4]])
    epsRevisions <- readHTMLTable(tableNodes[[5]])
    growthEst <- readHTMLTable(tableNodes[[6]])
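
    The fixed indices 1 to 6 assume the page returns exactly those tables in that order, which may stop holding if the layout changes. A rough sketch of a guard that only labels the tables actually found, again using hypothetical inline HTML with placeholder values rather than a real Yahoo page; `wanted` simply reuses the variable names from the code above:

    ```r
    library(XML)

    # Hypothetical stand-in for a downloaded page that has only two tables.
    html_text <- paste0(
      "<html><body>",
      "<table><tr><th>Estimate</th><th>Value</th></tr><tr><td>Avg</td><td>1</td></tr></table>",
      "<table><tr><th>Estimate</th><th>Value</th></tr><tr><td>Avg</td><td>2</td></tr></table>",
      "</body></html>"
    )

    doc <- htmlParse(html_text, asText = TRUE)
    tableNodes <- getNodeSet(doc, "//table")

    # Names for the tables, in the order the original code expects them.
    wanted <- c("earningEstimates", "revenueEstimates", "earningHistory",
                "epsTrend", "epsRevisions", "growthEst")

    # Read only as many tables as are present, and say so if some are missing.
    n <- min(length(tableNodes), length(wanted))
    if (n < length(wanted)) {
      message("Expected ", length(wanted), " tables, found ", length(tableNodes))
    }

    analyst <- lapply(seq_len(n), function(i) {
      readHTMLTable(tableNodes[[i]], stringsAsFactors = FALSE)
    })
    names(analyst) <- wanted[seq_len(n)]
    ```

    After this, `analyst$earningEstimates` takes the place of the standalone variable, and a missing table shows up as a message rather than a subscript error.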
    

    Cheers, Sody
