R: web scraping yahoo.finance after 2019 change

暗喜 2020-12-17 05:48

I have been happily web scraping yahoo.finance pages for a long time, using code largely borrowed from other Stack Overflow answers, and it has worked great. However, in the las

2 answers
  •  夕颜 (original poster)
    2020-12-17 06:13

    As mentioned in the comment above, here is an alternative that tries to deal with the different table sizes published. I have worked on this and have had help from a friend.

    library(rvest)
    library(tidyverse)

    # The URL must be a quoted string
    url <- "https://finance.yahoo.com/quote/AAPL/financials?p=AAPL"

    # Download the page and select one node per table row
    raw_table <- read_html(url) %>% html_nodes("div.D\\(tbr\\)")

    # The spans in the header row give the number of columns
    number_of_columns <- raw_table[1] %>% html_nodes("span") %>% length()

    if (number_of_columns > 1) {
      # Create an empty data frame with the required dimensions
      df <- data.frame(matrix(ncol = number_of_columns, nrow = length(raw_table)),
                       stringsAsFactors = FALSE)

      # Fill the table, looping through rows
      for (i in seq_along(raw_table)) {
        # Find the row name and set it
        df[i, 1] <- raw_table[i] %>% html_nodes("div.Ta\\(start\\)") %>% html_text()
        # Now grab the values
        row_values <- raw_table[i] %>% html_nodes("div.Ta\\(end\\)")
        for (j in seq_len(number_of_columns - 1)) {
          df[i, j + 1] <- row_values[j] %>% html_text()
        }
      }
      view(df)
    }
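
    To try the row-by-row extraction without hitting the live site, here is a minimal sketch that runs on a small inline HTML fragment. The markup is made up and only imitates the class names used in the selectors above (`D(tbr)`, `Ta(start)`, `Ta(end)`); the real page may be structured differently. It also swaps the preallocated data frame for `lapply()` plus `rbind()`, which is a variation on the loop above and not the same code.

    ```r
    library(rvest)

    # Hypothetical markup: it only imitates the class names in the selectors.
    html <- '
    <div class="D(tbr)"><div class="Ta(start)"><span>Breakdown</span></div>
      <div class="Ta(end)"><span>2020</span></div><div class="Ta(end)"><span>2019</span></div></div>
    <div class="D(tbr)"><div class="Ta(start)"><span>Total Revenue</span></div>
      <div class="Ta(end)"><span>100</span></div><div class="Ta(end)"><span>90</span></div></div>'

    # Parentheses in class names must be escaped in the CSS selector.
    rows <- read_html(html) %>% html_nodes("div.D\\(tbr\\)")

    # One character vector per row: the label first, then the values.
    cells <- lapply(rows, function(row) {
      c(row %>% html_nodes("div.Ta\\(start\\)") %>% html_text(),
        row %>% html_nodes("div.Ta\\(end\\)") %>% html_text())
    })

    # Bind the rows into a character data frame (columns V1, V2, V3).
    df <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
    print(df)
    ```

    Because every row here yields the same number of cells, `rbind()` produces a clean rectangular table. If rows differ in length, this would need padding first, which is what the column count check in the loop version guards against.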
    
