Scraping multiple tables out of a webpage in R

烈酒焚心 submitted on 2019-12-07 06:24:26

The XML parser used below cannot fetch https URLs on its own, so download the page first with a package such as RCurl. The headers on the tables are actually separate tables, and the page as a whole is composed of 30+ tables. The data you want is most likely in the table with class = yfnc_datamodoutline1:

url <- "https://in.finance.yahoo.com/q/pm?s=115748.BO"
library(XML)
library(RCurl)
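# Download the raw HTML; ssl.verifypeer = FALSE skips certificate verification
appData <- getURL(url, ssl.verifypeer = FALSE)
doc <- htmlParse(appData, asText = TRUE)
# Indexing the parsed document with an XPath string returns the matching nodes
tables <- doc['//table[@class="yfnc_datamodoutline1"]']
perftable <- readHTMLTable(tables[[1]], stringsAsFactors = FALSE)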
> perftable
                                     V1      V2
1            Morningstar Return Rating:    2.00
2                  Year-to-Date Return:   2.77%
3                5-Year Average Return:   9.76%
4                   Number of Years Up:       4
5                 Number of Years Down:       1
6  Best 1 Yr Total Return (2014-12-31):  37.05%
7 Worst 1 Yr Total Return (2011-12-31): -27.26%
8         Best 3-Yr Total Return (N/A):  23.11%
9        Worst 3-Yr Total Return (N/A):  -0.33%
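To see why the class-based XPath matters when headers and data live in separate tables, here is a minimal sketch that runs on an inline HTML string instead of the live page. The markup is a made-up miniature of the layout described above, and the class name yfnc_mod_table_title1 is an assumption used only for illustration. It counts the tables, lists their class attributes, and then reads only the inner data table.

```r
library(XML)

# Hypothetical miniature of the page: a title table, then an outer table
# with the class of interest wrapping an inner table that holds the rows.
html_text <- '
<html><body>
  <table class="yfnc_mod_table_title1"><tr><td>Performance Overview</td></tr></table>
  <table class="yfnc_datamodoutline1"><tr><td>
    <table>
      <tr><td>Year-to-Date Return:</td><td>2.77%</td></tr>
      <tr><td>Number of Years Up:</td><td>4</td></tr>
    </table>
  </td></tr></table>
</body></html>'

doc <- htmlParse(html_text, asText = TRUE)

# Every <table> node in the document, nested ones included
tables <- getNodeSet(doc, "//table")
n_tables <- length(tables)

# Class attribute of each table (NA when the attribute is absent)
classes <- sapply(tables, function(n) xmlGetAttr(n, "class", default = NA_character_))

# Keep only the data table nested inside the outline table
target <- getNodeSet(doc, '//table[@class="yfnc_datamodoutline1"]//table')
perf <- readHTMLTable(target[[1]], stringsAsFactors = FALSE)
```

Printing classes on a real page is a quick way to find which class to put in the XPath before committing to it.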

Here's an rvest version with an added function to extract a particular table from each fund page:

library(rvest)
library(dplyr)

pages <- c("https://in.finance.yahoo.com/q/pm?s=115748.BO", 
           "https://in.finance.yahoo.com/q/pm?s=115749.BO",
           "https://in.finance.yahoo.com/q/pm?s=115750.BO")


extract_tab <- function(sources, tab_idx) {

  data <- lapply(sources, function(x) {

    # read_html() replaces html(), which newer rvest releases no longer provide
    pg <- read_html(x)
    tabs <- pg %>% html_nodes(xpath = "//table[@class='yfnc_datamodoutline1']//table")
    html_table(tabs[[tab_idx]])

  })

  names(data) <- gsub("pm\\?s=", "", basename(sources))

  data

}

extract_tab(pages, 1)

## $`115748.BO`
##                                      X1      X2
## 1            Morningstar Return Rating:    2.00
## 2                  Year-to-Date Return:   2.77%
## 3                5-Year Average Return:   9.76%
## 4                   Number of Years Up:       4
## 5                 Number of Years Down:       1
## 6  Best 1 Yr Total Return (2014-12-31):  37.05%
## 7 Worst 1 Yr Total Return (2011-12-31): -27.26%
## 8         Best 3-Yr Total Return (N/A):  23.11%
## 9        Worst 3-Yr Total Return (N/A):  -0.33%
## 
## $`115749.BO`
##                                      X1      X2
## 1            Morningstar Return Rating:    2.00
## 2                  Year-to-Date Return:   2.77%
## 3                5-Year Average Return:   9.77%
## 4                   Number of Years Up:       4
## 5                 Number of Years Down:       1
## 6  Best 1 Yr Total Return (2014-12-31):  37.05%
## 7 Worst 1 Yr Total Return (2011-12-31): -27.22%
## 8         Best 3-Yr Total Return (N/A):  23.11%
## 9        Worst 3-Yr Total Return (N/A):  -0.30%
## 
## $`115750.BO`
##                               X1    X2
## 1     Morningstar Return Rating:      
## 2           Year-to-Date Return: 1.95%
## 3         5-Year Average Return: 8.92%
## 4            Number of Years Up:      
## 5          Number of Years Down:      
## 6     Best 1 Yr Total Return ():   N/A
## 7    Worst 1 Yr Total Return ():   N/A
## 8  Best 3-Yr Total Return (N/A):   N/A
## 9 Worst 3-Yr Total Return (N/A):   N/A
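
The values come back as text, and the third fund shows both empty cells and N/A. As a rough sketch of one possible follow-up step (not part of the original answer), the list can be stacked into a single data frame with numeric values. The helper parse_value and the column names fund, metric, and value are hypothetical, and the sample rows are copied from the output above so the block runs without network access.

```r
# Hypothetical helper: strip a trailing "%" and convert to numeric,
# treating empty strings and "N/A" as missing.
parse_value <- function(x) {
  x <- trimws(x)
  x[x %in% c("", "N/A")] <- NA_character_
  as.numeric(sub("%$", "", x))
}

# Stand-in for the list returned by extract_tab(), using rows shown above
funds <- list(
  `115748.BO` = data.frame(X1 = c("Year-to-Date Return:", "Number of Years Up:"),
                           X2 = c("2.77%", "4"), stringsAsFactors = FALSE),
  `115750.BO` = data.frame(X1 = c("Year-to-Date Return:", "Number of Years Up:"),
                           X2 = c("1.95%", ""), stringsAsFactors = FALSE)
)

# Tag each row with its fund code and add the parsed numeric value
cleaned <- Map(function(tab, id) {
  data.frame(fund = id, metric = tab$X1, value = parse_value(tab$X2),
             stringsAsFactors = FALSE)
}, funds, names(funds))

# Stack the per-fund tables into one long data frame
combined <- do.call(rbind, cleaned)
rownames(combined) <- NULL
```

This drops the distinction between percentages and plain counts, so keep the metric column alongside the value if that difference matters.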