Extract data from raw HTML in R

再見小時候 2020-12-02 02:51

I am trying to extract the values from all the tabs on this page: http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm

I first tried downloa

2 answers
  •  情话喂你
    2020-12-02 03:10

    Just use the function "htmlTreeParse" from the XML package:

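    library(XML)
    # Parse the page into an internal document so XPath queries can be run on it
    html <- htmlTreeParse("http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm",
                          useInternalNodes = TRUE)
    # Example query: the name attribute of every <meta> element
    xpathSApply(html, "//meta/@name")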
    

    But in your case there is another problem: the data you want to access is located in an HTML frame, so it is not part of the page you parsed. The code below can help you read this data:

    library(XML)
    library(RCurl)

    url <- "http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm"
    html <- htmlTreeParse(url, useInternalNodes = TRUE)

    # Build the absolute URL of the first frame from its relative src attribute
    frameUrl <- paste("http://www.imd.gov.in/section/hydro/dynamic/rfmaps/",
                      xpathSApply(html, "//frame[1]/@src"),
                      sep = "")

    # Fetch the frame's HTML, sending User-Agent and Referer headers
    htmlWithData <- getURL(frameUrl,
                           httpheader = c("User-Agent" = "RCurl",
                                          "Referer" = url))

    # Parse the downloaded HTML string (not a URL) and select the tables in the body
    dataXml <- htmlTreeParse(htmlWithData, isURL = FALSE, useInternalNodes = TRUE)
    xpathSApply(dataXml, "//body/table")
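
The last call above returns table nodes rather than values. As a possible next step, the nodes could be turned into data frames with "readHTMLTable" from the same XML package. This is only a minimal sketch: the inline HTML string, its column names and the variable names (sampleHtml, tableNodes, tables) are made up for illustration, and the real frame's tables may need a different header setting or extra cleaning.

```r
library(XML)

# Hypothetical stand-in for the HTML returned by getURL(); not the real page content
sampleHtml <- paste0(
  "<html><body><table>",
  "<tr><th>Station</th><th>Rainfall</th></tr>",
  "<tr><td>A</td><td>12.5</td></tr>",
  "<tr><td>B</td><td>0.0</td></tr>",
  "</table></body></html>"
)

# Parse the string as HTML text rather than as a file name or URL
doc <- htmlTreeParse(sampleHtml, asText = TRUE, useInternalNodes = TRUE)

# Select every table directly under <body>, as in the answer's XPath
tableNodes <- getNodeSet(doc, "//body/table")

# Convert each table node into a data frame; header = TRUE assumes the
# first row holds the column names, which may not be true for the real page
tables <- lapply(tableNodes, readHTMLTable,
                 header = TRUE, stringsAsFactors = FALSE)
```

With the real page, dataXml from the code above would take the place of doc, and each element of tables could then be inspected with str() before any numeric conversion.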
    
