Extract data from raw HTML in R


再見小時候 2020-12-02 02:51

I am trying to extract the values from all tabs on this page: http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm

I first tried downloa

2 answers

情话喂你 (original poster)

2020-12-02 03:10

Just use the function "htmlTreeParse" from the XML package:

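library(XML)
# Parse the page into an internal document so XPath queries can run on it
html <- htmlTreeParse("http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm",
                      useInternalNodes = TRUE)
# Example query: the name attribute of every <meta> element
xpathSApply(html, "//meta/@name")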

But in your case there is another problem: the data you want to access is located in an HTML frame, so it is not part of the page you downloaded. The code below can help you read this data:

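library(XML)
library(RCurl)

url <- "http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm"
html <- htmlTreeParse(url, useInternalNodes = TRUE)

# Build the absolute URL of the first frame from its relative src attribute
frameUrl <- paste("http://www.imd.gov.in/section/hydro/dynamic/rfmaps/",
                  xpathSApply(html, "//frame[1]/@src"),
                  sep = "")

# Download the frame's HTML, sending a User-Agent and the parent page as Referer
htmlWithData <- getURL(frameUrl,
                       httpheader = c("User-Agent" = "RCurl",
                                      "Referer" = url))

# Parse the downloaded text (not a URL) and select the tables in the body
dataXml <- htmlTreeParse(htmlWithData, isURL = FALSE, asText = TRUE,
                         useInternalNodes = TRUE)
xpathSApply(dataXml, "//body/table")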
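
The last call returns the table nodes themselves, not their values. As a rough sketch of one way to get the values, "readHTMLTable" from the same XML package can turn each table into a data frame. The HTML string below is a made-up stand-in (sampleHtml and its column names are hypothetical, not the real layout of the IMD frame), so the example runs without network access; with the real page you would pass dataXml instead.

```r
library(XML)

# Hypothetical stand-in for the frame's HTML; the real page's layout may differ.
sampleHtml <- "<html><body><table>
  <tr><th>Region</th><th>Actual</th></tr>
  <tr><td>A</td><td>12.5</td></tr>
  <tr><td>B</td><td>3.0</td></tr>
</table></body></html>"

# Parse the string as HTML text into an internal document
doc <- htmlParse(sampleHtml, asText = TRUE)

# readHTMLTable returns a list with one data frame per <table> it finds;
# header = TRUE treats the first row as column names
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)
rain <- tables[[1]]

# Cell values arrive as character strings, so convert the numeric column
rain[[2]] <- as.numeric(rain[[2]])
print(rain)
```

If the real tables have no header row or use merged cells, the header argument and the column conversion would need adjusting.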
