Question
I want to extract both the 'lat' and 'lon' attributes from an .xml file like this:
<asdf>
<dataset>
<px lon="-55.75" lat="-18.5">2.186213</px>
<px lon="-50.0" lat="-18.5">0.0</px>
<px lon="-66.75" lat="-03.0">1.68412</px>
</dataset>
</asdf>
This is what I've done so far, using the XML package for R:
#Load library for xml loading reading extracting
library(XML)
#Parse xml file
a3 <- xmlRoot(xmlTreeParse("my_file.xml"))
#Extract text-value and attributes as lists
precip <- xmlSApply(a3, function(x) xmlSApply(x, xmlValue))
long <- xmlSApply(a3, function(x) xmlSApply(x, xmlAttrs))
lat <- xmlSApply(a3, function(x) xmlSApply(x, xmlAttrs)) #???
dt.lat.long.val <- data.frame(as.numeric(as.vector(lat)),
as.numeric(as.vector(long)),
as.numeric(as.vector(precip)))
How do I edit the line ending in #??? so that it returns the lat values?
Answer 1:
You can extract the data using something along these lines:
test <- '<asdf>
<dataset>
<px lon="-55.75" lat="-18.5">2.186213</px>
<px lon="-50.0" lat="-18.5">0.0</px>
<px lon="-66.75" lat="-03.0">1.68412</px>
</dataset>
</asdf>'
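library(XML)
# Parse the XML string into an internal document
a3 <- xmlParse(test)
# Build a one-row data frame for every <px> node
out <- xpathApply(a3, "//px", function(x){
  coords <- xmlAttrs(x)  # named character vector: lon first, then lat
  data.frame(precip = xmlValue(x), lon = coords[1], lat = coords[2], stringsAsFactors = FALSE)
})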
> do.call(rbind.data.frame, out)
precip lon lat
lon 2.186213 -55.75 -18.5
lon1 0.0 -50.0 -18.5
lon2 1.68412 -66.75 -03.0
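The approach above picks the attributes by position (coords[1], coords[2]), which depends on the order in which they appear in the file. As a possible alternative, here is a minimal sketch that looks each attribute up by name with xmlGetAttr and converts the results to numeric. The names xml_text and doc are just illustrative; the data frame name is borrowed from the question.

```r
library(XML)

# Same sample document as in the question
xml_text <- '<asdf>
<dataset>
<px lon="-55.75" lat="-18.5">2.186213</px>
<px lon="-50.0" lat="-18.5">0.0</px>
<px lon="-66.75" lat="-03.0">1.68412</px>
</dataset>
</asdf>'

# asText = TRUE tells the parser the input is XML content, not a file name
doc <- xmlParse(xml_text, asText = TRUE)

# Look up each attribute by name, so attribute order in the file does not matter
lat    <- as.numeric(xpathSApply(doc, "//px", xmlGetAttr, "lat"))
lon    <- as.numeric(xpathSApply(doc, "//px", xmlGetAttr, "lon"))
precip <- as.numeric(xpathSApply(doc, "//px", xmlValue))

dt.lat.long.val <- data.frame(lat = lat, lon = lon, precip = precip)
```

Because the columns are converted with as.numeric, a value such as "-03.0" ends up as the number -3 rather than a string, which is probably what you want for coordinates.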
Source: https://stackoverflow.com/questions/23567988/extract-second-attribute-of-a-xml-node-in-r-xml-package