More direct way to create list of dataframes from XML file?

生来就可爱ヽ(ⅴ<●) submitted on 2021-02-08 10:40:27

Question


SDMX (Statistical Data and Metadata Exchange) is an XML grammar that defines a standard for exchanging statistical data. It uses files called Data Structure Definitions (DSDs) to convey the structure of a dataset. Among other things, a DSD contains a Codelists node, which is made up of Codelist items; each Codelist in turn is the parent of Code items, which carry an id attribute and a Name child. I am currently trying to parse the Codelists of a DSD file requested from Eurostat's REST interface into a list of data frames in R, using the following code:

library(XML);library(RCurl)

# REST resource for DSD of nama_gdp_c
# downloading, parsing the XML and setting the root
file <- "http://ec.europa.eu/eurostat/SDMX/diss-web/rest/datastructure/ESTAT/DSD_nama_gdp_c"
content <- getURL(file, httpheader = list('User-Agent' = 'R-Agent'))
root <- xmlRoot(xmlInternalTreeParse(content, useInternalNodes = TRUE))

# get Nodeset of Codelists and its length
nodes <- getNodeSet(root,"//str:Codelist")
nn <- length(nodes)

# Create nested List of all Codes and Names
codelistAll <- lapply(seq(nn),function(i){
  xpathSApply(root,paste0("//str:Codelist[",i,"]/str:Code"),xmlGetAttr, "id")
})

namelistAll <- lapply(seq(nn),function(i){
  xpathSApply(root,paste0("//str:Codelist[",i,"]/str:Code/com:Name"),xmlValue)
})

# Create a list of dataframes from the nested lists
alldfList <-lapply(seq(nn),function(i) data.frame(codes=codelistAll[[i]],names=namelistAll[[i]]))

# Name the list items like the nodes
names(alldfList)  <- sapply(nodes, xmlGetAttr,"id")

This yields alldfList, the list of dataframes which I was looking for.

> str(alldfList)
List of 6
 $ CL_FREQ      :'data.frame':  6 obs. of  2 variables:
  ..$ codes: Factor w/ 6 levels "A","D","H","M",..: 2 6 5 1 4 3
  ..$ names: Factor w/ 6 levels "Annual","Daily",..: 2 6 4 1 3 5
 $ CL_GEO       :'data.frame':  49 obs. of  2 variables:
  ..$ codes: Factor w/ 49 levels "AT","BA","BE",..: 22 21 20 10 16 15 14 13 12 11 ...
  ..$ names: Factor w/ 49 levels "Austria","Belgium",..: 19 18 17 16 15 14 13 12 11 10 ...

While this does the job, I have the feeling that there must be a more straightforward way to achieve it. In particular, building the XPath expressions with paste0 and assigning the names in a separate final step seem awkward. I have been reading through the documentation of the XML package and I suspect the solution involves some operation on xmlChildren, but I cannot work out how to actually do it. Does anyone have a suggestion for a canonical way of doing this? Any suggestion would be greatly appreciated.


Answer 1:


You can build the data frames directly from nodes by evaluating relative XPath expressions on each node, but you need to pass the namespace explicitly:

ns <- c(str="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure")

alldfList <- lapply(nodes, function(x){ data.frame(
  codes= xpathSApply(x, ".//str:Code" , xmlGetAttr, "id", namespaces=ns),
  names= xpathSApply(x, ".//str:Code" , xmlValue, namespaces=ns) )})

names(alldfList)  <- sapply(nodes, xmlGetAttr,"id")
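
To try the per-node approach without network access, here is a minimal sketch that runs against a small hand-written XML string. The XML content is invented for illustration and is not real Eurostat output; the helper name codelist_to_df is hypothetical, and the message and common namespace URIs are assumed to follow the same SDMX 2.1 pattern as the structure namespace above. Unlike the snippet above, it also registers the com prefix so that the names can be read from com:Name rather than from the text of the whole Code node.

```r
library(XML)

# Hand-written stand-in for a DSD response (illustrative only, not real data).
xml_text <- '<mes:Structure
    xmlns:mes="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:str="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure"
    xmlns:com="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common">
  <mes:Structures>
    <str:Codelists>
      <str:Codelist id="CL_FREQ">
        <str:Code id="A"><com:Name xml:lang="en">Annual</com:Name></str:Code>
        <str:Code id="M"><com:Name xml:lang="en">Monthly</com:Name></str:Code>
      </str:Codelist>
      <str:Codelist id="CL_GEO">
        <str:Code id="AT"><com:Name xml:lang="en">Austria</com:Name></str:Code>
      </str:Codelist>
    </str:Codelists>
  </mes:Structures>
</mes:Structure>'

doc <- xmlParse(xml_text, asText = TRUE)

# Prefixes used in the XPath expressions below (URIs assumed, see lead-in).
ns <- c(
  str = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure",
  com = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common"
)

nodes <- getNodeSet(doc, "//str:Codelist", namespaces = ns)

# Hypothetical helper: turn one Codelist node into a two-column data frame.
# The XPath expressions are relative to the node, so no index is needed.
codelist_to_df <- function(node) {
  data.frame(
    codes = xpathSApply(node, "./str:Code", xmlGetAttr, "id", namespaces = ns),
    names = xpathSApply(node, "./str:Code/com:Name", xmlValue, namespaces = ns),
    stringsAsFactors = FALSE
  )
}

# Build the list and name it in one step.
alldfList <- setNames(lapply(nodes, codelist_to_df),
                      sapply(nodes, xmlGetAttr, "id"))

str(alldfList)
```

Wrapping the lapply call in setNames is only a stylistic choice that avoids the separate names<- assignment; the sketch assumes every Code has exactly one com:Name, otherwise the two columns would differ in length.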



Answer 2:


Since you are reading SDMX-ML files in R, you could try the rsdmx package, which is hosted on GitHub. The package is available on CRAN, and the latest version can read Data Structure Definitions (DSDs) and their components, including Codelists, Concepts and KeyFamilies.

You can also install it directly from GitHub as follows:

require(devtools)
# current devtools expects the repository as "owner/repo"
install_github("opensdmx/rsdmx")

For your Codelists example, you can coerce SDMX codelists to a data.frame as follows:

require(rsdmx)
file <- "http://ec.europa.eu/eurostat/SDMX/diss-web/rest/datastructure/ESTAT/DSD_nama_gdp_c"
sdmx <- readSDMX(file)

# get the IDs of all codelists
codelists <- sapply(sdmx@codelists, function(x) x@id)

# get one specific codelist as a data.frame
codelist <- as.data.frame(sdmx, codelistId = "CL_GEO")
head(codelist)

Something similar can be done for SDMX Concepts / ConceptSchemes, complete Data Structure Definitions (DSDs), and of course SDMX datasets. More examples are available on the rsdmx wiki.

Hope this helps!



Source: https://stackoverflow.com/questions/24929109/more-direct-way-to-create-list-of-dataframes-from-xml-file
