Efficiently get the number of children with specific name using XML and R

Posted by 耗尽温柔 on 2019-12-30 11:42:32

Question


I'm parsing huge XML files using R and the XML package. As part of the data handling I need to know, for a long list of nodes, how many children with a specific name each node has (the number of nodes can exceed 20,000).

My approach at the moment is:

nChildrenWithName <- xpathSApply(doc, path="/path/to/node/*", namespaces=ns, xmlName) == 'NAME'
nChildren <- xpathSApply(doc, path="/path/to/node", namespaces=ns, fun=xmlSize)
nID <- sapply(split(nChildrenWithName, rep(seq(along=nChildren), nChildren)), sum)

This is as vectorized as I can get it. Still, I have the feeling that this can be achieved in a single call using the right XPath expression. My knowledge of XPath is limited, though, so if anyone knows how to do it I would be grateful for some insight.

Best, Thomas


Answer 1:


library(XML)
doc <- xmlTreeParse(
  system.file("exampleData", "mtcars.xml", package = "XML"),
  useInternalNodes = TRUE
)
# count() returns one number: the total of all <variable> elements in the document
xpathApply(xmlRoot(doc), path = "count(//variable)", xmlValue)



Answer 2:


If I understand the question correctly, the XML looks something like this:

<path>
  <to>
    <node>
      <NAME>A</NAME>
      <NAME>B</NAME>
      <NAME>C</NAME>
    </node>
    <node>
      <NAME>X</NAME>
      <NAME>Y</NAME>
    </node>
  </to>
  <to>
    <node>
      <NAME>AA</NAME>
      <NAME>BB</NAME>
      <NAME>CC</NAME>
    </node>
  </to>
</path>

and what is wanted is the number of NAME elements under each node element, so 3, 2, 3 in the example above.

This is not possible in XPath 1.0: an expression can return a list of nodes or a single value, but not a list of computed values.

Using XPath 2.0 you can write:

for $node in /path/to/node return count($node/NAME)

or simply:

/path/to/node/count(NAME)

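The XML package for R is built on libxml2, which implements XPath 1.0, so the two XPath 2.0 expressions above cannot be passed to xpathSApply. A minimal sketch of one workaround, assuming the sample document above: select the node elements with a single XPath 1.0 query and do the counting in R. The helper name count_children_named is hypothetical and is not part of the XML package.

```r
library(XML)

# Sample document from the answer above, parsed from a string.
xml_text <- "<path>
  <to>
    <node><NAME>A</NAME><NAME>B</NAME><NAME>C</NAME></node>
    <node><NAME>X</NAME><NAME>Y</NAME></node>
  </to>
  <to>
    <node><NAME>AA</NAME><NAME>BB</NAME><NAME>CC</NAME></node>
  </to>
</path>"
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: number of direct children of `node` called `name`.
# xmlChildren() returns a list whose names are the child element names.
count_children_named <- function(node, name) {
  sum(names(xmlChildren(node)) == name)
}

# One XPath query selects every node element; the helper runs once per match.
name_counts <- xpathSApply(
  doc, "/path/to/node",
  function(node) count_children_named(node, "NAME")
)
print(name_counts)  # expected for the sample document: 3 2 3
```

This keeps the XPath part to a plain node selection and avoids the split/rep bookkeeping from the question, at the cost of one R function call per node.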




Answer 3:


Considering the example given by MiMo:

<path>
  <to>
    <node>
      <NAME>A</NAME>
      <NAME>B</NAME>
      <NAME>C</NAME>
    </node>
    <node>
      <NAME>X</NAME>
      <NAME>Y</NAME>
    </node>
  </to>
  <to>
    <node>
      <NAME>AA</NAME>
      <NAME>BB</NAME>
      <NAME>CC</NAME>
    </node>
  </to>
</path>

To get the number of NAME children under the first /path/to/node:

library(XML)
doc = xmlParse("filename", useInternalNodes = TRUE)
rootNode = xmlRoot(doc)
childnodes = xpathSApply(rootNode[[1]][[1]], ".//NAME", xmlChildren)
length(childnodes)
[1] 3

This gives 3. Similarly, to get the number of children under the second node, change the index accordingly:

childnodes = xpathSApply(rootNode[[1]][[2]], ".//NAME", xmlChildren)
length(childnodes)
[1] 2
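With many nodes, indexing each one by hand does not scale. The same relative-path idea can be applied to every node in a loop. The following is a sketch only, assuming the sample document above is supplied as a string; the variable names are illustrative.

```r
library(XML)

# Same sample document as above, parsed from a string instead of a file.
xml_text <- "<path>
  <to>
    <node><NAME>A</NAME><NAME>B</NAME><NAME>C</NAME></node>
    <node><NAME>X</NAME><NAME>Y</NAME></node>
  </to>
  <to>
    <node><NAME>AA</NAME><NAME>BB</NAME><NAME>CC</NAME></node>
  </to>
</path>"
doc <- xmlParse(xml_text, asText = TRUE)

# Collect every node element once, then query each one with a relative path.
nodes <- getNodeSet(doc, "/path/to/node")

# "./NAME" matches direct children only; ".//NAME" (used above) would also
# match NAME elements nested deeper inside the node.
name_counts <- vapply(
  nodes,
  function(node) length(getNodeSet(node, "./NAME")),
  integer(1)
)
print(name_counts)  # expected for the sample document: 3 2 3
```

For the sample document both "./NAME" and ".//NAME" give the same counts, because every NAME element is a direct child of its node.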

I hope it will help you.



Source: https://stackoverflow.com/questions/15948339/efficiently-get-the-number-of-children-with-specific-name-using-xml-and-r
