Extract XML attributes and node values in R

Submitted by 主宰稳场 on 2021-01-29 02:26:48

Question


I have an XML file in R. The XML file looks like this:

rootNode <- xmlRoot(xmlfile)
rootNode[[1]]

<pdv id="1000001" latitude="4620114" longitude="519791" cp="01000" pop="R">
  <adresse>ROUTE NATIONALE</adresse>
  <ville>SAINT-DENIS-LèS-BOURG</ville>
  <ouverture debut="01:00" fin="01:00" saufjour=""/>
  <services>
    <service>Automate CB</service>
    <service>Vente de gaz domestique</service>
    <service>Station de gonflage</service>
  </services>
  <prix nom="Gazole" id="1" maj="2014-01-02 11:08:03" valeur="1304"/>
  <prix nom="SP98" id="6" maj="2014-12-31 08:39:46" valeur="1285"/>
  <prix nom="Gazole" id="1" maj="2007-02-28 07:48:59.315736" valeur="999"/>
  <fermeture/>
  <rupture/>
</pdv> 

and rootNode[[2]] is:

<pdv id="1000002" latitude="4621842" longitude="522767" cp="01000" pop="R">
  <adresse>16 Avenue de Marboz</adresse>
  <ville>BOURG-EN-BRESSE</ville>
  <ouverture debut="08:45" fin="19:30" saufjour="Dimanche"/>
  <services>
    <service>Automate CB</service>
    <service>Vente de gaz domestique</service>
    <service>Station de gonflage</service>
  </services>
  <prix nom="Gazole" id="1" maj="2007-01-02 08:34:29.101626" valeur="995"/>
  <prix nom="Gazole" id="1" maj="2007-01-26 09:49:39.197356" valeur="977"/>
  <fermeture/>
  <rupture/>
</pdv> 

and so on.

I am running the following code to extract the "valeur" attribute:

valeur = xpathApply(rootNode, "//prix", xmlGetAttr, "valeur")
valeur <- data.frame(matrix(unlist(valeur), byrow=T),stringsAsFactors=FALSE)

This does return the "valeur" values, but I can't tell which pdv node each one came from: the first three values belong to rootNode[[1]], the next two to rootNode[[2]], and so on.

How can I create a variable indicating that the first three values belong to rootNode[[1]] and the other two to rootNode[[2]]? Or, at least, how can I add a condition that returns only the values that belong to rootNode[[1]]?


Answer 1:


This might not be the most elegant solution, but it's the only one I could come up with after struggling with a very similar problem.

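Here is a way to add the id from each pdv node as an attribute to each of its prix sub-nodes. Note that this replaces the prix node's own id attribute, as the output further down shows: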

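for (i in seq_len(xmlSize(rootNode))) {

     # id attribute of the current pdv node
     id <- xmlGetAttr(node = rootNode[[i]],
                      name = "id")

     # set it on every prix child (assumes internal nodes, e.g. from xmlParse(),
     # which are modified in place); note that sapply's argument is FUN, not fun
     sapply(X = rootNode[[i]]["prix"],
            FUN = addAttributes,
            id = id)
 }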

Depending on your needs, you could then easily create a data frame that matches these two values:

data.frame(id     = xpathSApply(rootNode, "//prix", xmlGetAttr, "id"),
           valeur = xpathSApply(rootNode, "//prix", xmlGetAttr, "valeur"))

which returns:

       id valeur
1 1000001   1304
2 1000001   1285
3 1000001    999
4 1000002    995
5 1000002    977
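
If changing the tree is not desirable, the same table can probably be built by looking up each prix node's parent instead. The following is only a minimal sketch under some assumptions: the document is parsed with xmlParse() so that nodes are internal, and the sample string, the root element name pdv_liste and the variable names (xml_text, prix_nodes, prices, first_station) are hypothetical, cut down from the data in the question. The last line also shows one way to keep only the values of a single pdv with an XPath predicate, which was the second part of the question.

```r
library(XML)

# Hypothetical two-station sample, trimmed from the data in the question
xml_text <- '<pdv_liste>
  <pdv id="1000001">
    <prix nom="Gazole" id="1" valeur="1304"/>
    <prix nom="SP98" id="6" valeur="1285"/>
    <prix nom="Gazole" id="1" valeur="999"/>
  </pdv>
  <pdv id="1000002">
    <prix nom="Gazole" id="1" valeur="995"/>
    <prix nom="Gazole" id="1" valeur="977"/>
  </pdv>
</pdv_liste>'

doc      <- xmlParse(xml_text, asText = TRUE)
rootNode <- xmlRoot(doc)

# All prix nodes, in document order
prix_nodes <- getNodeSet(doc, "//prix")

# For each prix node, read the id of its parent pdv node;
# the tree itself is left untouched
prices <- data.frame(
  pdv_id  = sapply(prix_nodes, function(node) xmlGetAttr(xmlParent(node), "id")),
  prix_id = sapply(prix_nodes, xmlGetAttr, "id"),
  valeur  = as.integer(sapply(prix_nodes, xmlGetAttr, "valeur")),
  stringsAsFactors = FALSE
)

# Only the values of one station, selected with an XPath predicate
first_station <- xpathSApply(rootNode, "//pdv[@id='1000001']/prix",
                             xmlGetAttr, "valeur")
```

Because nothing is written back to the nodes, the original id of each prix node stays available in the prix_id column next to the station id.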


Source: https://stackoverflow.com/questions/33591401/extract-xml-attributes-and-node-values-r
