When would I want to use the xmlParse function versus the xmlTreeParse function? Also, when are parameter values such as useInternalNodes=TRUE useful?
Here is some feedback after using the XML package.
- xmlParse is a version of xmlTreeParse where the argument useInternalNodes is set to TRUE.
- If you want an R object, use xmlTreeParse. This can be inefficient and unnecessary if you only want to extract part of the XML document.
- If you want a pointer to a C object, use xmlParse. But you should know some XPath basics to manipulate the result.
- Use asText=TRUE if your input is text rather than a file or a URL.

Here is an example that shows the difference between the two functions:
txt <- "<doc>
<el> aa </el>
</doc>"
library(XML)
res <- xmlParse(txt,asText=TRUE)
res.tree <- xmlTreeParse(txt,asText=TRUE)
Now inspecting the 2 objects:
> class(res)
[1] "XMLInternalDocument" "XMLAbstractDocument"
> class(res.tree)
[1] "XMLDocument" "XMLAbstractDocument"
You can see that res is an internal document: it is a pointer to a C object. res.tree is an R object. You can get its components like this:
res.tree$doc$children
$doc
<doc>
<el>aa</el>
</doc>
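As a minimal sketch of walking the R tree with accessor functions instead of `$` indexing (the variable names `tree` and `root` are only illustrative, and the sample text is the same small document as above):

```r
library(XML)

txt <- "<doc><el> aa </el></doc>"
tree <- xmlTreeParse(txt, asText = TRUE)   # plain R object

root <- xmlRoot(tree)          # top-level <doc> node
xmlName(root)                  # name of the root element
names(xmlChildren(root))       # names of its child nodes
xmlValue(root[["el"]])         # text content of the <el> child
```

Because everything here is an ordinary R object, no XPath is needed, but the whole document has already been converted to R structures.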
For res, you should use a valid XPath request and one of these functions (xpathApply, xpathSApply, getNodeSet) to inspect it. For example:
xpathApply(res,'//el')
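A small sketch of the same kind of query with getNodeSet, assuming a slightly larger sample document with two el nodes (the names `doc` and `nodes` are illustrative):

```r
library(XML)

doc <- xmlParse("<doc><el> aa </el><el> bb </el></doc>", asText = TRUE)

nodes <- getNodeSet(doc, "//el")     # list of matching internal nodes
length(nodes)                        # number of <el> elements matched
sapply(nodes, xmlValue)              # text of each matched node
xpathSApply(doc, "//el", xmlValue)   # query and extraction in one call
```

Only the nodes matched by the XPath expression are turned into R-level objects, which is the reason the internal representation suits partial extraction.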
Once you have a valid XML node, you can apply xmlValue, xmlGetAttr, ... to extract node information. So here these 2 statements are equivalent:
## we have already an R object, just apply xmlValue to the right child
xmlValue(res.tree$doc$children$doc)
## xpathSApply finds each matching node and passes it to xmlValue
xpathSApply(res,'//el',xmlValue)
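To tie the points together, here is a hedged sketch that checks that xmlParse and xmlTreeParse with useInternalNodes=TRUE return the same kind of object, uses xmlGetAttr on each node, and reads the same content from a file without asText. The id attributes and the temporary file are made up for illustration:

```r
library(XML)

txt <- '<doc><el id="1"> aa </el><el id="2"> bb </el></doc>'

# Both calls should give an internal (C-level) document
a <- xmlParse(txt, asText = TRUE)
b <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)
identical(class(a), class(b))

# Extra arguments to xpathSApply are passed on to the function,
# so this reads the "id" attribute of every <el> node
xpathSApply(a, "//el", xmlGetAttr, "id")

# Same content read from a file: asText is not needed
f <- tempfile(fileext = ".xml")
writeLines(txt, f)
from_file <- xmlParse(f)
xpathSApply(from_file, "//el", xmlGetAttr, "id")
```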