When would I want to use the xmlParse function versus the xmlTreeParse function? Also, when should I set the parameter useInternalNodes=TRUE?
Here is some feedback from using the XML package.
- xmlParse is a version of xmlTreeParse where the argument useInternalNodes is set to TRUE.
- If you want to get an R object, use xmlTreeParse. This can be inefficient, and it is unnecessary if you only want to extract part of the XML document.
- If you only need some parts of the document, use xmlParse. But you should know some XPath basics to manipulate the result.
- Use asText=TRUE if your input is a string rather than a file or a URL.
Here is an example showing the difference between the two functions:
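library(XML)
## the input is a string, not a file or URL, hence asText=TRUE below
txt <- "<doc>
<el> aa </el>
</doc>"
## internal (C-level) document
res <- xmlParse(txt,asText=TRUE)
## R-level tree
res.tree <- xmlTreeParse(txt,asText=TRUE)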
Now let's inspect the two objects:
class(res)
[1] "XMLInternalDocument" "XMLAbstractDocument"
class(res.tree)
[1] "XMLDocument" "XMLAbstractDocument"
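Since xmlParse is xmlTreeParse with useInternalNodes set to TRUE, passing that argument explicitly to xmlTreeParse should give the same kind of internal document. A minimal sketch, reusing the same txt string (the name res.int is just for illustration):

```r
library(XML)

txt <- "<doc>
<el> aa </el>
</doc>"

## default (useInternalNodes = FALSE): an R-level tree, class "XMLDocument"
res.tree <- xmlTreeParse(txt, asText = TRUE)

## useInternalNodes = TRUE: a C-level document, the same kind of object xmlParse returns
res.int <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)

class(res.tree)
class(res.int)

## compare against xmlParse directly
identical(class(res.int), class(xmlParse(txt, asText = TRUE)))
```

So useInternalNodes=TRUE is the switch to flip whenever you would rather query the document with XPath than walk an R list.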
You can see that res is an internal document: a pointer to a C-level (libxml2) object. res.tree is an ordinary R object, so you can access its components like a list:
res.tree$doc$children
$doc
<doc>
<el>aa</el>
</doc>
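Because res.tree is a plain R structure, you can also walk it with the package's node accessors instead of chaining $ calls. A small sketch using xmlRoot, xmlName, xmlSize and xmlValue on the same document (trimws is only there to drop the surrounding spaces of " aa "):

```r
library(XML)

txt <- "<doc>
<el> aa </el>
</doc>"
res.tree <- xmlTreeParse(txt, asText = TRUE)

## xmlRoot() returns the top-level <doc> node of the R-level tree
root <- xmlRoot(res.tree)
xmlName(root)   # "doc"
xmlSize(root)   # number of child nodes of <doc>

## children are kept in an R list, indexable by element name
el <- root[["el"]]
xmlName(el)     # "el"
trimws(xmlValue(el))
```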
For res, you should use a valid XPath query with one of these functions (xpathApply, xpathSApply, getNodeSet) to inspect it. For example:
xpathApply(res,'//el')
Once you have a valid XML node, you can apply xmlValue, xmlGetAttr, etc. to extract node information. So here these two statements are equivalent:
## res.tree is already an R object: just apply xmlValue to the right child
xmlValue(res.tree$doc$children$doc[["el"]])
## xpathSApply finds the nodes matching the XPath query and passes each one to xmlValue
xpathSApply(res,'//el',xmlValue)
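To show getNodeSet and xmlGetAttr as well, here is a sketch on a hypothetical document with an id attribute on each element (txt2, doc, nodes, ids and vals are illustrative names, not part of the example above). getNodeSet returns the matching nodes as a list, and any node function can then be applied with sapply; xpathSApply does the lookup and the apply in one step:

```r
library(XML)

## hypothetical document with attributes, for illustration only
txt2 <- '<doc>
<el id="1"> aa </el>
<el id="2"> bb </el>
</doc>'
doc <- xmlParse(txt2, asText = TRUE)

## getNodeSet() returns the matching internal nodes as a list
nodes <- getNodeSet(doc, "//el")
length(nodes)   # 2

## apply xmlGetAttr / xmlValue to each node
ids  <- sapply(nodes, xmlGetAttr, "id")
vals <- trimws(sapply(nodes, xmlValue))

## the same attribute values via xpathSApply; extra arguments are passed on to the function
ids2 <- xpathSApply(doc, "//el", xmlGetAttr, "id")
```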