Parsing XML the SAX way in R

…衆ロ難τιáo~ submitted on 2019-12-04 18:44:50

The Simple API for XML (SAX) may parse XML data faster than some other approaches, but in general it will not give you better results than, for example, XPath. Its real advantage shows with bigger files: SAX does not load the complete tree into R, and so avoids the memory problems that a full in-memory tree can cause.
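For comparison, the XPath route mentioned above could look roughly like this. It is a minimal sketch, assuming a hypothetical file layout with one DESC element and several ITEM elements; the child names FILEID, IVALUE and ICODE are illustrative and not taken from the original data. Note that xmlParse builds the whole tree in memory, which is exactly what SAX avoids.

```r
library(XML)

# Hypothetical sample document (assumed layout, for illustration only)
xmlFile <- tempfile(fileext = ".xml")
writeLines(c(
    "<ROOT>",
    "  <DESC><FILEID>12347</FILEID><CODE>ABC</CODE></DESC>",
    "  <ITEM><IVALUE>1000</IVALUE><ICODE>CDF</ICODE></ITEM>",
    "  <ITEM><IVALUE>1500</IVALUE><ICODE>EGK</ICODE></ITEM>",
    "</ROOT>"), xmlFile)

# xmlParse loads the complete tree into memory
doc <- xmlParse(xmlFile)

# one XPath query per column; the single FILEID value is recycled
# by data.frame across the ITEM rows
xpathResult <- data.frame(
    fileid = xpathSApply(doc, "//DESC/FILEID", xmlValue),
    ivalue = xpathSApply(doc, "//ITEM/IVALUE", xmlValue),
    icode  = xpathSApply(doc, "//ITEM/ICODE", xmlValue),
    stringsAsFactors = FALSE)
```

For small and medium files this is usually the shorter option; the SAX approach pays off once the tree no longer fits comfortably in memory.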

To use SAX, you can start from the code example below, which relies on the branches argument of xmlEventParse (one branch per element you want to retrieve):

library(XML)

#a file to read with xmlEventParse
xmlDoc <- "example.xml"

desc <- NULL
items <- NULL

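#function returning the branch handlers to use with xmlEventParse
row.sax <- function() {

    #SAX function for Meta 'DESC': called once per complete DESC node
    DESC <- function(node){
        children <- xmlChildren(node)
        #drop the whitespace "text" nodes, keep only the child elements
        children[which(names(children) == "text")] <- NULL
        #append one row to the global desc matrix
        desc <<- rbind(desc, sapply(children, xmlValue))
    }

    #SAX function for Body 'ITEM': called once per complete ITEM node
    ITEM <- function(node){
        children <- xmlChildren(node)
        #drop the whitespace "text" nodes, keep only the child elements
        children[which(names(children) == "text")] <- NULL
        #append one row to the global items matrix
        items <<- rbind(items, sapply(children, xmlValue))
    }

    #the list names must match the element names in the file
    branches <- list(DESC = DESC, ITEM = ITEM)
    return(branches)
}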

#call the xmlEventParse
xmlEventParse(xmlDoc, handlers = list(), branches = row.sax(),
              saxVersion = 2, trim = FALSE)

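#processing the result as data.frame
desc <- as.data.frame(desc, stringsAsFactors = FALSE)
#repeat the first DESC row once per ITEM row so both can be bound column-wise
#(drop = FALSE keeps a data.frame even if desc has a single column)
desc <- desc[rep(1, nrow(items)), , drop = FALSE]

items <- as.data.frame(items, stringsAsFactors = FALSE)

result <- cbind(desc, items)
row.names(result) <- seq_len(nrow(result))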

Let me know if it works for you.
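To try the branch-based approach on a small input, a self-contained variant might look like the sketch below. It assumes a hypothetical file layout (one DESC element followed by ITEM elements, with illustrative child names), and the helper makeCollector is made up for this example; it is not part of the XML package. Instead of growing global variables with rbind on every node, it keeps the rows in a closure and binds them once at the end.

```r
library(XML)

# Hypothetical sample document (assumed layout, for illustration only)
xmlFile <- tempfile(fileext = ".xml")
writeLines(c(
    "<ROOT>",
    "  <DESC><FILEID>12347</FILEID><CODE>ABC</CODE></DESC>",
    "  <ITEM><IVALUE>1000</IVALUE><ICODE>CDF</ICODE></ITEM>",
    "  <ITEM><IVALUE>1500</IVALUE><ICODE>EGK</ICODE></ITEM>",
    "</ROOT>"), xmlFile)

# hypothetical helper: returns a branch handler plus an accessor
# for the rows collected so far
makeCollector <- function() {
    rows <- list()
    handler <- function(node) {
        children <- xmlChildren(node)
        # drop whitespace "text" nodes, keep only the child elements
        children[which(names(children) == "text")] <- NULL
        # store one named character vector per node
        rows[[length(rows) + 1]] <<- sapply(children, xmlValue)
    }
    list(handler = handler, rows = function() do.call(rbind, rows))
}

descRows <- makeCollector()
itemRows <- makeCollector()

# same call as above, with one branch per element name
invisible(xmlEventParse(xmlFile, handlers = list(),
              branches = list(DESC = descRows$handler, ITEM = itemRows$handler),
              saxVersion = 2, trim = FALSE))

desc <- as.data.frame(descRows$rows(), stringsAsFactors = FALSE)
items <- as.data.frame(itemRows$rows(), stringsAsFactors = FALSE)

# repeat the first DESC row once per ITEM row, then bind column-wise
result <- cbind(desc[rep(1, nrow(items)), , drop = FALSE], items)
row.names(result) <- seq_len(nrow(result))
```

Collecting the rows in a list and calling rbind once avoids recopying the whole matrix for every node, which matters on the large files where SAX is worth using in the first place.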

Maybe something like this?

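library(rvest)
library(data.table)

#read the file
test <- read_html("test.html")

#extract the text of each tag and bind the results as columns
#(cbind recycles the shorter columns to the length of the longest one)
data.table(do.call(cbind, lapply(c("fileid","code","value","ivalue","icode","itype"), function(i){
    test %>%
        html_nodes(i) %>%
        html_text()
})))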

         V1  V2     V3   V4  V5 V6
    1: 12347 ABC 100000 1000 CDF  R
    2: 12347 ABC 100000 1500 EGK  R
    3: 12347 ABC 100000  300 TSR  R