Storing specific XML node values with R's xmlEventParse

佛祖请我去吃肉 2020-12-09 13:17

I have a big XML file that I need to parse with xmlEventParse in R. Unfortunately, the online examples are more complex than I need; I just want to flag a matching node tag.

3 answers
  •  盖世英雄少女心
    2020-12-09 14:02

    The branches method does not preserve the order of the events: the 'record' entries returned by branches$getStore() come back in a different order from the one in the original XML file. Handler functions, on the other hand, do preserve the order. Here is the code:

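    library(XML)

    fileName <- system.file("exampleData", "mtcars.xml", package = "XML")
    records  <- list()        # text of each <record>, named by its id attribute
    variable <- character()   # text of each <variable>, in document order
    tag.open <- character()   # stack of currently open tags, innermost first
    tagName  <- ""            # most recently opened tag
    nvar <- 0                 # number of <variable> nodes seen
    xmlEventParse(fileName, list(startElement = function (name, attrs) {
      tagName <<- name
      tag.open <<- c(name, tag.open)
      if (length(attrs)) {
        # keep the element's attributes attached to the tag name
        attributes(tagName) <<- as.list(attrs)
      }
    }, text = function (x) {
      if (nchar(x) > 0) {
        if (tagName == "record") {
          # store the record text under its id attribute
          record <- list()
          record[[attributes(tagName)$id]] <- x
          records <<- c(records, record)
        } else if (tagName == "variable") {
          variable <<- c(variable, x)
          nvar <<- nvar + 1
        }
      }
    }, endElement = function (name) {
      if (name == "record") {
        # show the ancestors of the record that just closed
        print(paste(tag.open, collapse = ">"))
      }
      tag.open <<- tag.open[-1]   # pop the closed tag off the stack
    }))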
    
    head(records,2)
    $`Mazda RX4`
    [1] "21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4"
    
    $`Mazda RX4 Wag`
    [1] "21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4"
    
    variable
    [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
    

    Another benefit of using handlers is that they can capture hierarchical structure: because every start and end event is seen in order, the ancestors of a node can be saved along with it. The key to this approach is that the handlers share state through variables defined outside their own scope, which must be assigned with "<<-" rather than "<-".
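
The same idea can be sketched without global variables by keeping the shared state inside a closure, so that "<<-" updates variables local to one function call instead of the global environment. This is only a minimal sketch, not the code from the answer above: the helper makeHandlers, its target argument, and the small inline XML string are all hypothetical names and data made up for illustration. It flags every node whose tag matches target and stores its id attribute, its text, and its path of ancestors.

```r
library(XML)

# Hypothetical toy document, used only to keep the example self-contained.
xmlText <- paste0(
  '<dataset>',
  '<group name="a">',
  '<record id="r1">1 2</record>',
  '<record id="r2">3 4</record>',
  '</group>',
  '<record id="r3">5 6</record>',
  '</dataset>'
)

# Hypothetical helper: builds SAX handlers that share state through a closure.
makeHandlers <- function(target) {
  path    <- character()  # stack of open tags, outermost first
  found   <- list()       # one entry per matched node, in the order seen
  current <- NULL         # entry being filled while inside a matched node

  handlers <- list(
    startElement = function(name, attrs, ...) {
      path <<- c(path, name)
      if (name == target) {
        # attrs is NULL when the element has no attributes
        id <- if ("id" %in% names(attrs)) attrs[["id"]] else NA_character_
        current <<- list(id = id,
                         path = paste(path, collapse = "/"),
                         text = "")
      }
    },
    text = function(x, ...) {
      # text can arrive in several chunks, so append rather than overwrite
      if (!is.null(current)) current$text <<- paste0(current$text, x)
    },
    endElement = function(name, ...) {
      if (name == target && !is.null(current)) {
        found[[length(found) + 1]] <<- current
        current <<- NULL
      }
      path <<- path[-length(path)]  # pop the tag that just closed
    }
  )

  # Return the handlers plus an accessor for the collected nodes.
  list(handlers = handlers, result = function() found)
}

h <- makeHandlers("record")
invisible(xmlEventParse(xmlText, handlers = h$handlers, asText = TRUE))
res <- h$result()

# One row per flagged node: its id and the path of its ancestors.
for (r in res) cat(r$id, r$path, r$text, sep = "\t", fill = TRUE)
```

For a file on disk, the call would take the file name and drop asText = TRUE. Keeping the state in a closure means two parses can run with separate makeHandlers() results without overwriting each other's variables, which is the main reason to prefer it over globals.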
