R dataframe from XML when values are multiple or missing

盖世英雄少女心 2020-12-16 08:40

This question is similar to a previous question, Import all fields (and subfields) of XML as dataframe, but I want to pull out only a subset of the XML data and want to incl

4 answers
  •  没有蜡笔的小新
    2020-12-16 09:06

    If you're looking to exactly reproduce the desired output you showed in your question, you can convert your XML to a list and then extract the information you want:

    library(XML)  # provides xmlParse() and xmlToList()
    xml_list <- xmlToList(xmlParse(xml_data))
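
The question's XML is cut off above, so the exact structure of xml_data is unknown. As a rough sketch, assume a hypothetical layout of <city> elements, each with a <name> and a <buildings> element whose <building> children carry <type> and <bldg> fields. For such input, xmlToList() should return nested named lists shaped roughly like the hand-built object below, which runs without the XML package. The element names and the station names (Waterloo, Grand Central) are invented for illustration.

```r
# Hypothetical input (NOT taken from the question):
# <cities>
#   <city><name>London</name><buildings>
#     <building><type>landmark</type><bldg>Tower Bridge</bldg></building>
#     <building><type>station</type><bldg>Waterloo</bldg></building>
#   </buildings></city>
#   <city><name>New York</name><buildings>
#     <building><type>station</type><bldg>Grand Central</bldg></building>
#   </buildings></city>
#   <city><name>Paris</name><buildings>
#     <building><type>landmark</type><bldg>Eiffel Tower</bldg></building>
#     <building><type>landmark</type><bldg>Louvre</bldg></building>
#   </buildings></city>
# </cities>
#
# Hand-built approximation of what xmlToList() returns for that input:
# repeated tags become repeated list names, text-only leaves become strings.
xml_list <- list(
  city = list(name = "London",
              buildings = list(
                building = list(type = "landmark", bldg = "Tower Bridge"),
                building = list(type = "station",  bldg = "Waterloo"))),
  city = list(name = "New York",
              buildings = list(
                building = list(type = "station", bldg = "Grand Central"))),
  city = list(name = "Paris",
              buildings = list(
                building = list(type = "landmark", bldg = "Eiffel Tower"),
                building = list(type = "landmark", bldg = "Louvre")))
)

# Show the top two levels of nesting
str(xml_list, max.level = 2)
```

Running str() on your real xml_list is a quick way to check whether it matches this assumed shape before applying the remaining steps.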
    

    First, loop over each city's child nodes and drop any "building" node that contains the value "station":

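    # For each city, and for each of its children (name, buildings, ...),
    # keep only the elements in which no value equals "station"
    xml_list <- lapply(xml_list, lapply, function(x) {
      x[!sapply(x, function(y) any(y == "station"))]
    })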
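
The filter above compares each element directly against "station", which appears to rely on R comparing a list of single strings with a string element by element. A slightly more defensive variant flattens each child with unlist() before comparing and uses vapply() so the result is always a logical vector. This is a sketch on a hypothetical, hand-built sample shaped like xmlToList() output (the question's XML is truncated); drop_stations() is an illustrative helper name, not part of any package.

```r
# Hypothetical sample shaped like xmlToList() output (invented data)
xml_list <- list(
  city = list(name = "London",
              buildings = list(
                building = list(type = "landmark", bldg = "Tower Bridge"),
                building = list(type = "station",  bldg = "Waterloo"))),
  city = list(name = "New York",
              buildings = list(
                building = list(type = "station", bldg = "Grand Central"))),
  city = list(name = "Paris",
              buildings = list(
                building = list(type = "landmark", bldg = "Eiffel Tower"),
                building = list(type = "landmark", bldg = "Louvre")))
)

# Hypothetical helper: for one city, drop every element of every child
# whose flattened values contain "station". unlist() turns each element
# into a plain character vector, so == never operates on a list.
drop_stations <- function(city) {
  lapply(city, function(child) {
    keep <- !vapply(child, function(y) any(unlist(y) == "station"),
                    logical(1))
    child[keep]
  })
}

filtered <- lapply(xml_list, drop_stations)
str(filtered, max.level = 2)
```

Both versions drop a building if any of its values equals "station", including its name; if your data has a dedicated type field, testing only that field would be stricter.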
    

    Then, collect the data for each city into a data frame, using NA for a city that has no landmarks:

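    xml_list <- lapply(xml_list, function(x) {
      # Flatten the remaining buildings and drop the "landmark" type labels,
      # leaving only the building names
      bldgs <- unlist(x$buildings)
      bldgs <- bldgs[bldgs != "landmark"]
      # No landmarks left (NULL or empty): keep the city as a single NA row
      if (length(bldgs) == 0) bldgs <- NA
      # The city name is recycled to one row per landmark
      data.frame(
        "city" = x$name,
        "landmark" = bldgs,
        stringsAsFactors = FALSE)
    })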
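
The same per-city step can also be written by pulling out one named field per building instead of removing the "landmark" labels by value. The sketch below assumes each remaining building has a bldg field (a hypothetical name, since the question's XML is truncated) and runs on a hand-built, already-filtered sample; city_to_df() is likewise an illustrative name. It covers the two cases in the question's title: a city with several landmarks gets one row per landmark, and a city with none still gets one row with NA.

```r
# Hypothetical, already-filtered sample (stations removed); invented data
filtered <- list(
  city = list(name = "London",
              buildings = list(
                building = list(type = "landmark", bldg = "Tower Bridge"))),
  city = list(name = "New York", buildings = list()),
  city = list(name = "Paris",
              buildings = list(
                building = list(type = "landmark", bldg = "Eiffel Tower"),
                building = list(type = "landmark", bldg = "Louvre")))
)

# Hypothetical helper: one data frame per city
city_to_df <- function(x) {
  # Name of each remaining building; character(0) when there are none
  bldgs <- unname(vapply(x$buildings, function(b) b$bldg, character(1)))
  # Missing case: keep the city as a single row with an NA landmark
  if (length(bldgs) == 0) bldgs <- NA_character_
  # Multiple case: the length-1 city name is recycled to every landmark
  data.frame(city = x$name, landmark = bldgs, stringsAsFactors = FALSE)
}

per_city <- lapply(filtered, city_to_df)
print(per_city[[3]])
```

Extracting a named field makes the intent explicit, but it only works if every building really has that field; the value-based filter in the answer makes no such assumption.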
    

    Finally, combine the rows from all cities into one data frame:

    xml_output <- do.call("rbind", xml_list)
    xml_output
               city     landmark
    city     London Tower Bridge
    city1  New York         <NA>
    city.1    Paris Eiffel Tower
    city.2    Paris       Louvre
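
The row labels city, city1, city.1 and city.2 in the output above appear to be built by rbind() from the list names (every element is named "city" after xmlToList()). If plain row numbers are preferred, they can be reset after combining. A minimal sketch, using hand-built per-city data frames (invented data) in place of the real xml_list:

```r
# Hand-built per-city data frames standing in for the previous step;
# the repeated "city" names mimic the list produced by xmlToList()
per_city <- list(
  city = data.frame(city = "London", landmark = "Tower Bridge",
                    stringsAsFactors = FALSE),
  city = data.frame(city = "New York", landmark = NA_character_,
                    stringsAsFactors = FALSE),
  city = data.frame(city = "Paris", landmark = c("Eiffel Tower", "Louvre"),
                    stringsAsFactors = FALSE)
)

# Stack all per-city rows into a single data frame
xml_output <- do.call(rbind, per_city)

# Replace the name-derived row labels with 1, 2, 3, ...
rownames(xml_output) <- NULL
xml_output
```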
    
