How to convert a portion of an XML file into a data frame? (properly)

后悔当初 2021-01-01 03:44

I am trying to extract information from an XML file from ClinicalTrials.gov. The file is organized in the following way:


  ...
  

        
2 Answers
  •  刺人心 (OP)
    2021-01-01 04:08

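    You could flatten each <location> node into a single-level named list first, then build one data frame row per node and stack the rows.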

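    library(XML)  # provides xmlChildren, xmlValue, xmlName, xmlParent, getNodeSet

    # Recursively collapse a node into a flat named list: each leaf (text) value
    # is named after the element that contains it.
    flatten_xml <- function(x) {
      if (length(xmlChildren(x)) == 0) structure(list(xmlValue(x)), .Names = xmlName(xmlParent(x)))
      else Reduce(append, lapply(xmlChildren(x), flatten_xml))
    }

    # xmlDoc is assumed to be the parsed document from the question.
    # Build one single-row data frame per <location> node.
    dfs <- lapply(getNodeSet(xmlDoc, "//location"), function(x) data.frame(flatten_xml(x)))
    # Pad each data frame with NA for the columns it lacks, then stack them.
    allnames <- unique(c(lapply(dfs, colnames), recursive = TRUE))
    df <- do.call(rbind, lapply(dfs, function(df) { df[, setdiff(allnames, colnames(df))] <- NA; df }))
    head(df)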
    
     #          city      state   zip       country     status          last_name        phone                    email               last_name.1
     # 1  Birmingham    Alabama 35294 United States Recruiting Louis B Nabors, MD 205-934-1813          bnabors@uab.edu        Louis B Nabors, MD
     # 2      Mobile    Alabama 36604 United States Recruiting Melanie Alford, RN 251-445-9649     malford@usouthal.edu    Pamela Francisco, CCRP
     # 3     Phoenix    Arizona 85013 United States Recruiting     Lynn Ashby, MD 602-406-6262           LASHBY@CHW.EDU            Lynn Ashby, MD
     # 4      Tucson    Arizona 85724 United States Recruiting         Jamie Holt 520-626-6800 jholt1@email.arizona.edu Baldassarre Stea, MD, PhD
     # 5 Little Rock   Arkansas 72205 United States Recruiting   Wilma Brooks, RN 501-686-8530       ALEubanks@uams.edu       Amanda Eubanks, APN
     # 6    Berkeley California 94704 United States  Withdrawn                                                                   
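
    To try the approach without the original file, here is a minimal self-contained sketch. The inline XML is a hypothetical stand-in, not the real ClinicalTrials.gov layout: the <study> root and the nesting of <address> and <contact> are assumptions, and the values are borrowed from rows 1 and 6 of the output above. flatten_xml is the function from the answer; the padding step also reorders the columns so every row has the same layout.

    ```r
    library(XML)

    # Assumed stand-in document: two <location> nodes, the second without <contact>.
    # Written without whitespace between tags so no blank text nodes are involved.
    xml_text <- paste0(
      "<study>",
      "<location><address><city>Birmingham</city><state>Alabama</state></address>",
      "<status>Recruiting</status>",
      "<contact><last_name>Louis B Nabors, MD</last_name></contact></location>",
      "<location><address><city>Berkeley</city><state>California</state></address>",
      "<status>Withdrawn</status></location>",
      "</study>"
    )
    xmlDoc <- xmlParse(xml_text, asText = TRUE)

    # Same recursive flattening as in the answer.
    flatten_xml <- function(x) {
      if (length(xmlChildren(x)) == 0) {
        structure(list(xmlValue(x)), .Names = xmlName(xmlParent(x)))
      } else {
        Reduce(append, lapply(xmlChildren(x), flatten_xml))
      }
    }

    # One single-row data frame per <location> node.
    dfs <- lapply(getNodeSet(xmlDoc, "//location"),
                  function(x) data.frame(flatten_xml(x), stringsAsFactors = FALSE))

    # Add missing columns as NA, put columns in a common order, then stack.
    allnames <- unique(unlist(lapply(dfs, colnames)))
    df <- do.call(rbind, lapply(dfs, function(d) {
      d[setdiff(allnames, colnames(d))] <- NA
      d[allnames]
    }))
    print(df)
    ```

    With this input, df should have two rows and the columns city, state, status and last_name, with last_name set to NA for the Berkeley row because that node has no contact.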
    
