I am trying to extract information from an XML file from ClinicalTrials.gov. The file is organized in the following way:
...
This answer converts the XML to a list, unlists each location section, transposes the section, converts the section to a data.table, and then uses rbindlist to merge all of the individual locations into one table. The fill=T argument (short for fill=TRUE) matches the elements by name and fills in missing element values with NA.
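library(XML)
library(data.table)

clinicalTrialUrl <- "http://clinicaltrials.gov/ct2/show/NCT01480479?resultsxml=true"
xmlDoc <- xmlParse(clinicalTrialUrl, useInternalNodes=TRUE)

# Turn every node matching `path` into a one-row data.table, then stack the rows.
xmlToDT <- function(doc, path) {
  rbindlist(
    lapply(getNodeSet(doc, path),
           # node -> nested list -> named vector -> 1-row matrix -> data.table
           function(x) data.table(t(unlist(xmlToList(x))))
    ), fill=TRUE)  # fill=TRUE pads elements missing from a node with NA
}

locationDT <- xmlToDT(xmlDoc, "//location")
locationDT[1:6]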
##    facility.name                                                                    facility.address.city facility.address.state facility.address.zip
## 1: "HYGEIA" Hospital                                                                Marousi               District of Attica     151 23
## 2: Allina Health, Abbott Northwestern Hospital, John Nasseff Neuroscience Institute Minneapolis           Minnesota              55407
## 3: Amrita Institute of Medical Sciences and Research Centre, Kochi                  Kochi                 Kerala                 682 026
## 4: Anne Arundel Medical Center                                                      Annapolis             Maryland               21401
## 5: Atlanta Cancer Care                                                              Atlanta               Georgia                30005
## 6: Austin Health                                                                    Heidelberg            Victoria               3084
##    facility.address.country
## 1: Greece
## 2: United States
## 3: India
## 4: United States
## 5: United States
## 6: Australia
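
To see what the fill argument does without downloading anything, here is a minimal sketch that applies the same steps to a small inline XML string. The sample XML, the site names, and the variable names (xmlText, rows, demoDT) are made up for illustration and are not taken from ClinicalTrials.gov; the second location deliberately has no state element.

```r
library(XML)
library(data.table)

# Hypothetical sample XML (not real ClinicalTrials.gov data):
# the second <location> is missing its <state> element.
xmlText <- '<study>
  <location><facility><name>Site A</name><address><city>Annapolis</city><state>Maryland</state></address></facility></location>
  <location><facility><name>Site B</name><address><city>Heidelberg</city></address></facility></location>
</study>'

# Parse the string itself rather than treating it as a file name or URL.
doc <- xmlParse(xmlText, asText = TRUE)

# One single-row data.table per <location> node; column names come from
# the flattened element names (e.g. facility.address.city).
rows <- lapply(getNodeSet(doc, "//location"),
               function(x) data.table(t(unlist(xmlToList(x)))))

# fill = TRUE aligns columns by name and pads the missing state with NA.
demoDT <- rbindlist(rows, fill = TRUE)
print(demoDT)
```

The result should have two rows and three columns, with NA in facility.address.state for the second row, since rbindlist matches columns by name when fill is TRUE.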