Question
The idea is as follows. Every patient has a unique patient id, which we call hidenic_id. However, a patient may be admitted to the hospital multiple times, and every admission (entry) has its own unique emtek_id.
For example, patient 110380 was admitted to the hospital on 4/14/2001 11:08, transferred within the hospital, and discharged on 4/24/2001 18:16. The same patient was admitted again on 5/11/2001 23:24, which we can tell because the entry has a different emtek_id, and was discharged on 5/25/2001 16:26. So the correct emtek_id has to be assigned by checking the dates: if the date in the combined file falls within the admission and discharge period (or very close to it, say within 24 hours), we can assign that emtek_id.
How can I assign the right emtek_id to each entry, given its hidenic_id and admit time?
Answer 1:
I was interested in your problem, so I created some mock data and tried to solve it. I ran into some confusion myself and posted my own question, which I think is what you are asking, only more general. You can see the response here: How can I tell if a time point exists between a set of before and after times
My post generates what I believe is what you are starting with, and the checked answer is what I believe you are looking for. The full code is below. You will need to install zoo and IRanges. Also, I did this in version 2.15.3. IRanges did not install properly in 3.0.0.
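## package installation
## (biocLite is the installer that was current for R 2.15.x; on recent R versions
## Bioconductor packages are installed with BiocManager::install("IRanges") instead)
source("http://bioconductor.org/biocLite.R")
biocLite("IRanges")
install.packages("zoo")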
## generate the emtek and hidenic file data
library(zoo)
## the third argument of sample() is `replace`; name it and pass a logical
date_string <- paste("2001", sample(12, 10, replace = TRUE), sample(28, 10), sep = "-")
time_string <- c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26",
"23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26")
entry_emtek <- strptime(paste(date_string, time_string), "%Y-%m-%d %H:%M:%S")
entry_emtek <- entry_emtek[order(entry_emtek)]
exit_emtek <- entry_emtek + 3600 * 24
emtek_file <- data.frame(emtek_id = 1:10, entry_emtek, exit_emtek)
hidenic_id <- 110380:110479
date_string <- paste("2001", sample(12, 100, replace = TRUE), sample(28, 100, replace = TRUE), sep = "-")
time_string <- rep(c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26",
"23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26"),10)
hidenic_time <- strptime(paste(date_string, time_string), "%Y-%m-%d %H:%M:%S")
hidenic_time <- hidenic_time[order(hidenic_time)]
hidenic_file <- data.frame(hidenic_id, hidenic_time)
## Find the intersection of emtek and hidenic times. This part was done by user agstudy.
library(IRanges)
## create the time intervals (one per emtek entry, in seconds since the epoch)
subject <- IRanges(as.numeric(emtek_file$entry_emtek),
                   as.numeric(emtek_file$exit_emtek))
## create single-point intervals for the hidenic times (start = end here)
query <- IRanges(as.numeric(hidenic_file$hidenic_time),
                 as.numeric(hidenic_file$hidenic_time))
## find the overlaps once, then extract the matching rows (both time points and intervals)
hits <- findOverlaps(query, subject)
hid.ids <- queryHits(hits)
emt.ids <- subjectHits(hits)
cbind(hidenic_file[hid.ids, ], emtek_file[emt.ids, ])
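Note that the mock data above matches on time only. If your admission table also carries hidenic_id, and you want the "within 24 hours" tolerance from the question, something like the following base R sketch might be closer to what you need. The table layouts, the column names (admit, discharge, time), the emtek_id values 1 and 2, the record times, and the helper assign_emtek are all made up for illustration; only the patient id and the admission/discharge times come from the question.

```r
## Hypothetical admission table: one row per stay. The emtek_id values are invented.
fmt <- "%Y-%m-%d %H:%M"
admissions <- data.frame(
  emtek_id   = c(1L, 2L),
  hidenic_id = c(110380L, 110380L),
  admit      = as.POSIXct(c("2001-04-14 11:08", "2001-05-11 23:24"), format = fmt, tz = "UTC"),
  discharge  = as.POSIXct(c("2001-04-24 18:16", "2001-05-25 16:26"), format = fmt, tz = "UTC")
)

## Hypothetical records from the combined file that still need an emtek_id.
records <- data.frame(
  hidenic_id = rep(110380L, 4),
  time       = as.POSIXct(c("2001-04-15 09:00", "2001-04-25 10:00",
                            "2001-05-12 08:00", "2001-08-01 12:00"),
                          format = fmt, tz = "UTC")
)

## For each record, look for a stay of the same patient whose
## [admit - slack, discharge + slack] window contains the record time.
assign_emtek <- function(records, admissions, slack_hours = 24) {
  slack <- slack_hours * 3600  # POSIXct arithmetic is in seconds
  vapply(seq_len(nrow(records)), function(i) {
    hit <- which(
      admissions$hidenic_id == records$hidenic_id[i] &
        records$time[i] >= admissions$admit - slack &
        records$time[i] <= admissions$discharge + slack
    )
    ## no matching stay -> NA; if several stays match, the first one is taken
    if (length(hit) == 0L) NA_integer_ else admissions$emtek_id[hit[1L]]
  }, integer(1))
}

records$emtek_id <- assign_emtek(records, admissions)
records
```

Here the second record falls roughly 16 hours after the first discharge, so it is still assigned to the first stay because of the 24-hour slack, while the last record matches no stay and gets NA. This loops row by row, so for large files the IRanges approach (with the interval ends widened by the slack) will likely scale better.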
Source: https://stackoverflow.com/questions/17222981/mapping-multiple-ids-using-r