Question
I was trying to answer a question on Stack Overflow (Mapping multiple IDs using R) when I got stuck on how to finish it. Namely, how can I test whether a time point falls between a set of before and after time points?
The user from the post did not provide a reproducible example, but here is what I came up with. I want to test the time points in hidenic_file$hidenic_time against the before and after times in the data frame emtek_file and return the emtek_id values that match the time frame of each hidenic_id. The poster didn't mention it, but it seems possible that multiple emtek_id values are returned for each hidenic_id.
library(zoo)
## emtek_file: 10 intervals, each lasting 24 hours from its entry time
date_string <- paste("2001", sample(12, 10, replace = TRUE), sample(28, 10), sep = "-")
time_string <- c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26",
                 "23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26")
entry_emtek <- strptime(paste(date_string, time_string), "%Y-%m-%d %H:%M:%S")
entry_emtek <- entry_emtek[order(entry_emtek)]
exit_emtek <- entry_emtek + 3600 * 24
emtek_file <- data.frame(emtek_id = 1:10, entry_emtek, exit_emtek)
## hidenic_file: 100 single time points to test against those intervals
hidenic_id <- 110380:110479
date_string <- paste("2001", sample(12, 100, replace = TRUE), sample(28, 100, replace = TRUE), sep = "-")
time_string <- rep(c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26",
                     "23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26"), 10)
hidenic_time <- strptime(paste(date_string, time_string), "%Y-%m-%d %H:%M:%S")
hidenic_time <- hidenic_time[order(hidenic_time)]
hidenic_file <- data.frame(hidenic_id, hidenic_time)
## Here is where I fail to write concise, working code to find what I want.
combined_file <- list()
for(i in seq(hidenic_file[,1])) {
  for(j in seq(emtek_file[,1])) {
    if(length(zoo(1, emtek_file[j,2:3]) + zoo(1,hidenic_file[i,2])) == 0) {next}
    if(length(zoo(1, emtek_file[j,2:3]) + zoo(1,hidenic_file[i,2])) == 1) {combined_file[[i]] <- c(combined_file[[i]],emtek_file[j,1])}
  }
  names(combined_file)[i] <- hidenic_file[i,1]
}
Answer 1:
I am not sure I fully understand what you want to do, since you don't provide the expected result. Here is a solution using the IRanges package. It may not be simple to understand at first reading, but it is extremely useful for finding overlaps between continuous intervals.
library(IRanges)
## create the time intervals (entry to exit, as seconds since the epoch)
subject <- IRanges(as.numeric(emtek_file$entry_emtek),
                   as.numeric(emtek_file$exit_emtek))
## create zero-width intervals for the time points (start = end here)
query <- IRanges(as.numeric(hidenic_file$hidenic_time),
                 as.numeric(hidenic_file$hidenic_time))
## find the overlaps once, then extract the matching rows (time points and intervals)
hits <- findOverlaps(query, subject)
emt.ids <- subjectHits(hits)
hid.ids <- queryHits(hits)
cbind(hidenic_file[hid.ids,], emtek_file[emt.ids,])
   hidenic_id        hidenic_time emtek_id         entry_emtek          exit_emtek
8      110387 2001-03-13 22:29:56        3 2001-03-13 22:29:56 2001-03-14 22:29:56
9      110388 2001-03-14 01:03:30        3 2001-03-13 22:29:56 2001-03-14 22:29:56
41     110420 2001-06-09 16:56:26        7 2001-06-09 16:56:26 2001-06-10 16:56:26
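If installing a Bioconductor package is not an option, the same point-in-interval test can be sketched in base R. This is only a minimal illustration, not the method used above: it assumes closed intervals (a time point equal to entry_emtek or exit_emtek counts as inside), it uses small made-up toy data rather than the random data from the question, and the names inside and matches are hypothetical. It builds a full points-by-intervals matrix, so it is only reasonable for modest data sizes.

```r
# Toy data (hypothetical values, not the random data from the question).
emtek_file <- data.frame(
  emtek_id    = 1:3,
  entry_emtek = as.POSIXct(c("2001-03-13 22:29:56", "2001-03-14 10:00:00",
                             "2001-06-09 16:56:26"), tz = "UTC"),
  exit_emtek  = as.POSIXct(c("2001-03-14 22:29:56", "2001-03-15 10:00:00",
                             "2001-06-10 16:56:26"), tz = "UTC")
)
hidenic_file <- data.frame(
  hidenic_id   = c(110387L, 110388L, 110400L),
  hidenic_time = as.POSIXct(c("2001-03-14 01:03:30", "2001-03-14 12:00:00",
                              "2001-05-01 00:00:00"), tz = "UTC")
)

# inside[i, j] is TRUE when hidenic_time[i] lies within
# [entry_emtek[j], exit_emtek[j]] (closed on both ends, by assumption).
t_num  <- as.numeric(hidenic_file$hidenic_time)
inside <- outer(t_num, as.numeric(emtek_file$entry_emtek), ">=") &
          outer(t_num, as.numeric(emtek_file$exit_emtek), "<=")

# One list element per hidenic_id, holding every matching emtek_id
# (possibly several, possibly none).
matches <- lapply(seq_len(nrow(inside)),
                  function(i) emtek_file$emtek_id[inside[i, ]])
names(matches) <- hidenic_file$hidenic_id
```

Returning a list keeps the case the question hints at, where one hidenic_id matches several emtek_id values, and leaves an empty vector for time points that fall in no interval.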
PS: To install the package (on current Bioconductor releases, biocLite has been replaced by BiocManager::install("IRanges")):
source("http://bioconductor.org/biocLite.R")
biocLite("IRanges")
Source: https://stackoverflow.com/questions/17225208/how-can-i-tell-if-a-time-point-exists-between-a-set-of-before-and-after-times