Merge Records Over Time Interval

梦如初夏 2020-12-10 08:35

Let me begin by saying that this question pertains to R (the statistical programming language), but I'm open to straightforward suggestions for other environments.

The goal is to mer

3 Answers
  • 2020-12-10 09:04

    Merge the two data frames with merge(), then subset() the result with a condition such as Time >= StartTime & Time <= StartTime + Duration, or whatever rule makes sense for your data.
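
    A minimal base-R sketch of this approach, assuming the times are "HH:MM:SS" strings and using the sample values that appear in this thread. The helper to_seconds() is hypothetical (it is not part of any package), and A is built here without its OBS column so that merge() joins on ID only.

    ```r
    # Hypothetical helper: convert "HH:MM:SS" strings to seconds since midnight
    to_seconds <- function(x) {
      parts <- strsplit(x, ":", fixed = TRUE)
      vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
    }

    # Sample data copied from the thread (A without its OBS column)
    A <- data.frame(
      ID        = c("01", "02", "01", "02"),
      StartTime = c("10:12:06", "10:12:30", "10:15:12", "10:45:00"),
      Duration  = c("00:00:10", "00:00:30", "00:01:15", "00:00:02"),
      Outcome   = c("Normal", "Weird", "Normal", "Normal"),
      stringsAsFactors = FALSE
    )
    B <- data.frame(
      OBS  = 1:4,
      ID   = c("01", "01", "02", "01"),
      Time = c("10:12:10", "10:12:17", "10:12:45", "10:13:00"),
      stringsAsFactors = FALSE
    )

    # Pair every observation in B with every interval in A that shares its ID
    m <- merge(A, B, by = "ID")
    m$s_t <- to_seconds(m$StartTime)
    m$dur <- to_seconds(m$Duration)
    m$tim <- to_seconds(m$Time)

    # Keep only the pairs where the observation falls inside the interval
    res <- subset(m, tim >= s_t & tim <= s_t + dur,
                  select = c(OBS, ID, Time, Outcome))
    res <- res[order(res$OBS), ]
    print(res)
    ```

    Note that merge() first builds every ID-matched pair, so the intermediate data frame can get large when each ID has many intervals and many observations.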

  • 2020-12-10 09:18

    Set up data

    First set up the input data frames. We create two versions of the data frames: A and B just use character columns for the times and At and Bt use the chron package "times" class for the times (which has the advantage over "character" class that one can add and subtract them):

    LinesA <- "OBS ID StartTime Duration Outcome 
        1   01 10:12:06  00:00:10 Normal
        2   02 10:12:30  00:00:30 Weird
        3   01 10:15:12  00:01:15 Normal
        4   02 10:45:00  00:00:02 Normal"
    
    LinesB <- "OBS ID Time       
        1   01 10:12:10  
        2   01 10:12:17  
        3   02 10:12:45  
        4   01 10:13:00"
    
    A <- At <- read.table(textConnection(LinesA), header = TRUE, 
                   colClasses = c("numeric", rep("character", 4)))
    B <- Bt <- read.table(textConnection(LinesB), header = TRUE, 
                   colClasses = c("numeric", rep("character", 2)))
    
    # in At and Bt convert times columns to "times" class
    
    library(chron) 
    
    At$StartTime <- times(At$StartTime)
    At$Duration <- times(At$Duration)
    Bt$Time <- times(Bt$Time)
    

    sqldf with times class

    Now we can perform the calculation using the sqldf package. We use method="raw" (which does not assign classes to the output), so we must assign the "times" class to the output "Time" column ourselves:

    library(sqldf)
    
    out <- sqldf("select Bt.OBS, ID, Time, Outcome from At join Bt using(ID)
       where Time between StartTime and StartTime + Duration",
       method = "raw")
    
    out$Time <- times(as.numeric(out$Time))
    

    The result is:

    > out
      OBS ID     Time Outcome
    1   1 01 10:12:10  Normal
    2   3 02 10:12:45   Weird
    

    With the development version of sqldf this can be done without using method="raw" and the "Time" column will automatically be set to "times" class by the sqldf class assignment heuristic:

    library(sqldf)
    source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R") # grab devel ver 
    sqldf("select Bt.OBS, ID, Time, Outcome from At join Bt using(ID)
        where Time between StartTime and StartTime + Duration")
    

    sqldf with character class

    It's actually possible to avoid the "times" class altogether by performing all time calculations in SQLite directly on the character strings, using SQLite's strftime function. The SQL statement is unfortunately a bit more involved:

    sqldf("select B.OBS, ID, Time, Outcome from A join B using(ID)
        where strftime('%s', Time) - strftime('%s', StartTime)
           between 0 and strftime('%s', Duration) - strftime('%s', '00:00:00')")
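
    To see what the strftime('%s', ...) arithmetic computes, here is a small sketch in R rather than SQL. It assumes SQLite's documented behavior that a time-only string is treated as a time on 2000-01-01 (UTC); epoch_seconds() is a hypothetical helper written only for this illustration.

    ```r
    # Hypothetical helper mimicking SQLite's strftime('%s', 'HH:MM:SS'):
    # seconds since the Unix epoch, with the date assumed to be 2000-01-01 UTC
    epoch_seconds <- function(x) {
      as.numeric(as.POSIXct(paste("2000-01-01", x), tz = "UTC"))
    }

    # Seconds between the observation and the start of the interval
    elapsed <- epoch_seconds("10:12:10") - epoch_seconds("10:12:06")

    # Subtracting the value for '00:00:00' turns a duration string into seconds
    duration <- epoch_seconds("00:00:10") - epoch_seconds("00:00:00")

    # Same test as the SQL: elapsed between 0 and duration (inclusive)
    in_range <- elapsed >= 0 & elapsed <= duration
    print(c(elapsed = elapsed, duration = duration))
    print(in_range)
    ```

    Because both sides of each subtraction use the same assumed date, the date cancels out and only the time-of-day difference remains.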
    

    EDIT:

    A series of edits which fixed grammar, added additional approaches and fixed/improved the read.table statements.

    EDIT:

    Simplified/improved final sqldf statement.

  • 2020-12-10 09:19

    Here is an example:

    # first, merge by ID
    z <- merge(A[, -1], B, by = "ID")
    
    # convert string to POSIX time
    z <- transform(z,
      s_t = as.numeric(strptime(as.character(z$StartTime), "%H:%M:%S")),
      dur = as.numeric(strptime(as.character(z$Duration), "%H:%M:%S")) - 
        as.numeric(strptime("00:00:00", "%H:%M:%S")),
      tim = as.numeric(strptime(as.character(z$Time), "%H:%M:%S")))
    
    # subset by time range
    subset(z, s_t < tim & tim < s_t + dur)
    

    The output:

      ID StartTime Duration Outcome OBS     Time        s_t dur        tim
    1  1  10:12:06 00:00:10  Normal   1 10:12:10 1321665126  10 1321665130
    2  1  10:12:06 00:00:10  Normal   2 10:12:15 1321665126  10 1321665135
    7  2  10:12:30 00:00:30   Weird   3 10:12:45 1321665150  30 1321665165
    

    OBS #2 looks to be in the range. Does that make sense?
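
    A quick arithmetic check, as a sketch: the output above lists OBS 2 at 10:12:15, whereas the LinesB data in the previous answer has 10:12:17 for that row, which would explain why the two answers disagree about OBS 2. The helper secs() is hypothetical, and the time zone is fixed to UTC only to keep the arithmetic independent of the local clock.

    ```r
    # Hypothetical helper: parse "HH:MM:SS" and return seconds as a number
    secs <- function(x) {
      as.numeric(as.POSIXct(strptime(x, "%H:%M:%S", tz = "UTC")))
    }

    start <- secs("10:12:06")  # StartTime of the first interval for ID 01
    end   <- start + 10        # Duration 00:00:10, so the interval ends at 10:12:16

    # Compare the two candidate times for OBS 2 against the interval
    candidates <- c("10:12:15", "10:12:17")
    inside <- start < secs(candidates) & secs(candidates) < end
    names(inside) <- candidates
    print(inside)
    ```

    With 10:12:15 the observation falls inside the interval, and with 10:12:17 it falls outside, using the same strict inequalities as the subset() call above.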
