How to combine R dataframes based on constraints on a time column

佛祖请我去吃肉 2021-01-21 16:54

I have two R tables, each with a list of users and a timestamp corresponding to the time that they took a certain action.

The first of these (df1) two tabl

2 answers
  •  南笙 (OP)
    2021-01-21 17:32

    Part 1 - Original Question

    The first part of your question can be answered with the sqldf package.

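    library(sqldf)

    # Left join: keep every df1 row and attach the df2 rows for the
    # same user whose time is later than the df1 time
    df3 <- sqldf("SELECT * FROM df1 a 
                 LEFT JOIN df2 b ON a.time < b.time 
                 AND a.user = b.user")[,c(1:2, 4)]  # keep a.user, a.time, b.time
    
    # Rename to match the OP's post
    names(df3) <- c("user", "time_1", "time_2")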
    
    > df3
      user              time_1              time_2
    1    1 2016-12-01 08:53:20 2016-12-01 11:50:11
    2    1 2016-12-01 12:45:47                
    3    2 2016-12-01 15:34:54                
    4    3 2016-12-01 00:49:50 2016-12-01 01:19:10
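
For comparison, the same left join can be sketched in base R without sqldf. This is only an illustration, not part of the original answer: the sample data is an assumption reconstructed from the output above, and the `id` column and `pairs` data frame are hypothetical helpers.

```r
# Hypothetical sample data, reconstructed from the printed output
df1 <- data.frame(
  user = c(1, 1, 2, 3),
  time = as.POSIXct(c("2016-12-01 08:53:20", "2016-12-01 12:45:47",
                      "2016-12-01 15:34:54", "2016-12-01 00:49:50"),
                    tz = "UTC")
)
df2 <- data.frame(
  user = c(1, 3),
  time = as.POSIXct(c("2016-12-01 11:50:11", "2016-12-01 01:19:10"),
                    tz = "UTC")
)

# Row id so that each df1 row can be matched back after filtering
df1$id <- seq_len(nrow(df1))

# All same-user pairs; the clashing "time" columns become time_1 / time_2
pairs <- merge(df1, df2, by = "user", suffixes = c("_1", "_2"))

# Keep only pairs where the df1 time is earlier than the df2 time
pairs <- pairs[pairs$time_1 < pairs$time_2, c("id", "time_2")]

# Left join back onto df1 so unmatched rows survive with NA in time_2
df3 <- merge(df1, pairs, by = "id", all.x = TRUE)
df3 <- df3[, c("user", "time", "time_2")]
names(df3) <- c("user", "time_1", "time_2")
```

The filter step is where the time constraint lives, so a different condition (for example a minimum gap in seconds) only needs a change on that one line.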
    

    Part 2 - Time Window

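    If you want to require a time offset before a row counts as a match, subtract a number of seconds from b.time inside the SQL statement. The query below only matches a df2 row whose time is more than 10000 seconds after the df1 time: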

    df3 <- sqldf("SELECT * FROM df1 a 
                 LEFT JOIN df2 b ON a.time < (b.time - 10000)
                 AND a.user = b.user")[,c(1:2, 4)]
    > df3
      user                time              time.1
    1    1 2016-12-01 08:53:20 2016-12-01 11:50:11
    2    1 2016-12-01 12:45:47                
    3    2 2016-12-01 15:34:54                
    4    3 2016-12-01 00:49:50                
    

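    Note that the offset applied to b.time is in seconds, because sqldf hands POSIXct columns to the database as numeric seconds since the epoch. Here, 10000 seconds is roughly 2 hours 47 minutes.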
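
To see why the 10000-second offset keeps the match for user 1 but drops it for user 3, the gaps can be checked directly in R. This is a small sketch using the timestamps from the output above; the variable names are hypothetical.

```r
# POSIXct values are stored as seconds since the epoch, so subtracting
# their numeric forms gives the gap in seconds
u1_first  <- as.POSIXct("2016-12-01 08:53:20", tz = "UTC")
u1_second <- as.POSIXct("2016-12-01 11:50:11", tz = "UTC")
u3_first  <- as.POSIXct("2016-12-01 00:49:50", tz = "UTC")
u3_second <- as.POSIXct("2016-12-01 01:19:10", tz = "UTC")

gap_user1 <- as.numeric(u1_second) - as.numeric(u1_first)  # 10611 seconds
gap_user3 <- as.numeric(u3_second) - as.numeric(u3_first)  # 1760 seconds

gap_user1 > 10000  # TRUE:  the row still matches
gap_user3 > 10000  # FALSE: the row no longer matches
```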
