Reshaping data in R with “login” “logout” times

爱一瞬间的悲伤 2021-01-08 00:31

I'm new to R and am working on a side project for my own purposes. I have this data (a reproducible dput of it is at the end of the question):

     X              


        
6 answers
  • 2021-01-08 00:54

    Another way to do it:

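    library(tidyr)
    # the row id column X keeps every row unique, so spread() has no duplicate keys
    df <- df %>% spread(state, datetime)
    
    # columns are now X, user, joined, left
    df_joined <- df[!is.na(df$joined), c("user", "joined")]
    df_joined <- df_joined[with(df_joined, order(user, joined)), ]
    df_joined$n <- ave(seq_along(df_joined$user), df_joined$user, FUN = seq_along)
    
    df_left <- df[!is.na(df$left), c("user", "left")]
    df_left <- df_left[with(df_left, order(user, left)), ]
    df_left$n <- ave(seq_along(df_left$user), df_left$user, FUN = seq_along)
    
    # match the n-th login with the n-th logout of the same user;
    # merging on user alone would pair every login with every logout
    merge(df_joined, df_left, all = TRUE, by = c("user", "n"))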
    
  • 2021-01-08 00:55

    Using rowid() from the data.table-package along with dcast:

    library(data.table)
    dt <- as.data.table(samp)  # assumption: samp is the question's data frame
    dcast(dt, user + rowid(user, state) ~ state, value.var = "datetime")
    
    #      user user_1              joined                left
    #  1: User1      1 2016-02-19 19:13:26 2016-02-19 19:13:32
    #  2: User1      2 2016-02-19 19:21:33 2016-02-19 19:25:26
    #  3: User1      3 2016-02-19 19:35:38 2016-02-19 19:42:16
    #  4: User1      4 2016-02-19 19:44:15 2016-02-19 19:47:59
    #  5: User1      5 2016-02-19 19:48:55 2016-02-19 19:51:06
    #  6: User1      6 2016-02-19 19:52:40                <NA>
    #  7: User2      1 2016-02-19 19:21:18 2016-02-19 19:30:30
    #  8: User3      1 2016-02-19 19:53:15 2016-02-19 20:02:26
    #  9: User3      2 2016-02-19 20:02:34                <NA>
    # 10: User3      3 2016-02-19 20:13:48                <NA>
    
  • 2021-01-08 00:56

    Since tidyr 1.0.0 the following is possible:

    suppressPackageStartupMessages(library(tidyverse))
    pivot_wider(samp[-1], names_from = "state", values_from = "datetime", 
                values_fn = list(datetime = list)) %>%
      mutate(left = map2(left, lengths(joined),`length<-`)) %>%
      unchop(everything())
    
    #> # A tibble: 18 x 3
    #>   user  joined              left               
    #>   <fct> <fct>               <fct>              
    #>  1 User1 2016-02-19 19:13:26 2016-02-19 19:13:32
    #>  2 User1 2016-02-19 19:21:33 2016-02-19 19:25:26
    #>  3 User1 2016-02-19 19:35:38 2016-02-19 19:42:16
    #>  4 User1 2016-02-19 19:44:15 2016-02-19 19:47:59
    #>  5 User1 2016-02-19 19:48:55 2016-02-19 19:51:06
    #>  6 User1 2016-02-19 19:52:40 2016-02-19 20:48:22
    #>  7 User1 2016-02-19 21:06:20 2016-02-19 21:11:13
    #>  8 User1 2016-02-19 21:11:15 2016-02-19 21:17:33
    #>  9 User2 2016-02-19 19:21:18 2016-02-19 19:30:30
    #> 10 User3 2016-02-19 19:53:15 2016-02-19 20:02:26
    #> 11 User3 2016-02-19 20:02:34 2016-02-19 20:13:38
    #> 12 User3 2016-02-19 20:13:48 2016-02-19 20:42:27
    #> 13 User3 2016-02-19 20:49:31 NA                 
    #> 14 User3 2016-02-19 22:30:30 NA                 
    #> 15 User4 2016-02-19 20:59:58 2016-02-19 21:10:43
    #> 16 User4 2016-02-19 21:11:22 2016-02-19 22:02:45
    #> 17 User4 2016-02-19 22:05:18 2016-02-19 22:05:37
    #> 18 User4 2016-02-19 22:05:47 NA 
    
    • values_fn is set to list, so that multiple values for a given user are stored in a list
    • Because these lists don't have the same length, we pad the shorter ones with NAs using mutate and `length<-`
    • Then we unnest vertically using unchop
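
    The padding step can be looked at on its own with base R. A minimal sketch, assuming two made-up vectors `joined` and `left` (hypothetical values, not the question's data):

    ```r
    # Hypothetical values for one user: three logins but only two logouts
    joined <- c("19:13:26", "19:21:33", "19:35:38")
    left   <- c("19:13:32", "19:25:26")

    # `length<-` called as an ordinary function: grow `left` to the length of
    # `joined`; the newly created slot is filled with NA
    left_padded <- `length<-`(left, length(joined))

    print(left_padded)
    ```

    In the pipeline above, map2() applies the same call to every user's list of `left` values, so that both list columns have matching lengths before unchop() is called.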
  • 2021-01-08 01:10

    We need a sequence number that determines the order of datetime within each user+state group. The sequence number used here is also meaningful in its own right: it is a consecutive count of the joined-[left] records for each user in the reshaped data frame.
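
    To see what such a sequence number looks like, here is a small base R sketch. The data frame `toy` and its values are hypothetical stand-ins for the question's data, and base reshape() is used only to keep the sketch free of packages:

    ```r
    # Hypothetical toy data in the same long layout as the question
    toy <- data.frame(
      user     = c("User1", "User1", "User1", "User2", "User2", "User1"),
      state    = c("joined", "left", "joined", "joined", "left", "left"),
      datetime = c("2016-02-19 19:13:26", "2016-02-19 19:13:32",
                   "2016-02-19 19:21:33", "2016-02-19 19:21:18",
                   "2016-02-19 19:30:30", "2016-02-19 19:25:26"),
      stringsAsFactors = FALSE
    )

    # Sort by user and time so the running count follows chronological order
    toy <- toy[order(toy$user, toy$datetime), ]

    # Number the rows 1, 2, ... inside every user + state group
    toy$seq <- ave(seq_len(nrow(toy)), toy$user, toy$state, FUN = seq_along)

    # One row per user + seq, with one datetime column per state
    wide <- reshape(toy, idvar = c("user", "seq"), timevar = "state",
                    direction = "wide")
    print(wide)
    ```

    Each user + seq pair identifies one session, which is what lets the reshape put the matching joined and left times on the same row.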

    Using spread from tidyr

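    library(tidyr)
    # seq numbers the rows within each user + state group; FUN = order yields
    # 1, 2, ... as long as the rows are already in time order within each group
    spread(within(samp[, -1], seq <- ave(as.numeric(datetime), user, state, FUN = order)),
           state, datetime)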
    
    
    
        user seq              joined                left
    1  User1   1 2016-02-19 19:13:26 2016-02-19 19:13:32
    2  User1   2 2016-02-19 19:21:33 2016-02-19 19:25:26
    3  User1   3 2016-02-19 19:35:38 2016-02-19 19:42:16
    4  User1   4 2016-02-19 19:44:15 2016-02-19 19:47:59
    5  User1   5 2016-02-19 19:48:55 2016-02-19 19:51:06
    6  User1   6 2016-02-19 19:52:40 2016-02-19 20:48:22
    7  User1   7 2016-02-19 21:06:20 2016-02-19 21:11:13
    8  User1   8 2016-02-19 21:11:15 2016-02-19 21:17:33
    9  User2   1 2016-02-19 19:21:18 2016-02-19 19:30:30
    10 User3   1 2016-02-19 19:53:15 2016-02-19 20:02:26
    11 User3   2 2016-02-19 20:02:34 2016-02-19 20:13:38
    12 User3   3 2016-02-19 20:13:48 2016-02-19 20:42:27
    13 User3   4 2016-02-19 20:49:31                <NA>
    14 User3   5 2016-02-19 22:30:30                <NA>
    15 User4   1 2016-02-19 20:59:58 2016-02-19 21:10:43
    16 User4   2 2016-02-19 21:11:22 2016-02-19 22:02:45
    17 User4   3 2016-02-19 22:05:18 2016-02-19 22:05:37
    18 User4   4 2016-02-19 22:05:47                <NA>
    

    This may also be written with dcast from reshape2

    library(reshape2)
    dcast(within(samp, seq <- ave(as.numeric(datetime), user, state, FUN = order)),
          user + seq ~ state, value.var = "datetime")
    
  • 2021-01-08 01:12

    We can make use of the order of "left" and "joined", and match when one follows the other for each user.

    For this I'm going to use library(data.table)

    library(data.table)
    setDT(df)
    
    ## order the data by user and datetime
    df <- df[order(user, datetime)]
    ## add an 'order' column, which is a sequence from 1 to .N (the number of rows)
    ## for each user
    df[, order := seq_len(.N), by = user]
    
    ## split the 'left' and 'joined' rows
    dt_left <- df[state == "left"]
    dt_joined <- df[state == "joined"]
    
    ## assuming 'left' is after 'joined', shift the 'order' back for left
    dt_left[, order := order - 1]
    
    ## join on user and order (subsetting the relevant columns),
    ## keeping rows where there's a 'joined' but no 'left'
    dt <- dt_left[, .(user, order, datetime)][dt_joined[, .(user, order, datetime)], on=c("user", "order"), nomatch=NA]
    
    ## rename columns
    setnames(dt, c("datetime", "i.datetime"), c("left", "joined"))
    
         user order                left              joined
     1: User1     1 2016-02-19 19:13:32 2016-02-19 19:13:26
     2: User1     3 2016-02-19 19:25:26 2016-02-19 19:21:33
     3: User1     5 2016-02-19 19:42:16 2016-02-19 19:35:38
     4: User1     7 2016-02-19 19:47:59 2016-02-19 19:44:15
     5: User1     9 2016-02-19 19:51:06 2016-02-19 19:48:55
     6: User1    11 2016-02-19 20:48:22 2016-02-19 19:52:40
     7: User1    13 2016-02-19 21:11:13 2016-02-19 21:06:20
     8: User1    15 2016-02-19 21:17:33 2016-02-19 21:11:15
     9: User2     1 2016-02-19 19:30:30 2016-02-19 19:21:18
    10: User3     1 2016-02-19 20:02:26 2016-02-19 19:53:15
    11: User3     3 2016-02-19 20:13:38 2016-02-19 20:02:34
    12: User3     5 2016-02-19 20:42:27 2016-02-19 20:13:48
    13: User3     7                  NA 2016-02-19 20:49:31
    14: User3     8                  NA 2016-02-19 22:30:30
    15: User4     1 2016-02-19 21:10:43 2016-02-19 20:59:58
    16: User4     3 2016-02-19 22:02:45 2016-02-19 21:11:22
    17: User4     5 2016-02-19 22:05:37 2016-02-19 22:05:18
    18: User4     7                  NA 2016-02-19 22:05:47
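
    The same pairing idea can also be sketched in base R. This is not the data.table code above, only an illustration of "a left counts when it directly follows a joined for the same user"; the data frame `toy` and its values are hypothetical:

    ```r
    # Hypothetical toy data: User1 never logs out of the second session,
    # User3 has two logins in a row before a single logout
    toy <- data.frame(
      user     = c("User1", "User1", "User1", "User3", "User3", "User3"),
      state    = c("joined", "left", "joined", "joined", "joined", "left"),
      datetime = c("2016-02-19 19:13:26", "2016-02-19 19:13:32",
                   "2016-02-19 19:52:40", "2016-02-19 20:49:31",
                   "2016-02-19 22:30:30", "2016-02-19 22:45:00"),
      stringsAsFactors = FALSE
    )
    toy <- toy[order(toy$user, toy$datetime), ]

    # Look one row ahead; indexing past the last row yields NA
    nxt        <- seq_len(nrow(toy)) + 1L
    next_user  <- toy$user[nxt]
    next_state <- toy$state[nxt]
    next_time  <- toy$datetime[nxt]

    # A row is paired when the following event is a 'left' by the same user
    is_pair <- !is.na(next_user) & next_user == toy$user & next_state == "left"

    out <- data.frame(
      user   = toy$user,
      joined = toy$datetime,
      left   = ifelse(is_pair, next_time, NA_character_),
      stringsAsFactors = FALSE
    )[toy$state == "joined", ]
    print(out)
    ```

    Logins that are followed by another login, or by nothing at all, keep NA in `left`, which mirrors the NA rows in the output above.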
    
  • 2021-01-08 01:16

    Base version:

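    ## running count of each user's logins and logouts
    ## (kept as an integer so that it sorts numerically)
    samp$count <- with(samp, ave(seq_along(user), state, user, FUN = seq_along))
    
    ## pair the n-th 'joined' with the n-th 'left' of the same user;
    ## datetime.x is the login time and datetime.y the logout time
    out <- merge(
      samp[samp$state == "joined", c("user", "datetime", "count")],
      samp[samp$state == "left", c("user", "datetime", "count")],
      by = c("user", "count"), all.x = TRUE
    )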
    
    out[order(out$count),]
    