I'm new to R and am working on a side project for my own purposes. I have this data (a reproducible dput of it is at the end of the question):
Another way to do it:
library(tidyr)
df <- df %>% spread(state, datetime)
df_joined <- df[!is.na(df$joined), 2:3]
df_joined <- df_joined[with(df_joined, order(user, joined)), ]
df_left <- df[!is.na(df$left), c(2, 4)]
df_left <- df_left[with(df_left, order(user, left)), ]
merge(df_joined, df_left, all = TRUE, by = 'user')
Using rowid() from the data.table-package along with dcast:
require(data.table)
dcast(dt, user + rowid(user, state) ~ state, value.var="datetime")
# user user_1 joined left
# 1: User1 1 2016-02-19 19:13:26 2016-02-19 19:13:32
# 2: User1 2 2016-02-19 19:21:33 2016-02-19 19:25:26
# 3: User1 3 2016-02-19 19:35:38 2016-02-19 19:42:16
# 4: User1 4 2016-02-19 19:44:15 2016-02-19 19:47:59
# 5: User1 5 2016-02-19 19:48:55 2016-02-19 19:51:06
# 6: User1 6 2016-02-19 19:52:40 <NA>
# 7: User2 1 2016-02-19 19:21:18 2016-02-19 19:30:30
# 8: User3 1 2016-02-19 19:53:15 2016-02-19 20:02:26
# 9: User3 2 2016-02-19 20:02:34 <NA>
# 10: User3 3 2016-02-19 20:13:48 <NA>
Since tidyr 1.0.0, the following is possible:
suppressPackageStartupMessages(library(tidyverse))
pivot_wider(samp[-1], names_from = "state", values_from = "datetime",
values_fn = list(datetime = list)) %>%
mutate(left = map2(left, lengths(joined),`length<-`)) %>%
unchop(everything())
#> # A tibble: 18 x 3
#> user joined left
#> <fct> <fct> <fct>
#> 1 User1 2016-02-19 19:13:26 2016-02-19 19:13:32
#> 2 User1 2016-02-19 19:21:33 2016-02-19 19:25:26
#> 3 User1 2016-02-19 19:35:38 2016-02-19 19:42:16
#> 4 User1 2016-02-19 19:44:15 2016-02-19 19:47:59
#> 5 User1 2016-02-19 19:48:55 2016-02-19 19:51:06
#> 6 User1 2016-02-19 19:52:40 2016-02-19 20:48:22
#> 7 User1 2016-02-19 21:06:20 2016-02-19 21:11:13
#> 8 User1 2016-02-19 21:11:15 2016-02-19 21:17:33
#> 9 User2 2016-02-19 19:21:18 2016-02-19 19:30:30
#> 10 User3 2016-02-19 19:53:15 2016-02-19 20:02:26
#> 11 User3 2016-02-19 20:02:34 2016-02-19 20:13:38
#> 12 User3 2016-02-19 20:13:48 2016-02-19 20:42:27
#> 13 User3 2016-02-19 20:49:31 NA
#> 14 User3 2016-02-19 22:30:30 NA
#> 15 User4 2016-02-19 20:59:58 2016-02-19 21:10:43
#> 16 User4 2016-02-19 21:11:22 2016-02-19 22:02:45
#> 17 User4 2016-02-19 22:05:18 2016-02-19 22:05:37
#> 18 User4 2016-02-19 22:05:47 NA
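The padding step in the pipeline above relies on `length<-`, the functional form of `length(x) <- n`. As a minimal base R sketch of just that step (the values are made up, and `Map` stands in for purrr's `map2`):

```r
# Toy list columns (made-up values): two 'joined' entries but only one 'left'
joined <- list(c("19:13:26", "19:21:33"))
left   <- list("19:13:32")

# `length<-`(x, n) is the functional form of length(x) <- n;
# growing a vector this way fills the new slots with NA
padded <- Map(`length<-`, left, lengths(joined))

padded[[1]]
```

After padding, both list columns have elements of equal length, so they can be expanded row by row.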
values_fn is set to store multiple values for a given user in a list.
mutate and length<- pad each left list with NA up to the length of the matching joined list.
unchop expands the list columns back into one row per element.

We need a sequence number that determines the order of datetime within each user+state group. The sequence number used here is, in particular, a meaningful consecutive count of joined-[left] records in the reshaped data frame.
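To see what such a sequence number looks like, here is a small sketch on made-up data (the `toy` data frame and its values are assumptions for illustration, not the question's data). It uses `seq_along` as the counter, which presumes the rows are already in time order within each user; the code below uses `order`, which gives the same numbers in that case.

```r
# Made-up events, already sorted by time within each user
toy <- data.frame(
  user     = c("A", "A", "A", "B", "B"),
  state    = c("joined", "left", "joined", "joined", "left"),
  datetime = c("10:00", "10:05", "10:20", "11:00", "11:30"),
  stringsAsFactors = FALSE
)

# Counter that restarts at 1 for every user + state combination
toy$seq <- ave(seq_along(toy$user), toy$user, toy$state, FUN = seq_along)

toy
```

User A's second "joined" row gets seq 2 and has no "left" row with seq 2, so it would end up with NA in the left column after reshaping.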
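Using spread from tidyr:
library(tidyr)
# seq numbers the rows within each user + state group (assumes rows are time-ordered)
spread(within(samp[, -1], seq <- ave(as.numeric(datetime), user, state, FUN = order)),
       state, datetime)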
user seq joined left
1 User1 1 2016-02-19 19:13:26 2016-02-19 19:13:32
2 User1 2 2016-02-19 19:21:33 2016-02-19 19:25:26
3 User1 3 2016-02-19 19:35:38 2016-02-19 19:42:16
4 User1 4 2016-02-19 19:44:15 2016-02-19 19:47:59
5 User1 5 2016-02-19 19:48:55 2016-02-19 19:51:06
6 User1 6 2016-02-19 19:52:40 2016-02-19 20:48:22
7 User1 7 2016-02-19 21:06:20 2016-02-19 21:11:13
8 User1 8 2016-02-19 21:11:15 2016-02-19 21:17:33
9 User2 1 2016-02-19 19:21:18 2016-02-19 19:30:30
10 User3 1 2016-02-19 19:53:15 2016-02-19 20:02:26
11 User3 2 2016-02-19 20:02:34 2016-02-19 20:13:38
12 User3 3 2016-02-19 20:13:48 2016-02-19 20:42:27
13 User3 4 2016-02-19 20:49:31 <NA>
14 User3 5 2016-02-19 22:30:30 <NA>
15 User4 1 2016-02-19 20:59:58 2016-02-19 21:10:43
16 User4 2 2016-02-19 21:11:22 2016-02-19 22:02:45
17 User4 3 2016-02-19 22:05:18 2016-02-19 22:05:37
18 User4 4 2016-02-19 22:05:47 <NA>
This may also be written with dcast from reshape2:
library(reshape2)
dcast(within(samp, seq <- ave(as.numeric(datetime), user, state, FUN = order)),
      user + seq ~ state, value.var = "datetime")
We can make use of the order of "left" and "joined", and match when one follows the other for each user.
For this I'm going to use library(data.table)
library(data.table)
setDT(df)
## order the data by user and datetime
df <- df[order(user, datetime)]
## add an 'order' column: a sequence from 1 to the number of rows
## for each user
df[, order := seq_len(.N), by = user]
## split the left and joins
dt_left <- df[state == "left"]
dt_joined <- df[state == "joined"]
## assuming 'left' is after 'joined', shift the 'order' back for left
dt_left[, order := order - 1]
## join on user and order (subsetting the relevant columns),
## keeping rows where there's a 'joined' but no 'left'
dt <- dt_left[, .(user, order, datetime)][dt_joined[, .(user, order, datetime)], on=c("user", "order"), nomatch=NA]
## rename columns
setnames(dt, c("datetime", "i.datetime"), c("left", "joined"))
user order left joined
1: User1 1 2016-02-19 19:13:32 2016-02-19 19:13:26
2: User1 3 2016-02-19 19:25:26 2016-02-19 19:21:33
3: User1 5 2016-02-19 19:42:16 2016-02-19 19:35:38
4: User1 7 2016-02-19 19:47:59 2016-02-19 19:44:15
5: User1 9 2016-02-19 19:51:06 2016-02-19 19:48:55
6: User1 11 2016-02-19 20:48:22 2016-02-19 19:52:40
7: User1 13 2016-02-19 21:11:13 2016-02-19 21:06:20
8: User1 15 2016-02-19 21:17:33 2016-02-19 21:11:15
9: User2 1 2016-02-19 19:30:30 2016-02-19 19:21:18
10: User3 1 2016-02-19 20:02:26 2016-02-19 19:53:15
11: User3 3 2016-02-19 20:13:38 2016-02-19 20:02:34
12: User3 5 2016-02-19 20:42:27 2016-02-19 20:13:48
13: User3 7 NA 2016-02-19 20:49:31
14: User3 8 NA 2016-02-19 22:30:30
15: User4 1 2016-02-19 21:10:43 2016-02-19 20:59:58
16: User4 3 2016-02-19 22:02:45 2016-02-19 21:11:22
17: User4 5 2016-02-19 22:05:37 2016-02-19 22:05:18
18: User4 7 NA 2016-02-19 22:05:47
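The approach above assumes each 'left' directly follows its 'joined'. One way to see where that does not hold is to flag rows whose state repeats the previous state of the same user. This is a base R sketch on made-up data (the `toy` data frame and the `repeated` column are hypothetical names, not part of the answer):

```r
# Made-up, time-ordered events; user A joins twice in a row at the end
toy <- data.frame(
  user  = c("A", "A", "A", "A", "B", "B"),
  state = c("joined", "left", "joined", "joined", "joined", "left"),
  stringsAsFactors = FALSE
)

# Previous state within each user (NA for a user's first event)
prev_state <- ave(toy$state, toy$user, FUN = function(s) c(NA, head(s, -1)))

# TRUE where the same state occurs twice in a row for a user
toy$repeated <- !is.na(prev_state) & prev_state == toy$state

toy[toy$repeated, ]
```

Flagged rows correspond to sessions with a missing counterpart, such as the consecutive 'joined' rows for User3 in the output above.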
Base version:
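## integer counter within each state + user group (assumes rows are in time order);
## building it from as.character(user) would give a character column that sorts "10" before "2"
samp$count <- with(samp, ave(seq_along(user), state, user, FUN = seq_along))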
out <- merge(
samp[samp$state=="joined",c("user","datetime","count")],
samp[samp$state=="left",c("user","datetime","count")],
by=c("user","count"), all.x=TRUE
)
out[order(out$count),]
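The role of all.x=TRUE in the merge can be seen on a small made-up example (the `joined` and `left` data frames below are assumptions for illustration; `suffixes` is added only to give the two datetime columns readable names):

```r
# Made-up 'joined' and 'left' rows, each with a per-user counter
joined <- data.frame(
  user = c("A", "A", "B"), count = c(1L, 2L, 1L),
  datetime = c("10:00", "10:20", "11:00"),
  stringsAsFactors = FALSE
)
left <- data.frame(
  user = c("A", "B"), count = c(1L, 1L),
  datetime = c("10:05", "11:30"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every 'joined' row, filling NA where no 'left' matches
out <- merge(joined, left, by = c("user", "count"), all.x = TRUE,
             suffixes = c("_joined", "_left"))

out
```

User A's second session has no matching 'left' row, so it is kept with NA in `datetime_left` rather than dropped.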