Extracting event types from the last 21-day window

Posted by ∥☆過路亽.° on 2019-12-01 21:19:57

Here's a possible data.table solution. It creates two temporary data sets, one for Sale rows and one for all other activity types, and joins them with a rolling window of 21 days, using by = .EACHI to evaluate the conditions for each joined row. The result is then joined back to the original data set.

Convert the date column to IDate class and key the data by Name and ActivityDate (the key is used by both the rolling join and the final join)

library(data.table)
setkey(setDT(df)[, ActivityDate := as.IDate(ActivityDate, "%m/%d/%Y")], Name, ActivityDate)
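As a small illustration of the conversion step, the sketch below parses month/day/year strings with as.IDate. The input vector raw is hypothetical and only stands in for an unparsed ActivityDate column.

```r
library(data.table)

# Hypothetical raw input in month/day/year form (not taken from the data above)
raw <- c("01/20/2014", "04/01/2014")

# Parse the strings into data.table's integer-backed date class
d <- as.IDate(raw, format = "%m/%d/%Y")

# Once parsed, date arithmetic works in whole days
days_between <- as.integer(d[2] - d[1])
```

Parsing up front matters because the rolling join compares dates numerically; character dates would not roll.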

Create two temporary data sets: one for sales and one for all other activity types

Saletemp <- df[ActivityType == "Sale", .(Name, ActivityDate)]
Elsetemp <- df[ActivityType != "Sale", .(Name, ActivityDate, ActivityType)]

Run a rolling join with a 21-day window against the temporary sales data set, checking the conditions for each match

Saletemp[Elsetemp, `:=`(Email21 = as.logical(which(i.ActivityType == "Email")), 
                        Webinar21 = as.logical(which(i.ActivityType == "Webinar"))), 
         roll = -21, by = .EACHI]
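To see what roll = -21 does in isolation, here is a minimal sketch on toy data. The tables sales and acts, the name "Ann", and the SaleId column are all hypothetical and chosen only for illustration. Each activity row looks up the next sale on or after its own date, and the match is dropped when that sale is too far in the future.

```r
library(data.table)

# Hypothetical toy data: one sale and three earlier activities
sales <- data.table(Name = "Ann",
                    ActivityDate = as.IDate("2014-01-20"),
                    SaleId = 1L)
acts <- data.table(Name = "Ann",
                   ActivityDate = as.IDate(c("2013-12-01", "2014-01-05", "2014-01-18")),
                   ActivityType = c("Email", "Webinar", "Email"))

# For each activity (rows of i), roll backward from the next sale,
# but only when that sale is no more than 21 days later
matched <- sales[acts, on = c("Name", "ActivityDate"), roll = -21]
matched
```

The activity from 2013-12-01 is 50 days before the sale, so its SaleId comes back NA, while the two January activities both pick up SaleId 1.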

Join everything back

df[Saletemp, `:=`(Email21 = i.Email21, Webinar21 = i.Webinar21)]
df
#     Name ActivityType ActivityDate Email21 Webinar21
#  1: John        Email   2014-01-01      NA        NA
#  2: John      Webinar   2014-01-05      NA        NA
#  3: John         Sale   2014-01-20    TRUE      TRUE
#  4: John      Webinar   2014-03-25      NA        NA
#  5: John         Sale   2014-04-01      NA      TRUE
#  6: John         Sale   2014-07-01      NA        NA
#  7:  Tom        Email   2015-01-01      NA        NA
#  8:  Tom      Webinar   2015-01-05      NA        NA
#  9:  Tom         Sale   2015-01-20    TRUE      TRUE
# 10:  Tom      Webinar   2015-03-25      NA        NA
# 11:  Tom         Sale   2015-04-01      NA      TRUE
# 12:  Tom         Sale   2015-07-01      NA        NA
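Note that in this result a Sale row with no matching activity shows NA rather than FALSE. If FALSE is preferred on Sale rows, one possible follow-up is sketched below. The table res is a hypothetical miniature of the output above, not the real df, and the loop over flag_cols is just one way to do it.

```r
library(data.table)

# Hypothetical miniature of the result: NA on a Sale row means "no match found"
res <- data.table(
  ActivityType = c("Email", "Sale", "Sale"),
  Email21      = c(NA, TRUE, NA),
  Webinar21    = c(NA, NA, NA)
)

flag_cols <- c("Email21", "Webinar21")
sale_rows <- which(res$ActivityType == "Sale")

# On Sale rows only, turn missing flags into FALSE; non-Sale rows keep NA
for (col in flag_cols) {
  idx <- sale_rows[is.na(res[[col]][sale_rows])]
  set(res, i = idx, j = col, value = FALSE)
}
res
```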

Here is another option with base R:

df is first split by Name. Within each subset, every Sale row is checked for an Email (or Webinar) within 21 days of the Sale. Finally, the list is unsplit by Name.
To get yes/no labels, replace TRUE with "yes" and FALSE with "no" afterwards.

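df_split <- split(df, df$Name)

df_split <- lapply(df_split, function(tab) {
  i_s <- which(tab$ActivityType == "Sale")
  # For each Sale, TRUE if an activity of `type` falls in the 21 days up to and including the sale
  in_window <- function(type) {
    d_a <- tab$ActivityDate[tab$ActivityType == type]
    vapply(tab$ActivityDate[i_s], function(d_s) any(d_a >= d_s - 21 & d_a <= d_s), logical(1))
  }
  # Initialise both columns so that non-Sale rows stay NA
  tab$Email21 <- NA
  tab$Webinar21 <- NA
  tab$Email21[i_s] <- in_window("Email")
  tab$Webinar21[i_s] <- in_window("Webinar")
  tab
})
df_res <- unsplit(df_split, df$Name)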

df_res
#   Name ActivityType ActivityDate Email21 Webinar21
#1  John        Email   2014-01-01      NA        NA
#2  John      Webinar   2014-01-05      NA        NA
#3  John         Sale   2014-01-20    TRUE      TRUE
#4  John      Webinar   2014-03-25      NA        NA
#5  John         Sale   2014-04-01   FALSE      TRUE
#6  John         Sale   2014-07-01   FALSE     FALSE
#7   Tom        Email   2015-01-01      NA        NA
#8   Tom      Webinar   2015-01-05      NA        NA
#9   Tom         Sale   2015-01-20    TRUE      TRUE
#10  Tom      Webinar   2015-03-25      NA        NA
#11  Tom         Sale   2015-04-01   FALSE      TRUE
#12  Tom         Sale   2015-07-01   FALSE     FALSE
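The yes/no relabelling mentioned above can be sketched with ifelse. The vector email21 below is a hypothetical stand-in for one flag column of df_res.

```r
# Hypothetical stand-in for the Email21 column of df_res
email21 <- c(NA, NA, TRUE, NA, FALSE, FALSE)

# ifelse() maps TRUE/FALSE to labels and leaves NA as NA
email21_label <- ifelse(email21, "yes", "no")
email21_label
```

Applied to the real result, this would be along the lines of df_res$Email21 <- ifelse(df_res$Email21, "yes", "no"), and likewise for Webinar21.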

data

df <- structure(list(Name = c("John", "John", "John", "John", "John", 
"John", "Tom", "Tom", "Tom", "Tom", "Tom", "Tom"), ActivityType = c("Email", 
"Webinar", "Sale", "Webinar", "Sale", "Sale", "Email", "Webinar", 
"Sale", "Webinar", "Sale", "Sale"), ActivityDate = structure(c(16071, 
16075, 16090, 16154, 16161, 16252, 16436, 16440, 16455, 16519, 
16526, 16617), class = "Date")), .Names = c("Name", "ActivityType", 
"ActivityDate"), row.names = c(NA, -12L), index = structure(integer(0), ActivityType = c(1L, 
7L, 3L, 5L, 6L, 9L, 11L, 12L, 2L, 4L, 8L, 10L)), class = "data.frame")