问题
I'm trying to calculate a running count (i.e., cumulative sum) that is conditional on other variables and that can reset for particular values on another variable. I'm working in R and would prefer a dplyr
-based solution, if possible.
I'd like to create a variable for the running count, cumulative
, based on the following algorithm:
- Calculate the running count (
cumulative
) within combinations ofid
andage
- Increment running count (
cumulative
) by 1 for every subsequenttrial
whereaccuracy = 0
,block = 2
, andcondition = 1
- Reset running count (
cumulative
) to 0 for eachtrial
whereaccuracy = 1
,block = 2
, andcondition = 1
, and the next increment resumes at 1 (not the previous number) - For each
trial
whereblock != 2
, orcondition != 1
, leave the running count (cumulative
) asNA
Here's a minimal working example:
mydata <- data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1),
age = c(1,1,1,1,1,1,1,1,1,1,2),
block = c(1,1,2,2,2,2,2,2,2,2,2),
trial = c(1,2,1,2,3,4,5,6,7,8,1),
condition = c(1,1,1,1,1,2,1,1,1,1,1),
accuracy = c(0,0,0,0,0,0,0,1,0,0,0)
)
id age block trial condition accuracy
1 1 1 1 1 0
1 1 1 2 1 0
1 1 2 1 1 0
1 1 2 2 1 0
1 1 2 3 1 0
1 1 2 4 2 0
1 1 2 5 1 0
1 1 2 6 1 1
1 1 2 7 1 0
1 1 2 8 1 0
1 2 2 1 1 0
The expected output is:
id age block trial condition accuracy cumulative
1 1 1 1 1 0 NA
1 1 1 2 1 0 NA
1 1 2 1 1 0 1
1 1 2 2 1 0 2
1 1 2 3 1 0 3
1 1 2 4 2 0 NA
1 1 2 5 1 0 4
1 1 2 6 1 1 0
1 1 2 7 1 0 1
1 1 2 8 1 0 2
1 2 2 1 1 0 1
回答1:
We can use case_when
to assign the value which we need based on our conditions. We then add an additional group_by
condition using cumsum
to switch values when the temp
column 0. In the final mutate
step we temporarily replace
NA
values in temp
to 0, then take cumsum
over it and put back the NA
values again to it's place to get the final output.
library(dplyr)
mydata %>%
group_by(id, age) %>%
mutate(temp = case_when(accuracy == 0 & block == 2 & condition == 1 ~ 1,
accuracy == 1 & block == 2 & condition == 1 ~ 0,
TRUE ~ NA_real_)) %>%
ungroup() %>%
group_by(id, age, group = cumsum(replace(temp == 0, is.na(temp), 0))) %>%
mutate(cumulative = replace(cumsum(replace(temp, is.na(temp), 0)),
is.na(temp), NA)) %>%
select(-temp, -group)
# group id age block trial condition accuracy cumulative
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0 1 1 1 1 1 0 NA
# 2 0 1 1 1 2 1 0 NA
# 3 0 1 1 2 1 1 0 1
# 4 0 1 1 2 2 1 0 2
# 5 0 1 1 2 3 1 0 3
# 6 0 1 1 2 4 2 0 NA
# 7 0 1 1 2 5 1 0 4
# 8 1 1 1 2 6 1 1 0
# 9 1 1 1 2 7 1 0 1
#10 1 1 1 2 8 1 0 2
#11 1 1 2 2 1 1 0 1
回答2:
Here is an option using data.table
. Create a binary column based on match
ing the paste
d values of 'accuracy', 'block', 'condition' with that of the custom values, grouped by run-length-id of the binary column ('ind'), 'id' and 'age', get the cumulative sum of 'ind' and assign (:=
) it to a new column ('Cumulative')
library(data.table)
setDT(mydata)[, ind := match(do.call(paste0, .SD), c("121", "021")) - 1,
.SDcols = c("accuracy", "block", "condition")
][, Cumulative := cumsum(ind), .(rleid(ind), id, age)
][, ind := NULL][]
# id age block trial condition accuracy Cumulative
# 1: 1 1 1 1 1 0 NA
# 2: 1 1 1 2 1 0 NA
# 3: 1 1 2 1 1 0 1
# 4: 1 1 2 2 1 0 2
# 5: 1 1 2 3 1 0 3
# 6: 1 1 2 4 2 0 NA
# 7: 1 1 2 5 1 1 0
# 8: 1 1 2 6 1 0 1
# 9: 1 1 2 7 1 0 2
#10: 1 2 2 1 1 0 1
来源:https://stackoverflow.com/questions/52960348/conditional-running-count-cumulative-sum-with-reset-in-r-dplyr