Question
data=data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,4,4,4,4),
"score"=c(1,2,1,1,2,3,2,NA,3,NA,1,3,2,1),
"drop"=c(0,0,0,0,0,0,0,1,0,1,0,0,0,0),
"WANT"=c(1,2,1,1,2,3,3,4,3,4,1,3,3,3))
I have the data frame 'data' without the 'WANT' column, which is what I hope to create using a data.table solution.
The rules are:
if score = 1, WANT = 1; if score = 2, WANT = 2; if score = 3, WANT = 3; if drop = 1, WANT = 4.
A score of 2 at time t followed by a score of 1 at time t+1 is fine and stays as it is.
But once a score of 3 occurs at time t, any later score below 3 is replaced with 3.
For example, the score series 1-2-1-3-1 should become 1-2-1-3-3.
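The rules above can be sketched for a single student's scores in base R. This is only an illustration: `apply_rules` is a hypothetical helper name, not part of the question, and it assumes the scores are given in time order and that everything after the first 3 (including a missing score) becomes 3.

```r
# Hypothetical helper: applies the rules to one student's scores (in time order).
apply_rules <- function(score, drop = rep(0, length(score))) {
  want <- score
  first3 <- match(3, score)                 # position of the first 3, or NA if none
  if (!is.na(first3)) {
    later <- seq_along(score) > first3      # TRUE for every position after the first 3
    want[later] <- 3                        # later scores are held at 3
  }
  want[drop == 1] <- 4                      # dropped rows always get 4
  want
}

apply_rules(c(1, 2, 1, 3, 1))
# [1] 1 2 1 3 3
```

A 2 followed by a 1 is left untouched because only positions after the first 3 are changed.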
data2=data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,4,4,4,4,5,5,5,5),
"score"=c(1,2,1,1,2,3,2,NA,3,NA,1,3,2,1,1,3,NA,2),
"drop"=c(0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0),
"WANT"=c(1,2,1,1,2,3,3,4,3,4,1,3,3,3,1,3,3,3))
Answer 1:
We can use replace after creating a condition based on the occurrence of a 3 in 'score' for each 'student':
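library(dplyr)
data %>%
  group_by(student) %>%
  # after the first 3 in a group, set every later value to 3;
  # then mark dropped rows (missing score and drop == 1) as 4
  mutate(WANT2 = replace(
    if (3 %in% score) replace(score, seq_along(score) > match(3, score), 3) else score,
    is.na(score) & drop == 1, 4))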
# A tibble: 14 x 5
# Groups:   student [4]
#   student score  drop  WANT WANT2
#     <dbl> <dbl> <dbl> <dbl> <dbl>
# 1       1     1     0     1     1
# 2       1     2     0     2     2
# 3       1     1     0     1     1
# 4       1     1     0     1     1
# 5       2     2     0     2     2
# 6       2     3     0     3     3
# 7       2     2     0     3     3
# 8       2    NA     1     4     4
# 9       3     3     0     3     3
#10       3    NA     1     4     4
#11       4     1     0     1     1
#12       4     3     0     3     3
#13       4     2     0     3     3
#14       4     1     0     3     3
Answer 2:
An option using data.table:
library(data.table)
setDT(data)
# if score = 1, WANT = 1; if score = 2, WANT = 2; if score = 3, WANT = 3
# missing scores (e.g. student 5 in data2) are first filled with the prior
# non-missing value within each student, so that cummax() below never sees an NA
data[, w := nafill(score, "locf"), by = student]
# if score at t = 3, any later score less than 3 is replaced with 3
data[data[, .I[which(cummax(w) == 3 & w < 3)], by = student]$V1, w := 3]
# if drop = 1, WANT = 4
data[drop == 1, w := 4]
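As a rough check of the data.table approach against data2, here is a minimal self-contained sketch. It assumes that a missing score should be carried forward within each student before the "hold at 3" rule is applied. This matters because cummax() returns NA for every position from the first NA onward. The names `dt` and `want` are introduced here for illustration only.

```r
library(data.table)

# data2 from the question, with WANT kept separately for comparison
dt <- data.table(
  student = c(1,1,1,1,2,2,2,2,3,3,4,4,4,4,5,5,5,5),
  score   = c(1,2,1,1,2,3,2,NA,3,NA,1,3,2,1,1,3,NA,2),
  drop    = c(0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0)
)
want <- c(1,2,1,1,2,3,3,4,3,4,1,3,3,3,1,3,3,3)

# Per student: carry the last observed score forward,
# then hold the value at 3 once a 3 has been seen
dt[, w := {
  s <- nafill(score, type = "locf")
  fifelse(cummax(s) == 3 & s < 3, 3, s)
}, by = student]

# Dropped rows always get 4
dt[drop == 1, w := 4]

# Compare the result with the WANT column from the question
all.equal(dt$w, want)
```

Grouping by student in the nafill() step keeps a value from one student from leaking into the next student's leading missing scores.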
Source: https://stackoverflow.com/questions/60378425/data-table-solution-to-new-structured-variable