Data Table Solution To New Structured Variable

Submitted by 不羁岁月 on 2020-03-05 01:32:36

Question


data=data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,4,4,4,4),
"score"=c(1,2,1,1,2,3,2,NA,3,NA,1,3,2,1),
"drop"=c(0,0,0,0,0,0,0,1,0,1,0,0,0,0),
"WANT"=c(1,2,1,1,2,3,3,4,3,4,1,3,3,3))

I have a data frame 'data' without the 'WANT' column, which is the column I hope to create using a data.table solution.

The rules are:

if score = 1, WANT = 1
if score = 2, WANT = 2
if score = 3, WANT = 3
if drop = 1, WANT = 4

If the score at time t is 2 and the score at t+1 is 1, that is fine and both values are kept.

But if the score at time t is 3, any later score below 3 is replaced with 3.

For example, the score series 1-2-1-3-1 should become 1-2-1-3-3.
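The rules above can be sketched for a single student in base R. This is only an illustration of the intended logic, not one of the posted answers; the helper name want_from is hypothetical.

```r
# Hypothetical helper (not from the question): apply the rules to one
# student's scores, given in time order.
want_from <- function(score, drop) {
  # TRUE from the first observed 3 onwards; NA scores never count as a 3
  seen_three <- cumsum(!is.na(score) & score == 3) > 0
  # once a 3 has been seen, hold the value at 3; otherwise keep the score
  want <- ifelse(seen_three, 3, score)
  # a dropped row is always coded 4
  want[drop == 1] <- 4
  want
}

want_from(c(1, 2, 1, 3, 1), c(0, 0, 0, 0, 0))
# 1 2 1 3 3
want_from(c(2, 3, 2, NA), c(0, 0, 0, 1))
# 2 3 3 4
```

Note that a missing score occurring before any 3 stays NA in this sketch, since the rules do not say what it should become.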

data2=data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,4,4,4,4,5,5,5,5),
"score"=c(1,2,1,1,2,3,2,NA,3,NA,1,3,2,1,1,3,NA,2),
"drop"=c(0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0),
"WANT"=c(1,2,1,1,2,3,3,4,3,4,1,3,3,3,1,3,3,3))

Answer 1:


We can use replace() after creating a condition based on the first occurrence of the value 3 in 'score' for each 'student'.

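library(dplyr)

data %>%
  group_by(student) %>%
  mutate(
    # once a 3 has occurred, every later score in the group becomes 3
    # (a logical index avoids overrunning the group when the 3 is the last row)
    WANT2 = if (3 %in% score) replace(score, row_number() > match(3, score), 3) else score,
    # a missing score on a dropped row is coded 4
    WANT2 = replace(WANT2, is.na(score) & drop == 1, 4)
  )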
# A tibble: 14 x 5
# Groups:   student [4]
#   student score  drop  WANT WANT2
#     <dbl> <dbl> <dbl> <dbl> <dbl>
# 1       1     1     0     1     1
# 2       1     2     0     2     2
# 3       1     1     0     1     1
# 4       1     1     0     1     1
# 5       2     2     0     2     2
# 6       2     3     0     3     3
# 7       2     2     0     3     3
# 8       2    NA     1     4     4
# 9       3     3     0     3     3
#10       3    NA     1     4     4
#11       4     1     0     1     1
#12       4     3     0     3     3
#13       4     2     0     3     3
#14       4     1     0     3     3



Answer 2:


An option using data.table:

library(data.table)

# start from the raw score: a score of 1, 2 or 3 maps to WANT 1, 2 or 3
setDT(data)[, w := score]

# once a student has scored 3, any later score below 3 is replaced with 3
data[data[, .I[cummax(score) == 3L & score < 3L], student]$V1, w := 3L]

# fill missing values with the previous non-missing value within each student
# (the question's data2 adds student 5, who has a missing score mid-series)
data[, w := nafill(w, "locf"), by = student]

# a dropped row is coded 4
data[drop == 1L, w := 4L]
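
One caveat with the steps above: cummax() returns NA for every position after the first missing value, so a low score that follows a missing one (student 5's final score of 2 in data2) does not seem to be caught by the cummax(score) == 3L test. A possible variant, offered only as a sketch and not as the answerer's code, fills the gaps first and then takes the running maximum. It assumes data.table 1.12.4 or later (for nafill() and fifelse()) and that rows are already in time order within each student.

```r
library(data.table)

# data2 as given in the question
data2 <- data.table(
  student = c(1,1,1,1,2,2,2,2,3,3,4,4,4,4,5,5,5,5),
  score   = c(1,2,1,1,2,3,2,NA,3,NA,1,3,2,1,1,3,NA,2),
  drop    = c(0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0),
  WANT    = c(1,2,1,1,2,3,3,4,3,4,1,3,3,3,1,3,3,3)
)

# carry the last observed score forward within each student
data2[, w := nafill(score, type = "locf"), by = student]

# once the running maximum reaches 3, hold the value at 3
data2[, w := fifelse(cummax(w) == 3, 3, w), by = student]

# a dropped row is coded 4
data2[drop == 1, w := 4]

# compare against the expected column
all.equal(data2$w, data2$WANT)
```

If a student's first score were missing there would be nothing to carry forward, and w would stay NA for that row; the question does not say how that case should be handled.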


Source: https://stackoverflow.com/questions/60378425/data-table-solution-to-new-structured-variable
