Question
Suppose we have:
library(data.table)
dt <- data.table(id = 1:4, x1 = 10:13, x2=21:24, wt=c(1,0,0.5,0.7))
which returns:
id x1 x2 wt
1: 1 10 21 1.0
2: 2 11 22 0.0
3: 3 12 23 0.5
4: 4 13 24 0.7
I would like to replicate observations under the following conditions:
- If wt is 0 or 1, we assign flag equal to 1 or 0, respectively.
- If 0 < wt < 1, we assign flag equal to 0. Further, we replicate this observation with wt = 1 - wt and assign flag equal to 1 to the replicate.
The output I expect is:
id x1 x2 wt flag
1: 1 10 21 1.0 0
2: 2 11 22 0.0 1
3: 3 12 23 0.5 0
4: 3 12 23 0.5 1
5: 4 13 24 0.7 0
6: 4 13 24 0.3 1
I have tried the following code:
dt[,flag:=ifelse(wt==1,0, ifelse(wt==0, 1, 0))]
dt[,freq:=ifelse(wt > 0 & wt < 1, 2, 1)]
dtr <- dt[rep(1:.N, freq)][,Indx:=1:.N, by = id]
dtr[freq==2&Indx==2, wt:=1-wt]
dtr[Indx==2,flag:=1]
dtr[,`:=`(freq=NULL, Indx=NULL)]
However, I don't think it is efficient.
Do you have any suggestions?
Answer 1:
We can make some of the steps more compact: remove the ifelse and assign the flag directly by coercing a logical to binary, replicate the rows without creating a helper column, then get the row index ('i1') of each replicated row to assign the values in 'flag' and 'wt'.
dt1 <- dt[, flag := +(wt == 0)][rep(1:.N, (wt > 0 & wt < 1) +1)][]
i1 <- dt1[, .I[seq_len(.N)==2], id]$V1
dt1[i1, c('flag', 'wt') := .(1, 1-wt)][]
# id x1 x2 wt flag
#1: 1 10 21 1.0 0
#2: 2 11 22 0.0 1
#3: 3 12 23 0.5 0
#4: 3 12 23 0.5 1
#5: 4 13 24 0.7 0
#6: 4 13 24 0.3 1
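To see what the two idioms in this answer do on their own, here is a minimal base-R sketch using the weights from the question. The names times and idx are only for illustration and are not part of the answer's code.

```r
wt <- c(1, 0, 0.5, 0.7)

# Unary plus coerces a logical vector to integer: TRUE -> 1L, FALSE -> 0L.
flag <- +(wt == 0)

# Adding 1 to a logical gives the number of copies wanted for each row:
# 2 for fractional weights, 1 otherwise.
times <- (wt > 0 & wt < 1) + 1

# Row indices in which the fractional-weight rows appear twice.
idx <- rep(seq_along(wt), times)

print(flag)
print(idx)
```

Indexing a data.table with idx, as in dt[idx], is what duplicates rows 3 and 4 without adding a helper column.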
Answer 2:
Here is a way using data frames:
dt <- data.frame(id = 1:4, x1 = 10:13, x2=21:24, wt=c(1,0,0.5,0.7))
# create the flag column
dt$flag = 1 - ceiling(dt$wt)
# create a new data frame with the rows that fulfill condition 2
# (element-wise & is needed here; && only handles a single value)
dt2 = dt[dt$wt < 1 & dt$wt > 0, ]
dt2$wt = 1 - dt2$wt
dt2$flag = 1
#rbind it to the original data frame and reorder by id
dt = rbind(dt,dt2)
dt = dt[order(dt$id),]
Result:
id x1 x2 wt flag
1 1 10 21 1.0 0
2 2 11 22 0.0 1
3 3 12 23 0.5 0
31 3 12 23 0.5 1
4 4 13 24 0.7 0
41 4 13 24 0.3 1
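A note on the row filter in this answer: selecting rows needs the element-wise `&`, because `&&` works on a single value and, in R 4.3.0 or later, raises an error when given longer vectors. A small sketch with the weights from the question (keep is an illustrative name):

```r
wt <- c(1, 0, 0.5, 0.7)

# Element-wise AND: one TRUE/FALSE per row, suitable for indexing rows.
keep <- wt < 1 & wt > 0

print(keep)
print(which(keep))
```

Only rows 3 and 4 are selected, which are the two observations that get a complement row.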
Answer 3:
The tidyverse way:
library(dplyr)

# dt is the original 4-row data from the question
dt2 <- dt %>%
  # flag is 1 only when wt is exactly 0
  mutate(flag = if_else(wt == 0, 1, 0)) %>%
  # flag2 is the number of copies: 1 for wt of 0 or 1, otherwise 2
  mutate(flag2 = if_else(wt %in% c(1, 0), 1, 2)) %>%
  slice(rep(1:n(), flag2)) %>%
  group_by(id) %>%
  # only replicated ids have a second row, so single-row ids stay unchanged
  mutate(wt = if_else(row_number() == 2, 1 - wt, wt)) %>%
  mutate(flag = if_else(row_number() == 2, 1, flag)) %>%
  select(id, x1, x2, wt, flag)
This gives:
#Source: local data frame [6 x 5]
#Groups: id [4]
#
# id x1 x2 wt flag
# <int> <int> <int> <dbl> <dbl>
#1 1 10 21 1.0 0
#2 2 11 22 0.0 1
#3 3 12 23 0.5 0
#4 3 12 23 0.5 1
#5 4 13 24 0.7 0
#6 4 13 24 0.3 1
P.S. The row modified within each group is the second one (row_number() == 2). Using row_number() == 1 would also change the ids that have a single row, turning id 1 into wt = 0, flag = 1 and id 2 into wt = 1, flag = 1.
Source: https://stackoverflow.com/questions/40902512/how-to-replicate-observations-based-on-weight