Question
Given a data.table as follows, id1 is a subject-level ID, id2 is a within-subject repeated-measure ID, and X are data variables, of which there are many. I want to balance the data so that every individual has the same number of rows (repeated measures), namely max(DT[,.N,by=id1][,N]), with id1 and id2 adjusted as necessary and the X data values set to NA for the new rows.
The following:
DT = data.table(
id1 = c(1,1,2,2,2,3,3,3,3),
id2 = c(1,2,1,2,3,1,2,3,4),
X1 = letters[1:9],
X2 = LETTERS[1:9]
)
setkey(DT,id1)
Should look like:
DT = data.table(
id1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
id2 = c(1,2,3,4,1,2,3,4,1,2,3,4),
X1 = c(letters[1:2],NA,NA,letters[3:5],NA,letters[6:9]),
X2 = c(LETTERS[1:2],NA,NA,LETTERS[3:5],NA,LETTERS[6:9])
)
How do you go about doing this using data.table? For-looping is to be avoided, as this data set is huge. Is this a job for reshape2?
Answer 1:
You may try:
DT2 <- CJ(id1=1:3, id2=1:4)
merge(DT,DT2, by=c('id1', 'id2'), all=TRUE)
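The call above hard-codes the ranges 1:3 and 1:4. A minimal sketch of the same idea with the grid derived from the data might look like the following. It assumes id2 is numbered 1, 2, ... within each subject, as in the example; the names n_max, grid and balanced are illustrative only and not part of the original answer.

```r
library(data.table)

DT <- data.table(
  id1 = c(1,1,2,2,2,3,3,3,3),
  id2 = c(1,2,1,2,3,1,2,3,4),
  X1 = letters[1:9],
  X2 = LETTERS[1:9]
)

# Largest number of repeated measures observed for any subject
n_max <- DT[, .N, by = id1][, max(N)]

# Full grid: every subject crossed with positions 1..n_max.
# as.numeric() keeps id2 the same type as in DT (assumption: id2 runs 1..N).
grid <- CJ(id1 = unique(DT$id1), id2 = as.numeric(seq_len(n_max)))

# all = TRUE keeps grid rows that have no match in DT; their X columns are NA
balanced <- merge(DT, grid, by = c("id1", "id2"), all = TRUE)
```

Because the merge is a single vectorised join, no explicit loop over subjects is needed.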
Answer 2:
Here's a slight variation on akrun's answer that's normally used for the problem at hand:
setkey(DT, id1, id2)
DT[CJ(unique(id1), unique(id2))]
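If you would rather not change the key of DT, the same cross join can probably be written with the on= argument (available in data.table 1.9.6 and later). This is only a sketch of that variant, not the answer's own code, and it assumes that the observed id2 values across all subjects cover every position you want in the balanced result; the name res is illustrative.

```r
library(data.table)

DT <- data.table(
  id1 = c(1,1,2,2,2,3,3,3,3),
  id2 = c(1,2,1,2,3,1,2,3,4),
  X1 = letters[1:9],
  X2 = LETTERS[1:9]
)

# CJ() builds every (id1, id2) combination; joining DT to it with on=
# returns one row per combination, with NA in X1/X2 where DT has no match.
res <- DT[CJ(id1 = unique(id1), id2 = unique(id2)), on = .(id1, id2)]
```

Here DT itself is left unkeyed, which can matter if other code depends on its existing row order.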
Source: https://stackoverflow.com/questions/25812747/balancing-creating-same-number-of-rows-for-each-individual-data