When using stats:::reshape()
from base to convert data from long to wide format, for any variables designated as time invariant, reshape
just takes
I don't know how to fix the problem but one way to fix the symptom would be to push the NA values down in the order.
testdata <- testdata[order(testdata$timeinvariant),]
testdata
# id process1 timeinvariant time
#3 2 3.5 4 1
#2 1 4.0 6 2
#1 1 3.0 NA 1
reshaped<-reshape(testdata,v.names="process1",direction="wide")
reshaped
# id timeinvariant process1.1 process1.2
#3 2 4 3.5 NA
#2 1 6 3.0 4
A more general solution will be to make sure there is only one value in the timevariant column per id
testdata$timeinvariant <- apply(testdata,1,function(x) max(testdata[testdata$id == x[1],"timeinvariant"],na.rm=T))
testdata
# id process1 timeinvariant time
#3 2 3.5 4 1
#2 1 4.0 6 2
#1 1 3.0 6 1
This can be repeated for any number of columns before calling the reshape function. Hope this helps
If there's always at least one non-missing value of timeinvariant
for each id
, and all (non-missing) values of timeinvariant
are identical for each id
(since it's time-invariant), couldn't you create a new column that fills in the NA
values in timeinvariant
and then reshape using that column? For example:
# Add another row to your data frame so that we'll have 2 NA values to deal with
td <- data.frame(matrix(c(1,1,2,1,3,4,3.5,4.5,NA,6,4,NA,1,2,1,3), nrow = 4))
colnames(td) <- c("id", "process1", "timeinvariant", "time")
# Create new column timeinvariant2, which fills in NAs from timeinvariant,
# then reshape using that column
library(dplyr)
td.wide = td %>%
group_by(id) %>%
mutate(timeinvariant2=max(timeinvariant, na.rm=TRUE)) %>%
dcast(id + timeinvariant2 ~ time, value.var='process1')
# Paste "process1." onto names of all "time" columns
names(td.wide) = gsub("(^[0-9]*$)", "process1\\.\\1", names(td.wide) )
td.wide
id timeinvariant2 process1.1 process1.2 process1.3
1 1 6 3.0 4 4.5
2 2 4 3.5 NA NA