So, I have a data frame with two factors and one numeric variable like so:
>D
f1 f2 v1
1 A 23
2 A 45
2 B 27
.
.
.
Using your data:
dat <- data.frame(f1 = factor(c(1,2,2)), f2 = factor(c("A","A","B")),
v1 = c(23,45,27))
One option is to create a lookup table with all combinations of the levels, using the expand.grid() function supplied with the levels of both factors, as shown below:
dat2 <- with(dat, expand.grid(f1 = levels(f1), f2 = levels(f2)))
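Before merging, it can help to check what the lookup table contains. The following is a minimal sketch in base R, assuming the same dat as above; the names n_expected, present and missing_combos are introduced here purely for illustration.

```r
dat <- data.frame(f1 = factor(c(1, 2, 2)), f2 = factor(c("A", "A", "B")),
                  v1 = c(23, 45, 27))

# One row per combination of factor levels, whether or not it occurs in dat
dat2 <- with(dat, expand.grid(f1 = levels(f1), f2 = levels(f2)))

# The lookup table should have nlevels(f1) * nlevels(f2) rows
n_expected <- nlevels(dat$f1) * nlevels(dat$f2)
stopifnot(nrow(dat2) == n_expected)

# Combinations that are in the lookup table but absent from the data
present <- paste(dat$f1, dat$f2)
missing_combos <- dat2[!(paste(dat2$f1, dat2$f2) %in% present), ]
```

Because levels() is used rather than the observed values, a level that never occurs in the data still gets a row in the lookup table.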
A database-like join can then be performed using the merge() function, specifying that all rows from the lookup table are included in the join (all.y = TRUE):
newdat <- merge(dat, dat2, all.y = TRUE)
The above line produces:
> newdat
f1 f2 v1
1 1 A 23
2 1 B NA
3 2 A 45
4 2 B 27
As you can see, the missing combinations are given the value NA to indicate that they are missing. It is relatively simple to then replace these NAs with 0s:
> newdat$v1[is.na(newdat$v1)] <- 0
> newdat
f1 f2 v1
1 1 A 23
2 1 B 0
3 2 A 45
4 2 B 27
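The three steps above (lookup table, merge, NA replacement) could be wrapped into one function. This is only a sketch under assumptions: fill_combinations and its arguments are hypothetical names, not part of base R, and it assumes the two grouping columns are factors.

```r
# Hypothetical helper: pad a data frame so that every combination of two
# factor columns has a row, filling the value column where it was missing.
fill_combinations <- function(d, f1, f2, value, fill = 0) {
  # Lookup table with every combination of the two factors' levels
  lookup <- expand.grid(levels(d[[f1]]), levels(d[[f2]]))
  names(lookup) <- c(f1, f2)
  # Keep all rows of the lookup table; unmatched rows get NA in `value`
  out <- merge(d, lookup, all.y = TRUE)
  # Caveat: this also overwrites any NA that was already present in `value`
  out[[value]][is.na(out[[value]])] <- fill
  out
}

dat <- data.frame(f1 = factor(c(1, 2, 2)), f2 = factor(c("A", "A", "B")),
                  v1 = c(23, 45, 27))
newdat <- fill_combinations(dat, "f1", "f2", "v1")
```

If the data can already contain genuine NAs in the value column, a marker column added before the merge would be needed to tell them apart from the padded rows.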
Here is a tidyr solution: spread with fill=0, then gather back to long format (df is the data frame, called dat above).
library(tidyr)
df %>% spread(f2, v1, fill=0) %>% gather(f2, v1, -f1)
# f1 f2 v1
#1 1 A 23
#2 2 A 45
#3 1 B 0
#4 2 B 27
You could equally do df %>% spread(f1, v1, fill=0) %>% gather(f1, v1, -f2).
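To make the spread/gather round trip easier to follow, here it is split into two steps with the intermediate wide table kept. A sketch only, assuming tidyr is installed and using dat as defined earlier in the thread; wide and long are illustrative names.

```r
library(tidyr)

dat <- data.frame(f1 = factor(c(1, 2, 2)), f2 = factor(c("A", "A", "B")),
                  v1 = c(23, 45, 27))

# Step 1: one column per level of f2; cells with no data are filled with 0
wide <- spread(dat, f2, v1, fill = 0)

# Step 2: stack the A and B columns back into key/value form, keeping f1
# factor_key = TRUE keeps f2 as a factor instead of a character column
long <- gather(wide, f2, v1, -f1, factor_key = TRUE)
```

The fill happens in the wide step, so by the time the data is gathered every f1/f2 pair already has a value.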
Two years late, but I had the same problem and came up with this plyr solution:
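library(plyr)
dat <- data.frame(f1 = factor(c(1,2,2)), f2 = factor(c("A","A","B")), v1 = c(23,45,27))
# .drop = FALSE keeps empty factor combinations; the function returns 0 for them
newdat <- ddply(dat, .(f1, f2), numcolwise(function(x) if (length(x) > 0) x else 0), .drop = FALSE)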
> newdat
f1 f2 v1
1 1 A 23
2 1 B 0
3 2 A 45
4 2 B 27