Question
I am using the reshape2 function dcast on a data frame. One of the variables is a factor, and some of its levels do not appear in the data, but I would like every level to be included in the new columns that are created.
For example, say I run the following:
library(reshape2)
dataDF <- data.frame(
id = 1:6,
id2 = c(1,2,3,1,2,3),
x = c(rep('t1', 3), rep('t2', 3)),
y = factor(c('A', 'B', 'A', 'B', 'B', 'C'), levels = c('A', 'B', 'C', 'D')),
value = rep(1)
)
dcast(dataDF, id + id2 ~ x + y, fill = 0)
I get the following
  id id2 t1_A t1_B t2_B t2_C
1  1   1    1    0    0    0
2  2   2    0    1    0    0
3  3   3    1    0    0    0
4  4   1    0    0    1    0
5  5   2    0    0    1    0
6  6   3    0    0    0    1
But I also want to include the columns t1_C, t1_D, t2_A and t2_D full of 0's
i.e. I want the following
  id id2 t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1   1    1    0    0    0    0    0    0    0
2  2   2    0    1    0    0    0    0    0    0
3  3   3    1    0    0    0    0    0    0    0
4  4   1    0    0    0    0    0    1    0    0
5  5   2    0    0    0    0    0    1    0    0
6  6   3    0    0    0    0    0    0    1    0
Also, as an aside, would it be possible to create the above without having the column 'value' full of ones in the initial data frame? Basically, I just want to cast x and y into their own columns, with a 1 if that combination exists for that id.
Thanks in advance
EDIT: I initially had one variable on the LHS, which Jeremy answered below, but I actually have more than one variable on the LHS, so I edited the question to reflect this.
Answer 1:
Try adding drop = FALSE to your dcast call, so that unused factor levels are not dropped:
dcast(dataDF, id ~ x + y, fill = 0, drop = FALSE)
  id t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1    1    0    0    0    0    0    0    0
2  2    0    1    0    0    0    0    0    0
3  3    1    0    0    0    0    0    0    0
4  4    0    0    0    0    0    1    0    0
5  5    0    0    0    0    0    1    0    0
6  6    0    0    0    0    0    0    1    0
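For your aside: yes. You just need to tell dcast how to aggregate by passing a function, and in this case you want length, which counts the rows in each cell:
# select by name so that y is kept; 1:3 would pick id, id2, x in the edited data
data2 <- dataDF[, c("id", "x", "y")]
dcast(data2, id ~ x + y, fill = 0, drop = FALSE, fun.aggregate = length)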
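If you would rather not depend on a reshaping package for the aside, here is a minimal base R sketch of the same idea. It assumes each input row should become one output row (true for the example data, where every id appears once); the names z, counts and wide are just illustrative. interaction() builds a factor with every x/y combination as a level, including the unused ones, and table() then counts each row against those levels.

```r
# Example data from the question, without the 'value' column
dataDF <- data.frame(
  id  = 1:6,
  id2 = c(1, 2, 3, 1, 2, 3),
  x   = c(rep("t1", 3), rep("t2", 3)),
  y   = factor(c("A", "B", "A", "B", "B", "C"), levels = c("A", "B", "C", "D"))
)

# Combine x and y into one factor; all 2 x 4 combinations are kept as levels.
# lex.order = TRUE orders the levels t1_A, t1_B, ..., t2_D.
z <- interaction(dataDF$x, dataDF$y, sep = "_", lex.order = TRUE)

# Cross-tabulate the row index against z: one row per input row,
# one column per level, holding counts (0 or 1 here).
counts <- as.data.frame.matrix(table(seq_len(nrow(dataDF)), z))

# Put the id columns back in front (assumption: one output row per input row)
wide <- cbind(dataDF[c("id", "id2")], counts)
wide
```

If the same id/id2 pair could appear on several rows, you would need to aggregate the counts per pair afterwards; this sketch does not do that.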
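For your edit, I'd use tidyr and dplyr rather than reshape2, because with more than one variable on the LHS, drop = FALSE also adds a row for every unobserved id/id2 combination: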
library(tidyr)
library(dplyr)
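# unique() works whether x is a character vector or a factor
# (levels(dataDF$x) is NULL when x is character, the data.frame() default since R 4.0)
all_xy <- expand.grid(x = unique(dataDF$x), y = levels(dataDF$y), stringsAsFactors = FALSE)
dataDF %>% left_join(all_xy, ., by = c("x", "y")) %>%
  unite(z, x, y) %>%
  spread(z, value, fill = 0) %>%
  na.omit
First we build every combination of x and y with expand.grid and join the data onto it, so combinations that never occur show up as rows with NA in the id columns. Then we unite x and y into a single column, z, spread z out into columns, and finally drop the row whose id columns are NA: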
  id id2 t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1   1    1    0    0    0    0    0    0    0
2  2   2    0    1    0    0    0    0    0    0
3  3   3    1    0    0    0    0    0    0    0
4  4   1    0    0    0    0    0    1    0    0
5  5   2    0    0    0    0    0    1    0    0
6  6   3    0    0    0    0    0    0    1    0
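As a side note, on a recent tidyr the same result can probably be had in one step. This is a sketch that assumes tidyr 1.2.0 or later, where pivot_wider() has a names_expand argument; it is not part of the original answer. With names_expand = TRUE, the columns named in names_from are expanded to all combinations, and unused factor levels become explicit, so the empty columns are created without the join.

```r
library(tidyr)  # assumption: tidyr >= 1.2.0 for names_expand

dataDF <- data.frame(
  id    = 1:6,
  id2   = c(1, 2, 3, 1, 2, 3),
  x     = c(rep("t1", 3), rep("t2", 3)),
  y     = factor(c("A", "B", "A", "B", "B", "C"), levels = c("A", "B", "C", "D")),
  value = 1
)

wide <- pivot_wider(
  dataDF,
  names_from   = c(x, y),   # column names become x_y, e.g. t1_A
  values_from  = value,
  values_fill  = 0,         # cells with no matching row get 0
  names_expand = TRUE       # include every x/y combination, even unused levels of y
)
wide
```

The id columns (id and id2 here) are taken to be everything not named in names_from or values_from, so no extra rows are added for unobserved id/id2 pairs.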
Source: https://stackoverflow.com/questions/33040961/r-include-factors-with-no-entries-when-using-dcast