Handling missing combinations of factors in R

离开以前 2020-12-11 02:54

So, I have a data frame with two factors and one numeric variable like so:

> D
f1 f2 v1
1  A  23
2  A  45
2  B  27
.
.
.
3 answers
  • 2020-12-11 03:22

    Using your data:

    dat <- data.frame(f1 = factor(c(1,2,2)), f2 = factor(c("A","A","B")),
                      v1 = c(23,45,27))
    

    One option is to build a lookup table containing every combination of the factor levels, by passing the levels of both factors to expand.grid():

    dat2 <- with(dat, expand.grid(f1 = levels(f1), f2 = levels(f2)))
    

    A database-style join can then be performed with merge(), which by default joins on the columns the two data frames share (here f1 and f2). Setting all.y = TRUE keeps every row of the lookup table, so combinations absent from dat still appear in the result:

    newdat <- merge(dat, dat2, all.y = TRUE)
    

    The above line produces:

    > newdat
      f1 f2 v1
    1  1  A 23
    2  1  B NA
    3  2  A 45
    4  2  B 27
    

    As you can see, the missing combinations get the value NA, marking them as missing. It is then relatively simple to replace these NAs with zeros:

    > newdat$v1[is.na(newdat$v1)] <- 0
    > newdat
      f1 f2 v1
    1  1  A 23
    2  1  B  0
    3  2  A 45
    4  2  B 27
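
    The three steps above can also be wrapped into a single helper. This is only a sketch: fill_missing_combos() and its argument names are hypothetical (not part of base R or any package), and it assumes both key columns are factors and the value column is numeric.

    ```r
    # Hypothetical helper; the name and arguments are illustrative only.
    fill_missing_combos <- function(dat, f1 = "f1", f2 = "f2", value = "v1", fill = 0) {
      # Lookup table with every combination of the two factors' levels
      grid <- expand.grid(levels(dat[[f1]]), levels(dat[[f2]]))
      names(grid) <- c(f1, f2)
      # Keep all rows of the lookup table; unmatched combinations get NA
      out <- merge(dat, grid, by = c(f1, f2), all.y = TRUE)
      # Replace the NAs in the value column with the fill value
      out[[value]][is.na(out[[value]])] <- fill
      out
    }

    dat <- data.frame(f1 = factor(c(1, 2, 2)), f2 = factor(c("A", "A", "B")),
                      v1 = c(23, 45, 27))
    newdat <- fill_missing_combos(dat)
    ```

    Note that any NA already present in the value column would be overwritten as well, so this only suits data where NA can safely be read as "no observation".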
    
  • 2020-12-11 03:23

    Here is a tidyr solution: spread with fill = 0, then gather back into long format.

    library(tidyr)
    # df is the question's data frame (columns f1, f2, v1)
    df %>% spread(f2, v1, fill=0) %>% gather(f2, v1, -f1)
    
    #  f1 f2 v1
    #1  1  A 23
    #2  2  A 45
    #3  1  B  0
    #4  2  B 27
    

    You could equally do df %>% spread(f1, v1, fill=0) %>% gather(f1, v1, -f2).
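
    In current tidyr releases, spread() and gather() are superseded by pivot_wider() and pivot_longer(). A rough equivalent of the round trip above might look like the sketch below, assuming tidyr 1.0.0 or later and reusing the dat data frame from the first answer:

    ```r
    library(tidyr)

    dat <- data.frame(f1 = factor(c(1, 2, 2)), f2 = factor(c("A", "A", "B")),
                      v1 = c(23, 45, 27))

    # Widen so each f2 level becomes a column, filling absent cells with 0
    wide <- pivot_wider(dat, names_from = f2, values_from = v1,
                        values_fill = list(v1 = 0))

    # Back to long format; f2 comes back as character, not factor
    long <- pivot_longer(wide, cols = -f1, names_to = "f2", values_to = "v1")
    ```

    If f2 must stay a factor, convert it afterwards with factor(long$f2).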

  • 2020-12-11 03:33

    Two years late, but I had the same problem and came up with this plyr solution:

    library(plyr)

    dat <- data.frame(f1 = factor(c(1,2,2)), f2 = factor(c("A","A","B")), v1 = c(23,45,27))
    
    # .drop = FALSE keeps empty f1/f2 combinations; empty groups get 0
    newdat <- ddply(dat, .(f1,f2), numcolwise(function(x) {if(length(x)>0) x else 0.0}), .drop=FALSE)
    
    > newdat
      f1 f2 v1
    1  1  A 23
    2  1  B  0
    3  2  A 45
    4  2  B 27
    