Handling missing combinations of factors in R

离开以前 2020-12-11 02:54

So, I have a data frame with two factors and one numeric variable like so:

> D
f1 f2 v1
1  A  23
2  A  45
2  B  27
     .
     .
     .
3 Answers
  •  暗喜 (OP)
    2020-12-11 03:22

    Using your data:

    dat <- data.frame(f1 = factor(c(1,2,2)), f2 = factor(c("A","A","B")),
                      v1 = c(23,45,27))
    

    One option is to create a lookup table containing every combination of the levels of both factors. The expand.grid() function builds this table when supplied with the levels of each factor:

    dat2 <- with(dat, expand.grid(f1 = levels(f1), f2 = levels(f2)))
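
    To see what the lookup table holds, it can help to print it. The sketch below rebuilds dat from this answer and checks that dat2 has one row per combination of levels, that is nlevels(f1) * nlevels(f2) rows. The name n_expected is only illustrative.

```r
# Data from the answer
dat <- data.frame(f1 = factor(c(1, 2, 2)), f2 = factor(c("A", "A", "B")),
                  v1 = c(23, 45, 27))

# Every combination of the levels of f1 and f2
dat2 <- with(dat, expand.grid(f1 = levels(f1), f2 = levels(f2)))
print(dat2)
#   f1 f2
# 1  1  A
# 2  2  A
# 3  1  B
# 4  2  B

# The grid has one row per combination of levels (2 * 2 = 4 here)
n_expected <- nlevels(dat$f1) * nlevels(dat$f2)
stopifnot(nrow(dat2) == n_expected)
```

    Note that the (1, B) row is present in dat2 even though it never occurs in dat.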
    

    A database-like join can then be performed with the merge() function. Setting all.y = TRUE keeps every row of the lookup table in the result, including combinations that have no match in dat:

    newdat <- merge(dat, dat2, all.y = TRUE)
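
    By default, merge() joins on the columns the two data frames have in common, here f1 and f2. A minimal sketch that spells this out with the by argument and then lists the combinations that were absent from dat. The name missing_rows is only illustrative, and the check assumes v1 contains no NA values in dat itself.

```r
# Data and lookup table from the answer
dat <- data.frame(f1 = factor(c(1, 2, 2)), f2 = factor(c("A", "A", "B")),
                  v1 = c(23, 45, 27))
dat2 <- with(dat, expand.grid(f1 = levels(f1), f2 = levels(f2)))

# Same join as in the answer, with the key columns named explicitly
newdat <- merge(dat, dat2, by = c("f1", "f2"), all.y = TRUE)

# Rows that came only from the lookup table have NA in v1
missing_rows <- newdat[is.na(newdat$v1), c("f1", "f2")]
print(missing_rows)
```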
    

    The above line produces:

    > newdat
      f1 f2 v1
    1  1  A 23
    2  1  B NA
    3  2  A 45
    4  2  B 27
    

    As you can see, the missing combinations are given the value NA, indicating missingness. It is then relatively simple to replace these NAs with 0s:

    > newdat$v1[is.na(newdat$v1)] <- 0
    > newdat
      f1 f2 v1
    1  1  A 23
    2  1  B  0
    3  2  A 45
    4  2  B 27
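
    If this is needed in more than one place, the three steps can be wrapped in a small helper. This is only a sketch: fill_missing_combos and its argument names are hypothetical, not part of base R, and it assumes the grouping columns are already factors. Any NA that was already present in the value column would be overwritten as well.

```r
# Hypothetical helper: add rows for unobserved factor combinations
fill_missing_combos <- function(df, factors, value, fill = 0) {
  # All combinations of the levels of the chosen factor columns
  grid <- expand.grid(lapply(df[factors], levels))
  # Keep every row of the grid; unmatched combinations get NA
  out <- merge(df, grid, by = factors, all.y = TRUE)
  # Replace NA in the value column with the fill value
  out[[value]][is.na(out[[value]])] <- fill
  out
}

dat <- data.frame(f1 = factor(c(1, 2, 2)), f2 = factor(c("A", "A", "B")),
                  v1 = c(23, 45, 27))
res <- fill_missing_combos(dat, factors = c("f1", "f2"), value = "v1")
print(res)
```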
    
