Handling missing combinations of factors in R


离开以前 2020-12-11 02:54

So, I have a data frame with two factors and one numeric variable like so:

3 answers

暗喜 (original poster)

2020-12-11 03:22
Using your data:
```
dat <- data.frame(f1 = factor(c(1,2,2)), f2 = factor(c("A","A","B")),
                  v1 = c(23,45,27))
```
one option is to build a lookup table containing every combination of levels by passing the levels of both factors to the expand.grid() function:
```
dat2 <- with(dat, expand.grid(f1 = levels(f1), f2 = levels(f2)))
```
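As a quick sanity check (a sketch that is not part of the original answer), the lookup table should hold exactly one row per combination of levels, so its row count equals the product of the two factors' level counts:

```r
# Same example data as above
dat <- data.frame(f1 = factor(c(1, 2, 2)), f2 = factor(c("A", "A", "B")),
                  v1 = c(23, 45, 27))

# Lookup table with every combination of the levels of f1 and f2
dat2 <- with(dat, expand.grid(f1 = levels(f1), f2 = levels(f2)))

# One row per combination: nlevels(f1) * nlevels(f2) rows, no duplicates
n_expected <- nlevels(dat$f1) * nlevels(dat$f2)
stopifnot(nrow(dat2) == n_expected, !anyDuplicated(dat2))
print(dat2)
```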
A database-like join can then be performed with the merge() function, specifying all.y = TRUE so that every row of the lookup table is kept in the result:
```
newdat <- merge(dat, dat2, all.y = TRUE)
```
The above line produces:
```
> newdat
  f1 f2 v1
1  1  A 23
2  1  B NA
3  2  A 45
4  2  B 27
```
As you can see, the missing combinations are given the value NA, indicating the missing-ness. It is relatively simple to then replace these NAs with 0s:
```
> newdat$v1[is.na(newdat$v1)] <- 0
> newdat
  f1 f2 v1
1  1  A 23
2  1  B  0
3  2  A 45
4  2  B 27
```
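The three steps above (lookup table, join, NA replacement) can be bundled into a small helper. This is only a sketch: the function name complete_combos and its arguments are hypothetical and do not come from the answer, and it assumes the grouping columns are already factors. Note that it also overwrites any NAs that were present in the value column before the join.

```r
# Hypothetical helper (illustrative only): fill in absent factor combinations
complete_combos <- function(df, factors, value, fill = 0) {
  # Lookup table with every combination of the factors' levels
  grid <- expand.grid(lapply(df[factors], levels))
  # Keep every row of the lookup table; unmatched rows get NA in `value`
  out <- merge(df, grid, by = factors, all.y = TRUE)
  # Replace NAs (including any that pre-existed in df) with the fill value
  out[[value]][is.na(out[[value]])] <- fill
  out
}

dat <- data.frame(f1 = factor(c(1, 2, 2)), f2 = factor(c("A", "A", "B")),
                  v1 = c(23, 45, 27))
newdat <- complete_combos(dat, c("f1", "f2"), "v1")
print(newdat)
```

Naming the join columns explicitly through by guards against an accidental match on some other column that happens to share a name in both data frames.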