group_by() into fill() not working as expected

喜你入骨 提交于 2019-12-30 03:11:48

问题


I'm trying to do a Last Observation Carried Forward operation on some poorly formatted data using dplyr and tidyr. It isn't working as I'd expect.

library(dplyr)
library(tidyr)

df <- data.frame(id=c(1,1,2,2,3,3),
                 email=c('bob@email.com', NA, 'joe@email.com', NA, NA, NA))
df2 <- df %>% group_by(id) %>% fill(email)

This results in:

Source: local data frame [6 x 2]
Groups: id [3]

     id         email
  (dbl)        (fctr)
1     1 bob@email.com
2     1 bob@email.com
3     2 joe@email.com
4     2 joe@email.com
5     3 joe@email.com
6     3 joe@email.com

I expect it to be:

Source: local data frame [6 x 2]
Groups: id [3]

     id         email
  (dbl)        (fctr)
1     1 bob@email.com
2     1 bob@email.com
3     2 joe@email.com
4     2 joe@email.com
5     3 NA
6     3 NA

The reason I expect it to be the latter is because of group_by's documentation saying, "The group_by function takes an existing tbl and converts it into a grouped tbl where operations are performed "by group"." The group in this case is determined by the id variable, and the following operation is fill(email). However, it's pretty clearly NOT doing that.


And before anybody asks, it makes no difference if the fields are both character instead of numeric or factor.


UPDATE @aosmith pointed out this open issue on Github. I'm going to say that there won't be a proper solution to this problem until that issue is resolved. Everything else would just be a workaround. So, if somebody makes a successful PR addressing that issue and posts it here, I'd be happy to mark it as the solution.


回答1:


Looks like this has been fixed in the development version of tidyr. You now get the expected result per id using fill from tidyr_0.3.1.9000.

df %>% group_by(id) %>% fill(email)

Source: local data frame [6 x 2]
Groups: id [3]

     id         email
  (dbl)        (fctr)
1     1 bob@email.com
2     1 bob@email.com
3     2 joe@email.com
4     2 joe@email.com
5     3            NA
6     3            NA



回答2:


Luckily you can still use zoo::na.locf for this:

df %>% 
    group_by(id) %>% 
    mutate(email = zoo::na.locf(email, na.rm = FALSE))  
# Source: local data frame [6 x 2]
# Groups: id [3]
# 
#      id         email
#   (dbl)        (fctr)
# 1     1 bob@email.com
# 2     1 bob@email.com
# 3     2 joe@email.com
# 4     2 joe@email.com
# 5     3            NA
# 6     3            NA



回答3:


Another option is to use do from dplyr:

df3 <- df %>% group_by(id) %>% do(fill(.,email))



回答4:


Two questions, does it has be duplicated and do you have to use dplyr and tidyr?

Maybe this could be a solution?

(
bar <- data.frame(id=c(1,1,2,2,3,3),
                 email=c('bob@email.com', NA, 'joe@email.com', NA, NA, NA))
)                 
#> id         email
#>  1 bob@email.com
#>  1          <NA>
#>  2 joe@email.com
#>  2          <NA>
#>  3          <NA>
#>  3          <NA>

(                 
foo <- bar[!duplicated(bar$id),]
)
#> id         email
#>  1 bob@email.com
#>  2 joe@email.com
#>  3          <NA>



回答5:


This is kind of ugly, but it is another option that uses dplyr and works with your sample data

df %>%
   group_by(id) %>%
   mutate(email = email[ !is.na(email) ][1])


来源:https://stackoverflow.com/questions/34517370/group-by-into-fill-not-working-as-expected

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!