dplyr group_by and mutate, how to access the data frame?

Question

When using dplyr's group_by() and mutate(), if I understand correctly, the data frame is split into sub-data frames according to the group_by() argument. For example, with the following code:

 library(dplyr)  # provides %>%, group_by() and mutate()
 set.seed(7)
 df <- data.frame(x = runif(10), let = rep(letters[1:5], each = 2))
 df %>% group_by(let) %>% mutate(mean.by.letter = mean(x))

mean() is applied in turn to column x of each of the 5 sub-data frames, one per letter from a to e.

So you can manipulate the columns of the sub-data frames, but can you access the sub-data frames themselves? To my surprise, if I try:

 set.seed(7)
 data <- data.frame(x=runif(10),let=rep(letters[1:5],each=2))
 data %>% group_by(let) %>% mutate(mean.by.letter = mean(.$x))

the result is different. From this result, one can infer that "." does not stand for each sub-data frame in turn but for the whole "data" data frame (group_by() doesn't change what "." refers to).
I am asking because I want to apply a statistical function that takes a data frame as an argument to each of these sub-data frames. Thanks!
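The inference above can be checked directly. The following is a minimal sketch, assuming the magrittr pipe %>% (not the native |> pipe); the column names mean.dot and mean.grp are made up for illustration.

```r
library(dplyr)

set.seed(7)
data <- data.frame(x = runif(10), let = rep(letters[1:5], each = 2))

res <- data %>%
  group_by(let) %>%
  mutate(
    mean.dot = mean(.$x),  # `.` is the pipe placeholder: the whole piped-in data frame
    mean.grp = mean(x)     # bare `x` is evaluated once per group
  )

# TRUE if every row of mean.dot equals the overall mean of x
dot_is_overall <- isTRUE(all.equal(res$mean.dot, rep(mean(data$x), nrow(data))))
dot_is_overall
```

So mean(.$x) is computed on all 10 rows regardless of the grouping, while mean(x) is computed per letter.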

Answer 1:

We can use do(); inside it, . refers to each group's sub-data frame:

data %>%
    group_by(let) %>%
    do(mutate(., mean.by.letter = mean(.$x)))
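Since the goal in the question is a function that takes a whole data frame, the same do() pattern can pass . itself rather than .$x. This is only a sketch: spread_of() is a hypothetical stand-in for whatever statistic is actually needed, and the column name spread is made up.

```r
library(dplyr)

set.seed(7)
data <- data.frame(x = runif(10), let = rep(letters[1:5], each = 2))

# Hypothetical statistic that takes a data frame rather than a vector
spread_of <- function(d) max(d$x) - min(d$x)

res <- data %>%
  group_by(let) %>%
  do(mutate(., spread = spread_of(.))) %>%  # inside do(), `.` is one group's rows
  ungroup()
```

Each pair of rows sharing a letter gets the same spread value, computed from that group's rows only.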

Answer 2:

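Since dplyr 0.8 you can use group_map(); the . in the group_map() call represents the sub-data frame. Note that from dplyr 0.8.1 onward group_map() returns a list of per-group results, and group_modify() is the variant that returns a grouped tibble like the one shown here.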

library(dplyr)
df %>%
  group_by(let) %>%
  group_map(~mutate(., mean.by.letter = mean(x)))
#> # A tibble: 10 x 3
#> # Groups:   let [5]
#>    let        x mean.by.letter
#>    <fct>  <dbl>          <dbl>
#>  1 a     0.989          0.693 
#>  2 a     0.398          0.693 
#>  3 b     0.116          0.0927
#>  4 b     0.0697         0.0927
#>  5 c     0.244          0.518 
#>  6 c     0.792          0.518 
#>  7 d     0.340          0.656 
#>  8 d     0.972          0.656 
#>  9 e     0.166          0.312 
#> 10 e     0.459          0.312

Find more about group_map and other new features here:

https://www.tidyverse.org/articles/2019/02/dplyr-0-8-0/
https://www.tidyverse.org/articles/2018/12/dplyr-0-8-0-release-candidate/
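For more recent dplyr versions (0.8.1 and later, as far as I know), a rough sketch of the same idea with group_modify() and group_map() might look like this. spread_of() is again a hypothetical data-frame-taking statistic, not part of dplyr.

```r
library(dplyr)

set.seed(7)
df <- data.frame(x = runif(10), let = rep(letters[1:5], each = 2))

# Hypothetical statistic that takes a data frame rather than a vector
spread_of <- function(d) max(d$x) - min(d$x)

# group_modify(): .x is the group's rows (without the grouping column);
# the result is a grouped tibble with the grouping column added back
res <- df %>%
  group_by(let) %>%
  group_modify(~ mutate(.x, spread = spread_of(.x)))

# group_map(): same per-group call, but the results come back as a plain list
per_group <- df %>%
  group_by(let) %>%
  group_map(~ spread_of(.x))
```

group_modify() fits when the per-group result should stay a data frame; group_map() fits when the function returns something else, such as a single number or a fitted model.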

Source: https://stackoverflow.com/questions/36551708/dplyr-group-by-and-mutate-how-to-access-the-data-frame

Tags

group-by

dplyr