Question
When using dplyr's group_by() and mutate(), if I understand correctly, the data frame is split into sub-data-frames according to the group_by() argument. For example, with the following code:
set.seed(7)
df <- data.frame(x=runif(10),let=rep(letters[1:5],each=2))
df %>% group_by(let) %>% mutate(mean.by.letter = mean(x))
mean() is applied successively to the column x of 5 sub-dfs, one for each letter from a to e.
So you can manipulate the columns of the sub-dfs, but can you access the sub-dfs themselves? To my surprise, if I try:
set.seed(7)
data <- data.frame(x=runif(10),let=rep(letters[1:5],each=2))
data %>% group_by(let) %>% mutate(mean.by.letter = mean(.$x))
the result is different. From this result, one can infer that "." does not represent each sub-df in turn but the whole "data" frame (the group_by function doesn't change anything).
The reason I ask is that I want to apply a stat function that takes a data frame as an argument to each of these sub-dfs.
Thanks!
Answer 1:
We can wrap the mutate() call within do(), so that "." refers to each group's sub-df in turn:
data %>%
  group_by(let) %>%
  do(mutate(., mean.by.letter = mean(.$x)))
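A minimal sketch of the difference, assuming the `data` frame from the question. In a plain piped mutate(), `.` is bound to the whole left-hand side, whereas inside do() it is bound to one group's rows at a time. `stat_fun` is a hypothetical stand-in for a function that takes a data frame; it is not part of dplyr.

```r
library(dplyr)

set.seed(7)
data <- data.frame(x = runif(10), let = rep(letters[1:5], each = 2))

# Hypothetical stand-in for a statistic that needs a whole data frame.
stat_fun <- function(d) mean(d$x)

# Plain mutate(): `.` is the full piped data frame,
# so every row receives the overall mean.
whole <- data %>%
  group_by(let) %>%
  mutate(stat = stat_fun(.)) %>%
  ungroup()

# do(): `.` is one group's rows at a time,
# so each letter receives its own value.
per_group <- data %>%
  group_by(let) %>%
  do(mutate(., stat = stat_fun(.))) %>%
  ungroup()
```

Comparing `whole$stat` with `per_group$stat` shows a single repeated value in the first case and one value per letter in the second.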
Answer 2:
Since dplyr 0.8 you can use group_map(); the "." in the group_map() call will represent the sub-data.frame.
library(dplyr)
df %>%
  group_by(let) %>%
  group_map(~mutate(., mean.by.letter = mean(x)))
#> # A tibble: 10 x 3
#> # Groups: let [5]
#> let x mean.by.letter
#> <fct> <dbl> <dbl>
#> 1 a 0.989 0.693
#> 2 a 0.398 0.693
#> 3 b 0.116 0.0927
#> 4 b 0.0697 0.0927
#> 5 c 0.244 0.518
#> 6 c 0.792 0.518
#> 7 d 0.340 0.656
#> 8 d 0.972 0.656
#> 9 e 0.166 0.312
#> 10 e 0.459 0.312
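Note for later releases: as of dplyr 0.8.1, group_map() always returns a list, and the data-frame-returning behaviour shown above moved to group_modify(). A minimal sketch, assuming dplyr >= 0.8.1 and the `df` from the question; `stat_fun` is again a hypothetical placeholder for a function that takes a data frame.

```r
library(dplyr)

set.seed(7)
df <- data.frame(x = runif(10), let = rep(letters[1:5], each = 2))

# Hypothetical placeholder for a function that takes a whole data frame.
stat_fun <- function(d) mean(d$x)

# group_modify(): .x is the sub-data-frame of one group (without the
# grouping column), .y is a one-row tibble holding that group's key.
# The function must return a data frame; the results are row-bound.
by_letter <- df %>%
  group_by(let) %>%
  group_modify(~ mutate(.x, mean.by.letter = stat_fun(.x))) %>%
  ungroup()

# group_map(): same per-group call, but the results come back as a list.
stats <- df %>%
  group_by(let) %>%
  group_map(~ stat_fun(.x))
```

group_modify() suits the case where each group should yield rows of a data frame, while group_map() is the more general choice when the per-group result is an arbitrary object such as a fitted model.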
Find more about group_map() and other new features here:
https://www.tidyverse.org/articles/2019/02/dplyr-0-8-0/
https://www.tidyverse.org/articles/2018/12/dplyr-0-8-0-release-candidate/
Source: https://stackoverflow.com/questions/36551708/dplyr-group-by-and-mutate-how-to-access-the-data-frame