R dplyr summarise multiple functions to selected variables


死守一世寂寞 2021-01-15 11:51

I have a dataset that I want to summarise by mean, while also calculating the max for just one of the variables.

Let me start with an example of what I would like to a

4 answers

忘掉有多难 (original poster)

2021-01-15 12:23

If you want to do something more complex like that, you could write your own version of summarize_at. With this version you supply triplets of column names, functions, and naming rules. Here's a rough start:

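# Relies on unexported dplyr helpers (select_colwise_names, as.fun_list,
# colwise_), so it may need adjusting for other dplyr versions.
my_summarise_at <- function(.tbl, ...) {
    dots <- list(...)
    # arguments must come in (columns, functions, naming rule) triplets
    stopifnot(length(dots) %% 3 == 0)
    # "c" rather than "append" so that any number of triplets can be combined
    vars <- do.call("c", Map(function(.cols, .funs, .name) {
        cols <- select_colwise_names(.tbl, .cols)
        funs <- as.fun_list(.funs, .env = parent.frame())
        val <- colwise_(.tbl, funs, cols)
        # replace "%" in the naming rule with each default name
        names <- sapply(names(val), function(x) gsub("%", x, .name))
        setNames(val, names)
    }, dots[seq_along(dots) %% 3 == 1], dots[seq_along(dots) %% 3 == 2], dots[seq_along(dots) %% 3 == 0]))
    # hand all the generated expressions to a single summarise_ call
    summarise_(.tbl, .dots = vars)
}
# let the function see dplyr's internal helpers
environment(my_summarise_at) <- getNamespace("dplyr")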

And you can call it with

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  my_summarise_at("Sepal.Length:Petal.Width", mean, "%_mean", 
      "Petal.Width", max, "%_max")

For the names we just replace the "%" with the default name. The idea is just to dynamically build the summarize_ expression. The summarize_at function is really just a convenience wrapper around that basic function.
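
For comparison, on dplyr 1.0.0 or later the same summary can probably be written without touching dplyr internals, using across() and its .names argument. This is only a sketch under that version assumption, not the approach used in this answer. The glue placeholder {.col} plays the role that "%" plays in the naming rule above, and the variable name iris_summary is purely illustrative.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, where across() is available

iris_summary <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise(
    # mean of every column from Sepal.Length through Petal.Width
    across(Sepal.Length:Petal.Width, mean, .names = "{.col}_mean"),
    # max of Petal.Width only
    across(Petal.Width, max, .names = "{.col}_max")
  )

# one row per species, with one column per (variable, function) pair
names(iris_summary)
```

Because the first across() writes to new *_mean columns, the original Petal.Width is still available when the second across() computes its max.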
