R dplyr summarise multiple functions to selected variables

死守一世寂寞 2021-01-15 11:51

I have a dataset that I want to summarise by mean, but I also want to calculate the max for just one of the variables.

Let me start with an example of what I would like to a

4 answers
  •  忘掉有多难
    2021-01-15 12:23

    If you want to do something more complex like that, you could write your own version of summarize_at. With this version you supply triplets of column names, functions, and naming rules.

    Here's a rough start:

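    # Rough sketch: relies on unexported dplyr internals (select_colwise_names,
    # as.fun_list, colwise_) and on summarise_(), so it is tied to the dplyr
    # version current when this was written and may not run on newer releases.
    my_summarise_at <- function(.tbl, ...)
    {
        dots <- list(...)
        # Arguments must come in triplets: columns, functions, naming rule
        stopifnot(length(dots) %% 3 == 0)
        # c() rather than append(), so any number of triplets can be combined
        vars <- do.call("c", Map(function(.cols, .funs, .name) {
            cols <- select_colwise_names(.tbl, .cols)
            funs <- as.fun_list(.funs, .env = parent.frame())
            val <- colwise_(.tbl, funs, cols)
            # Substitute each default name for "%" in the naming rule
            names <- sapply(names(val), function(x) gsub("%", x, .name))
            setNames(val, names)
        }, dots[seq_along(dots) %% 3 == 1], dots[seq_along(dots) %% 3 == 2], dots[seq_along(dots) %% 3 == 0]))
        summarise_(.tbl, .dots = vars)
    }
    # Evaluate inside the dplyr namespace so the internal helpers are visible
    environment(my_summarise_at) <- getNamespace("dplyr")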
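
    The triplet handling in the function above can be looked at in isolation with base R only. This is a minimal sketch: the names cols_spec, funs_spec, names_spec, and new_name are made up for illustration and do not appear in the function itself.

```r
# Stand-in for the arguments that would arrive through `...`
dots <- list("Sepal.Length:Petal.Width", mean, "%_mean",
             "Petal.Width", max, "%_max")

idx <- seq_along(dots)
cols_spec  <- dots[idx %% 3 == 1]  # positions 1 and 4: column selections
funs_spec  <- dots[idx %% 3 == 2]  # positions 2 and 5: functions
names_spec <- dots[idx %% 3 == 0]  # positions 3 and 6: naming rules

# The naming rule swaps "%" for a column name
new_name <- gsub("%", "Petal.Width", names_spec[[2]])
print(new_name)
```

    Map() then walks these three lists in parallel, so each column selection is paired with its own function and naming rule.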
    

    And you can call it with

    iris %>%
      group_by(Species) %>%
      filter(Sepal.Length > 5) %>%
      my_summarise_at("Sepal.Length:Petal.Width", mean, "%_mean", 
          "Petal.Width", max, "%_max")
    

    For the names we just replace the "%" with the default name. The idea is just to dynamically build the summarize_ expression. The summarize_at function is really just a convenience wrapper around that basic function.
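
    For comparison, a similar result can probably be had without touching dplyr internals. The sketch below is not the approach from this answer: it assumes a recent dplyr release that provides across() and the "{.col}" placeholder in .names, and the variable name result is only for illustration.

```r
library(dplyr)

result <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise(
    # mean of every column from Sepal.Length through Petal.Width
    across(Sepal.Length:Petal.Width, mean, .names = "{.col}_mean"),
    # max of Petal.Width only
    across(Petal.Width, max, .names = "{.col}_max")
  )

print(names(result))
```

    Here "{.col}" plays the role that "%" plays in my_summarise_at: it is replaced by the name of each selected column.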
