summarise_at using different functions for different variables

轮回少年 2020-12-09 05:52

When I use group_by and summarise in dplyr, I can naturally apply different summary functions to different variables. For instance:

    library(tidyverse)
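For reference, here is a minimal self-contained sketch of that pattern. The data frame is hypothetical because the original example data did not survive. It uses the column names the answers refer to (category, x, y, z) and the functions used in the first answer: mean for x, median for y and first for z.

```r
library(dplyr)

# Hypothetical example data (not the original question's data)
df <- tibble(
  category = c("a", "a", "b", "b", "c", "c"),
  x = c(1, 3, 4, 6, 2, 8),
  y = c(5, 7, 1, 9, 3, 3),
  z = c(10, 20, 30, 40, 50, 60)
)

# With plain summarise(), each output column can use its own function
res <- df %>%
  group_by(category) %>%
  summarise(
    x = mean(x),
    y = median(y),
    z = first(z)
  )
res
```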

          


        
2 Answers
  • 2020-12-09 06:42

    Here is one idea.

    library(tidyverse)
    
    # `df` is the data frame from the question.
    # Summarise each column with its own function, then join the
    # partial results back together by `category`.
    df_mean <- df %>%
      group_by(category) %>%
      summarize_at(vars(x), mean)
    
    df_median <- df %>%
      group_by(category) %>%
      summarize_at(vars(y), median)
    
    df_first <- df %>%
      group_by(category) %>%
      summarize_at(vars(z), first)
    
    df_summary <- reduce(list(df_mean, df_median, df_first), 
                         left_join, by = "category")
    

    Like you said, summarise_at is not really needed for this example. However, if you have many columns that need to be summarized with different functions, this strategy may work: specify the columns for each summarize_at call inside vars(...), which follows the same column-selection rules as dplyr::select().
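Because vars() accepts select() helpers such as starts_with(), one summarise_at() call can cover every column matching a pattern instead of listing the columns one by one. A minimal sketch, assuming hypothetical columns x1, x2 and y1 that are not from the question:

```r
library(dplyr)
library(purrr)

# Hypothetical data with several columns per "family"
df <- tibble(
  category = c("a", "a", "b", "b"),
  x1 = c(1, 3, 2, 4),
  x2 = c(10, 30, 20, 40),
  y1 = c(5, 7, 9, 1)
)

# Every column starting with "x" gets the mean
df_mean <- df %>%
  group_by(category) %>%
  summarise_at(vars(starts_with("x")), mean)

# Every column starting with "y" gets the median
df_median <- df %>%
  group_by(category) %>%
  summarise_at(vars(starts_with("y")), median)

# Join the partial summaries back together by the grouping column
df_summary <- reduce(list(df_mean, df_median), left_join, by = "category")
df_summary
```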

    Update

    Here is another idea. Define a helper function that wraps summarise_at, and then use map2 to apply it with a look-up list that pairs each function with the variables it should be applied to. In this example, I applied mean to the x and y columns and median to z.

    # Define a helper that summarises the columns named in `variable`
    # with the function whose name is given in `func`
    summarise_at_fun <- function(variable, func, data){
      data %>%
        summarise_at(variable, match.fun(func))
    }
    
    # Group the data
    df2 <- df %>% group_by(category)
    
    # Look-up list: names are functions, values are the columns to apply them to
    look_list <- list(mean = c("x", "y"),
                      median = "z")
    
    # Apply summarise_at_fun to each function/columns pair, then join the results
    map2(look_list, names(look_list), summarise_at_fun, data = df2) %>%
      reduce(left_join, by = "category")
    
    # A tibble: 3 x 4
      category     x     y     z
         <chr> <dbl> <dbl> <dbl>
    1        a     6     6     0
    2        b     5     3     8
    3        c     2     6     1
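In dplyr 1.0.0 and later, summarise_at() is superseded by across(), which lets a single summarise() call apply different functions to different sets of columns, so the map2()/left_join() step is not needed. A minimal sketch with hypothetical data (the question's df is not shown), applying mean to x and y and median to z as in the look-up list above:

```r
library(dplyr)

# Hypothetical data; the question's original `df` is not shown
df <- tibble(
  category = c("a", "a", "b", "b"),
  x = c(1, 3, 2, 4),
  y = c(2, 6, 8, 10),
  z = c(5, 9, 7, 1)
)

# mean for x and y, median for z, all in one summarise() call
res <- df %>%
  group_by(category) %>%
  summarise(
    across(c(x, y), mean),
    across(z, median)
  )
res
```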
    
  • 2020-12-09 06:44

    Since your question is specifically about summarise_at, here is my idea:

    # na.rm = TRUE is passed on to each of the summary functions
    df %>%
      group_by(category) %>%
      summarise_at(vars(x, y, z),
                   list(mean = mean, sd = sd, min = min),
                   na.rm = TRUE)
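Note that this call applies all three functions to every listed column, producing columns named <column>_<function> such as x_mean, x_sd and x_min. If only some variable/function pairs are wanted, one option is to compute them all and then keep the needed ones with select(). A minimal sketch with hypothetical data, assuming the wanted pairs are the mean of x, the sd of y and the min of z:

```r
library(dplyr)

# Hypothetical data; the NA shows why na.rm = TRUE is passed through
df <- tibble(
  category = c("a", "a", "b", "b"),
  x = c(1, 3, 2, NA),
  y = c(2, 6, 8, 10),
  z = c(5, 9, 7, 1)
)

# Every function is applied to every column: x_mean, y_mean, ..., z_min
all_stats <- df %>%
  group_by(category) %>%
  summarise_at(vars(x, y, z),
               list(mean = mean, sd = sd, min = min),
               na.rm = TRUE)

# Keep only the variable/function combinations actually needed
res <- all_stats %>% select(category, x_mean, y_sd, z_min)
res
```

This is simple but wasteful when there are many columns, since every unused combination is still computed.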
    