Use dplyr's summarise_each to return one row per function?

[愿得一人] 2020-12-13 04:43

I'm using dplyr's summarise_each to apply a function to multiple columns of data. One thing that's nice is that you can apply multiple functions at once. Thing is, it'

3 Answers
  • 2020-12-13 05:34

    One option is to use purrr::map_dfc, which column-binds the per-column results back into a data.frame with bind_cols (map_df is fine here too). Pass it a function that returns a vector holding the result of each summary function, i.e.

    library(tidyverse)
    
    iris %>% select(contains('Petal')) %>% 
        map_dfc(~c(min(.x), max(.x))) %>% 
        mutate(stat = c('min', 'max'))    # to add column of function names
    
    #> # A tibble: 2 × 3
    #>   Petal.Length Petal.Width  stat
    #>          <dbl>       <dbl> <chr>
    #> 1          1.0         0.1   min
    #> 2          6.9         2.5   max
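
    The row labels above are typed by hand, so they can drift out of sync with the functions. A minimal sketch of one way around that, assuming a hypothetical named list `stat_funs` (not part of the original answer) whose names double as the labels:

    ```r
    library(dplyr)
    library(purrr)

    # Hypothetical named list: the names are reused as the `stat` labels.
    stat_funs <- list(min = min, max = max, mean = mean)

    petal_stats <- iris %>%
      select(contains("Petal")) %>%
      # For each column, call every function and return a plain numeric vector.
      map_dfc(function(col) unname(map_dbl(stat_funs, function(f) f(col)))) %>%
      mutate(stat = names(stat_funs))

    petal_stats
    ```

    Adding or removing a function then only means editing `stat_funs`; each function is assumed to return a single number.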
    
  • 2020-12-13 05:37

    To my knowledge there's no such argument. Anyhow, here's a workaround that outputs tidy data, which I think is even better than having as many rows as functions and as many columns as summarised columns. (Note that add_rownames requires at least dplyr 0.4.0.)

    library("dplyr")
    library("tidyr")
    
    iris %>% 
      summarise_each(funs(min, max, mean, median), matches("Petal")) %>%
      t() %>% 
      as.data.frame() %>% 
      add_rownames() %>%
      separate(rowname, into = c("feature", "fun"), sep = "_")
    

    returns:

           feature    fun       V1
    1 Petal.Length    min 1.000000
    2  Petal.Width    min 0.100000
    3 Petal.Length    max 6.900000
    4  Petal.Width    max 2.500000
    5 Petal.Length   mean 3.758000
    6  Petal.Width   mean 1.199333
    7 Petal.Length median 4.350000
    8  Petal.Width median 1.300000
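
    Later dplyr releases deprecate summarise_each, funs and add_rownames. A rough equivalent of the pipeline above, assuming dplyr 1.0.0+ and tidyr 1.0.0+ (the names `tidy_stats` and `value` are illustrative, not from the original answer):

    ```r
    library(dplyr)
    library(tidyr)

    tidy_stats <- iris %>%
      # across() names its outputs "<column>_<function>", e.g. "Petal.Length_min".
      summarise(across(matches("Petal"),
                       list(min = min, max = max, mean = mean, median = median))) %>%
      # Split those names at the underscore into two identifier columns.
      pivot_longer(everything(),
                   names_to = c("feature", "fun"),
                   names_sep = "_",
                   values_to = "value")

    tidy_stats
    ```

    The row order may differ from the output shown above, but the feature/fun/value triples are the same.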
    
  • 2020-12-13 05:42

    You can achieve a similar output by combining the dplyr and tidyr packages. Something along these lines can help:

    library(dplyr)
    library(tidyr)
    
    iris %>%
      select(matches("Petal")) %>%
      summarise_each(funs(min, max)) %>%
      gather(variable, value) %>%
      separate(variable, c("var", "stat"), sep = "\\_") %>%
      spread(var, value)
    ##   stat Petal.Length Petal.Width
    ## 1  max          6.9         2.5
    ## 2  min          1.0         0.1
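
    In tidyr 1.0.0+, gather and spread are superseded by pivot_longer and pivot_wider. As a sketch under that assumption (plus dplyr 1.0.0+ for across), the gather/separate/spread steps can be collapsed into a single pivot_longer call using the special ".value" name; `by_stat` is an illustrative name:

    ```r
    library(dplyr)
    library(tidyr)

    by_stat <- iris %>%
      summarise(across(matches("Petal"), list(min = min, max = max))) %>%
      # ".value" keeps the part before "_" as the output column name,
      # while the part after "_" goes into the `stat` column.
      pivot_longer(everything(),
                   names_to = c(".value", "stat"),
                   names_sep = "_")

    by_stat
    ```

    This yields one row per function, with one column per summarised variable.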
    