Grouping Over All Possible Combinations of Several Variables With dplyr

遇见更好的自我 2021-01-02 15:49

Given a situation such as the following

library(dplyr)
myData <- tbl_df(data.frame( var1 = rnorm(100), 
                             var2 = letters[1:3] %         


        
4 answers
  •  旧巷少年郎
    2021-01-02 16:29

    I have created a function based on the answer of @Gregor and the comments that followed:

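    library(dplyr)     # provides as_tibble() and the grouping verbs used below
    library(magrittr)  # provides the %>% pipe

    # Example data: one numeric column and three factor columns with three levels each.
    # as_tibble() replaces tbl_df(), which is deprecated in current dplyr.
    myData <- as_tibble(data.frame(var1 = rnorm(100),
                                   var2 = letters[1:3] %>%
                                          sample(100, replace = TRUE) %>%
                                          factor(),
                                   var3 = LETTERS[1:3] %>%
                                          sample(100, replace = TRUE) %>%
                                          factor(),
                                   var4 = month.abb[1:3] %>%
                                          sample(100, replace = TRUE) %>%
                                          factor()))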
    

    Function combSummarise

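    # data:      a data frame
    # variables: character vector of grouping column names
    # summarise: character vector of summary expressions, e.g. "mean(var1)"
    combSummarise <- function(data, variables, summarise){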
    
    
      # Get all different combinations of selected variables (credit to @Michael)
        myGroups <- lapply(seq_along(variables), function(x) {
        combn(c(variables), x, simplify = FALSE)}) %>%
        unlist(recursive = FALSE)
    
      # Group by selected variables (credit to @konvas)
        df <- eval(parse(text=paste("lapply(myGroups, function(x){
                   dplyr::group_by_(data, .dots=x) %>% 
                   dplyr::summarize_( \"", paste(summarise, collapse="\",\""),"\")})"))) %>% 
              do.call(plyr::rbind.fill,.)
    
        groupNames <- c(myGroups[[length(myGroups)]])
        newNames <- names(df)[!(names(df) %in% groupNames)]
    
        # Put the grouping columns first, then the summary columns;
        # drop = FALSE keeps a data frame even when only one column is selected
        df <- df[, c(groupNames, newNames), drop = FALSE]
        df
    
    }
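
    The function above relies on group_by_() and summarize_(), which are deprecated in current dplyr, and on building code as a string. As a rough sketch of the same idea with current verbs, the version below assumes dplyr 1.0 or later. The name comb_summarise, the value_col argument, the fixed set of summaries (count, mean, max) and the small example data frame are illustrative assumptions, not part of the original answer.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): summarise the column
# named in `value_col` over every non-empty combination of `variables`.
comb_summarise <- function(data, variables, value_col) {
  # All non-empty subsets of the grouping variables, as a flat list
  groups <- unlist(
    lapply(seq_along(variables),
           function(k) combn(variables, k, simplify = FALSE)),
    recursive = FALSE
  )

  # One summary table per subset of grouping variables
  results <- lapply(groups, function(g) {
    data %>%
      group_by(across(all_of(g))) %>%
      summarise(
        count      = n(),
        mean_value = mean(.data[[value_col]]),
        max_value  = max(.data[[value_col]]),
        .groups = "drop"
      )
  })

  # bind_rows() fills grouping columns absent from a table with NA;
  # select() then moves the grouping columns to the front
  bind_rows(results) %>%
    select(all_of(variables), everything())
}

# Small made-up example data, only to show the call
example_data <- data.frame(
  var1 = c(1, 2, 3, 4, 5, 6),
  var2 = factor(c("a", "b", "c", "a", "b", "c")),
  var3 = factor(c("A", "B", "A", "B", "A", "B"))
)
result <- comb_summarise(example_data, c("var2", "var3"), "var1")
print(result)
```

    Rows that belong to a grouping without a given variable carry NA in that variable's column, which matches what plyr::rbind.fill produces in the original function.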
    

    Call of combSummarise

    combSummarise(myData, variables=c("var2", "var3", "var4"),
                  summarise=c("length(var1)", "mean(var1)", "max(var1)"))
    

    or

    combSummarise(myData, variables=c("var2", "var4"),
                  summarise=c("length(var1)", "mean(var1)", "max(var1)"))
    

    or

    combSummarise(myData, variables=c("var2", "var4"),
                  summarise=c("length(var1)"))
    

    and so on.
