Add rows to grouped data with dplyr?

失恋的感觉 2020-12-15 07:32

My data is in a data.frame format like this sample data:

data <- 
structure(list(Article = structure(c(1L, 1L, 3L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L         


        
4 answers
  • 2020-12-15 08:24

    Since dplyr is under active development, I thought I would post an update that also incorporates tidyr:

    library(dplyr)
    library(tidyr)
    
    data %>%
      expand(Article, Week) %>%
      left_join(data) %>%
      group_by(Article, Week) %>%
      summarise(WeekDemand = sum(Demand, na.rm=TRUE))
    

    Which produces:

       Article     Week WeekDemand
    1    10004 2013-W01       1215
    2    10004 2013-W02        900
    3    10004 2013-W03        774
    4    10004 2013-W04       1170
    5    10006 2013-W01          0
    6    10006 2013-W02          0
    7    10006 2013-W03          0
    8    10006 2013-W04          5
    9    10007 2013-W01          2
    10   10007 2013-W02          0
    11   10007 2013-W03          0
    12   10007 2013-W04          0
    

    Using tidyr >= 0.3.1 this can now be written as:

    data %>% 
      complete(Article, Week) %>%  
      group_by(Article, Week) %>% 
      summarise(Demand = sum(Demand, na.rm = TRUE))
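
    As a minimal sketch of the same idea, complete() can also fill the missing combinations directly through its fill argument once the weekly sums exist. The toy data frame below is a hypothetical stand-in for the truncated sample data in the question, not the asker's actual data:

    ```r
    library(dplyr)
    library(tidyr)

    # Hypothetical toy data; article 10006 has no row for 2013-W01.
    toy <- data.frame(
      Article = c("10004", "10004", "10006"),
      Week    = c("2013-W01", "2013-W02", "2013-W02"),
      Demand  = c(5, 3, 2)
    )

    filled <- toy %>%
      group_by(Article, Week) %>%
      summarise(WeekDemand = sum(Demand), .groups = "drop") %>%
      # Add every missing Article/Week pair and set its WeekDemand to 0.
      complete(Article, Week, fill = list(WeekDemand = 0))
    ```

    Here the missing pair (10006, 2013-W01) appears in filled with a WeekDemand of 0, so no na.rm = TRUE is needed in the sum.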
    
  • 2020-12-15 08:28

    Without dplyr it can be done like this:

    as.data.frame(xtabs(Demand ~ Week + Article, data))
    

    giving:

           Week Article Freq
    1  2013-W01   10004 1215
    2  2013-W02   10004  900
    3  2013-W03   10004  774
    4  2013-W04   10004 1170
    5  2013-W01   10006    0
    6  2013-W02   10006    0
    7  2013-W03   10006    0
    8  2013-W04   10006    5
    9  2013-W01   10007    2
    10 2013-W02   10007    0
    11 2013-W03   10007    0
    12 2013-W04   10007    0
    

    and this can be rewritten as a magrittr or dplyr pipeline like this:

    data %>% xtabs(formula = Demand ~ Week + Article) %>% as.data.frame()
    

    The as.data.frame() at the end could be omitted if a wide form solution were desired.
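
    To see both forms side by side, here is a small base R sketch. The toy data frame is a hypothetical stand-in for the truncated sample data in the question:

    ```r
    # Hypothetical toy data; article 10006 has no row for 2013-W01.
    toy <- data.frame(
      Article = c("10004", "10004", "10006"),
      Week    = c("2013-W01", "2013-W02", "2013-W02"),
      Demand  = c(5, 3, 2)
    )

    # Wide form: a Week x Article contingency table of summed Demand.
    wide <- xtabs(Demand ~ Week + Article, toy)

    # Long form: one row per Week/Article pair, with the sum in column Freq.
    long <- as.data.frame(wide)
    ```

    Combinations that never occur in the data get a cell value of 0 in wide, which is why no explicit expansion step is needed with this approach.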

  • 2020-12-15 08:31

    For this situation you can also use dcast and melt.

    library(dplyr)
    library(reshape2)

    data %>%
      dcast(Article ~ Week, value.var = "Demand", fun.aggregate = sum) %>%
      melt(id = "Article") %>%
      arrange(Article, variable)
    
  • 2020-12-15 08:32

    I thought I would provide a dplyr-esque solution.

    • Use expand.grid() on the unique values of each column to generate every Article/Week combination.
    • Use left_join() to join in the demand data (filling the rest with NAs).

    Solution:

    library(dplyr)

    # unique() keeps each value once, so every pair appears exactly once.
    full_data <- expand.grid(Article = unique(data$Article), Week = unique(data$Week))
    out <- left_join(full_data, data, by = c("Article", "Week"))
    out[is.na(out)] <- 0    # fill with zeroes for summarise below.
    

    Then as before:

    WeekSums <- out %>%
                group_by(Article, Week) %>%
                summarise(
                         WeekDemand = sum(Demand)
                         )
    

    Functional programming?

    If you use this transformation often, a convenience function may help:

    xpand <- function(df, col1, col2, na_to_zero = TRUE) {

        require(dplyr)

        # {{ }} passes the unquoted column names through to select();
        # lapply(unique) keeps each value once before expand.grid() pairs them.
        expanded <- df %>%
            select({{ col1 }}, {{ col2 }}) %>%
            lapply(unique) %>%
            expand.grid(stringsAsFactors = FALSE)

        out <- left_join(expanded, df, by = names(expanded))   # join in any other variables from df.

        if (na_to_zero) out[is.na(out)] <- 0    # convert NAs to zeroes?

        return(out)
    }
    

    This way you can do:

    expanded_df <- xpand(data, Article, Week)
    