Relative frequencies / proportions with dplyr

灰色年华 2020-11-22 09:25

Suppose I want to calculate the proportion of different values within each group. For example, using the mtcars data, how do I calculate the relative f

9 answers
  •  闹比i (original poster)
    2020-11-22 09:44

    Try this:

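    library(dplyr)

    mtcars %>%
      group_by(am, gear) %>%
      summarise(n = n()) %>%      # count rows per (am, gear); drops the 'gear' grouping
      mutate(freq = n / sum(n))   # share of each gear within its 'am' group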
    
    #   am gear  n      freq
    # 1  0    3 15 0.7894737
    # 2  0    4  4 0.2105263
    # 3  1    4  8 0.6153846
    # 4  1    5  5 0.3846154
    

    From the dplyr vignette:

    When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll-up a dataset.

    Thus, after summarise(), the last grouping variable given to group_by(), 'gear', is peeled off. In the mutate() step the data is still grouped by the remaining grouping variable, here 'am', so sum(n) is the total count within each 'am' group and freq is the share of each gear within that group. You can check the grouping at each step with groups().
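
    The peeling can be checked directly. The following is a minimal sketch, assuming dplyr 1.0.0 or later (for the .groups argument); it uses group_vars(), which returns the grouping variables as a character vector, in place of groups(). The names grouped and counted are only illustrative.

```r
library(dplyr)

# Before summarise(): grouped by both variables
grouped <- mtcars %>% group_by(am, gear)
group_vars(grouped)   # "am" "gear"

# After summarise(): the last grouping variable, 'gear', has been peeled off.
# .groups = "drop_last" spells out the behaviour described above (assumes dplyr >= 1.0.0).
counted <- grouped %>% summarise(n = n(), .groups = "drop_last")
group_vars(counted)   # "am"
```

    Because 'am' is still a grouping variable at this point, a following mutate(freq = n / sum(n)) sums n within each 'am' group rather than over the whole table.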

    The outcome of the peeling of course depends on the order of the grouping variables in the group_by() call. You may wish to add a subsequent group_by(am) to make your code more explicit.
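
    One way to make the denominator explicit is sketched below. It swaps the group_by() + summarise(n = n()) pair for count(am, gear), which produces the same counts without leaving any grouping behind on ungrouped input, and then names the grouping variable for the proportions. The names by_am and by_gear are only illustrative.

```r
library(dplyr)

# Count each (am, gear) combination, then take proportions within 'am'
by_am <- mtcars %>%
  count(am, gear) %>%
  group_by(am) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

# Same counts, but proportions taken within 'gear' instead
by_gear <- mtcars %>%
  count(am, gear) %>%
  group_by(gear) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()
```

    by_am reproduces the freq column shown above (for example 15 / 19 for am = 0, gear = 3), while by_gear divides by the total per gear, so the 4 automatic cars with gear = 4 get 4 / 12. Writing the group_by() explicitly means the result no longer depends on which variable happened to be listed last.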

    For rounding and prettification, please refer to the nice answer by @Tyler Rinker.
