Find average by group over a time period and retrieve last date for same period

醉酒成梦 2021-01-07 09:35

Below is a reproducible data table with four columns:

  1. Date
  2. category
  3. value1
  4. value2

As the title suggests, I'd like t

2 Answers
  • 2021-01-07 09:47

    Is this what you're looking for?

    library(dplyr)

    # Per category: latest date, plus the mean of each value column
    dt %>%
      group_by(category) %>%
      summarise(date = max(date),
                value1 = mean(value1),
                value2 = mean(value2)) %>%
      ungroup()
    
    # A tibble: 3 x 4
      category       date   value1 value2
         <chr>     <date>    <dbl>  <dbl>
    1        A 2017-02-01 94.00000   56.0
    2        B 2017-04-01 96.50000   56.5
    3        C 2017-10-01 95.66667   55.0
    
  • 2021-01-07 10:01

    Here is the data.table approach. The calculations go in the j argument, wrapped in .(), and the grouping goes in the by argument.

    dt[, .(date = last(date), value1 = mean(value1), value2 = mean(value2)), by = category]
    

    Here is a more compact way, suggested by Frank in a comment on this post. It writes the mean call only once, using .SD and .SDcols to specify which columns are averaged.

    dt[, c(.(date = last(date)), lapply(.SD, mean)), by = category, .SDcols = value1:value2]
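
    Since the question's data is not shown here, the following is a minimal self-contained sketch of the same idea on a small made-up table (the values are hypothetical, chosen only for illustration). Note that last() returns whatever sits in the final row of each group, so the sketch sorts by date first; if the rows are not guaranteed to be in date order, max(date) is the safer choice.

    ```r
    library(data.table)

    # Hypothetical sample data; NOT the table from the question.
    dt <- data.table(
      date     = as.Date(c("2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")),
      category = c("A", "A", "B", "B"),
      value1   = c(90, 98, 95, 97),
      value2   = c(50, 60, 55, 57)
    )

    # last() depends on row order, so sort within category by date first.
    setorder(dt, category, date)

    # One row per category: last date plus the mean of every column in .SDcols.
    res <- dt[, c(.(date = last(date)), lapply(.SD, mean)),
              by = category, .SDcols = c("value1", "value2")]
    print(res)
    ```

    Passing the column names to .SDcols as a character vector is just one option; the range form used above should behave the same when the value columns are adjacent.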
    

    And if you want to use dplyr, you can use Z.Lin's approach. However, if there are lots of value columns, such as value1 to value10, you can do the following.

    dt %>%
      group_by(category) %>%
      summarise_all(funs(if_else(is.numeric(.), mean(.), last(.))))
    

    This code calculates the mean only for numeric columns; for any other column, it returns the value from the last row of each group.

    One final reminder, summarise_each has been deprecated. Please use summarise_all, summarise_if, or summarise_at.
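
    For readers on newer dplyr versions: funs() has since been deprecated as well, and the scoped verbs (summarise_all and friends) have been superseded by across(). Below is a rough sketch of the same "mean if numeric, otherwise last" logic, assuming dplyr 1.0.0 or later and using made-up data (the values are hypothetical). It uses a plain if/else rather than if_else(), because if_else() evaluates both branches and checks that their types match, which can fail for character columns.

    ```r
    library(dplyr)

    # Hypothetical sample data; NOT the table from the question.
    df <- tibble(
      date     = as.Date(c("2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")),
      category = c("A", "A", "B", "B"),
      value1   = c(90, 98, 95, 97),
      value2   = c(50, 60, 55, 57)
    )

    res <- df %>%
      arrange(category, date) %>%   # last() depends on row order
      group_by(category) %>%
      # Grouping columns are excluded from everything() automatically.
      summarise(across(everything(),
                       ~ if (is.numeric(.x)) mean(.x) else last(.x)))
    print(res)
    ```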
