Find average by group over a time period and retrieve last date for same period


醉酒成梦

Below is a reproducible data table with four columns:

Date
category
value1
value2

As the title suggests, I'd like t


2 Answers

南旧

2021-01-07 09:47

Is this what you're looking for?

```
dt %>%
  group_by(category) %>%
  summarise(date = max(date),
            value1 = mean(value1),
            value2 = mean(value2)) %>%
  ungroup()
```

```
# A tibble: 3 x 4
  category       date   value1 value2
     <chr>     <date>    <dbl>  <dbl>
1        A 2017-02-01 94.00000   56.0
2        B 2017-04-01 96.50000   56.5
3        C 2017-10-01 95.66667   55.0
```


遥遥无期

2021-01-07 10:01
Here is the data.table approach. The calculations go in the j argument, wrapped in .(), and the grouping is set with the by argument.
```
dt[, .(date = last(date), value1 = mean(value1), value2 = mean(value2)), by = category]
```
Here is a more concise way, suggested by Frank in a comment on this post. With this approach the mean function only has to be written once: .SDcols specifies which columns to summarise, and .SD refers to those columns within each group.
```
dt[, c(.(date = last(date)), lapply(.SD, mean)), by = category, .SDcols = value1:value2]
```
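The two data.table calls above can be tried end to end with a small table. This is only a sketch: the sample rows are hypothetical (they are not the asker's data), and `.SDcols` is given as a character vector, which should be equivalent to the `value1:value2` range. Because `last()` returns the last row as stored, the table is sorted by date within each category first.

```r
library(data.table)

# Hypothetical sample data, not the table from the question
dt <- data.table(
  date     = as.Date(c("2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")),
  category = c("A", "A", "B", "B"),
  value1   = c(90, 98, 95, 97),
  value2   = c(50, 60, 55, 57)
)

# last() picks the final row of each group, so order rows by date first
setorder(dt, category, date)

# Explicit form: one mean() call per value column
res1 <- dt[, .(date = last(date), value1 = mean(value1), value2 = mean(value2)),
           by = category]

# .SD form: mean() is applied to every column named in .SDcols
res2 <- dt[, c(.(date = last(date)), lapply(.SD, mean)),
           by = category, .SDcols = c("value1", "value2")]

print(res1)
```

Both results should have one row per category, with the latest date and the group means.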
If you prefer dplyr, Z.Lin's approach works. However, if there are many value columns, such as value1 to value10, you can do the following instead.
```
dt %>%
  group_by(category) %>%
  summarise_all(funs(if_else(is.numeric(.), mean(.), last(.))))
```
This code calculates the mean only for numeric columns; for any other column, it returns the last value in each group.

One final reminder, summarise_each has been deprecated. Please use summarise_all, summarise_if, or summarise_at.
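
In newer dplyr releases (1.0.0 and later), `across()` covers the same ground as the scoped `summarise_*` verbs. Below is a minimal sketch, assuming that version and a small hypothetical data frame `df` (not the asker's data). It uses plain `if`/`else` rather than `if_else()`, so only the matching branch is evaluated for each column.

```r
library(dplyr)

# Hypothetical sample data, not the table from the question
df <- tibble(
  date     = as.Date(c("2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")),
  category = c("A", "A", "B", "B"),
  value1   = c(90, 98, 95, 97),
  value2   = c(50, 60, 55, 57)
)

res <- df %>%
  arrange(category, date) %>%   # so last() returns the latest date per group
  group_by(category) %>%
  summarise(across(everything(),
                   ~ if (is.numeric(.x)) mean(.x) else last(.x)))

print(res)
```

Grouping columns are excluded from `everything()` inside `across()`, so `category` is left as the group key.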