Below is a reproducible data table with four columns:
As the title suggests, I'd like t
Is this what you're looking for?
library(dplyr)

dt %>%
  group_by(category) %>%
  summarise(date = max(date),
            value1 = mean(value1),
            value2 = mean(value2)) %>%
  ungroup()
# A tibble: 3 x 4
category date value1 value2
<chr> <date> <dbl> <dbl>
1 A 2017-02-01 94.00000 56.0
2 B 2017-04-01 96.50000 56.5
3 C 2017-10-01 95.66667 55.0
Here is a data.table approach. The calculations go inside .() in the j argument, and the grouping variable goes in the by argument.
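library(data.table)
setDT(dt)  # convert to a data.table by reference, in case it is still a data.frame
dt[, .(date = last(date), value1 = mean(value1), value2 = mean(value2)), by = category]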
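One detail worth checking: this answer takes last(date), whereas the dplyr answer above takes max(date). The two agree only when the rows are already sorted by date within each category. The sketch below uses a small hypothetical data.table (not the question's data) in which category "B" is deliberately out of date order, and computes both columns side by side:

```r
library(data.table)

# Hypothetical toy data: rows for category "B" are NOT sorted by date.
toy <- data.table(
  category = c("A", "A", "B", "B"),
  date     = as.Date(c("2017-01-01", "2017-02-01", "2017-04-01", "2017-03-01")),
  value1   = c(10, 20, 30, 50),
  value2   = c(1, 3, 5, 7)
)

res <- toy[, .(last_date = last(date),   # date in the final row of each group
               max_date  = max(date),    # latest date in each group
               value1    = mean(value1),
               value2    = mean(value2)),
           by = category]
print(res)
```

For "A" both date columns are 2017-02-01, but for "B" last_date is 2017-03-01 while max_date is 2017-04-01. If the goal is the most recent date, max(date) (or sorting with setorder(dt, category, date) first) is the safer choice.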
Here is a more concise version that Frank suggested in a comment on this post. It writes mean only once: .SDcols selects the columns to summarise, .SD holds those columns for each group, and lapply(.SD, mean) averages each of them.
dt[, c(.(date = last(date)), lapply(.SD, mean)), by = category, .SDcols = value1:value2]
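To make the role of .SD concrete, the sketch below (again on hypothetical toy data with the question's column names) first lists which columns .SD contains in each group, then compares the compact .SD form with the version that spells out mean() for every column:

```r
library(data.table)

# Hypothetical toy data with the same four column names as the question.
toy <- data.table(
  category = c("A", "A", "B", "B"),
  date     = as.Date(c("2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")),
  value1   = c(10, 20, 30, 50),
  value2   = c(1, 3, 5, 7)
)

# Within each group, .SD holds only the columns named in .SDcols.
sd_names <- toy[, .(cols = paste(names(.SD), collapse = ",")),
                by = category, .SDcols = value1:value2]

# Compact form: one lapply() call covers every column in .SDcols.
compact <- toy[, c(.(date = last(date)), lapply(.SD, mean)),
               by = category, .SDcols = value1:value2]

# Explicit form: mean() written out once per column.
explicit <- toy[, .(date = last(date), value1 = mean(value1), value2 = mean(value2)),
                by = category]

print(sd_names)
print(compact)
```

Both forms give the same table; the compact one only needs its .SDcols range widened (for example value1:value10) when more value columns are added.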
If you prefer dplyr, Z.Lin's approach above works. However, if there are many value columns (say, value1 through value10), you can do the following:
dt %>%
  group_by(category) %>%
  # scalar if/else: mean for numeric columns, last value for everything else
  summarise_all(~ if (is.numeric(.)) mean(.) else last(.))
For every non-grouping column, this returns the mean if the column is numeric and otherwise the value from the last row of the group.
One final reminder: summarise_each has been deprecated, so use summarise_all, summarise_if, or summarise_at instead.
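In dplyr 1.0.0 and later, summarise_all(), summarise_if() and summarise_at() are themselves superseded by across() inside summarise(). Below is a sketch of the same rule as the summarise_all() call above (mean for numeric columns, last value otherwise) written with across(), assuming dplyr >= 1.0.0 and using hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data; not the question's data.
toy <- tibble(
  category = c("A", "A", "B", "B"),
  date     = as.Date(c("2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")),
  value1   = c(10, 20, 30, 50),
  value2   = c(1, 3, 5, 7)
)

res <- toy %>%
  group_by(category) %>%
  # across() skips the grouping column automatically; the scalar if/else
  # evaluates only one branch per column.
  summarise(across(everything(), ~ if (is.numeric(.x)) mean(.x) else last(.x))) %>%
  ungroup()

print(res)
```

An equivalent, more explicit spelling is summarise(date = last(date), across(where(is.numeric), mean)), which some readers may find easier to scan.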