Below is a reproducible data table with four columns:
As the title suggests, I'd like t
Is this what you're looking for?
library(dplyr)

dt %>%
  group_by(category) %>%
  summarise(date = max(date),
            value1 = mean(value1),
            value2 = mean(value2)) %>%
  ungroup()
# A tibble: 3 x 4
category date value1 value2
<chr> <date> <dbl> <dbl>
1 A 2017-02-01 94.00000 56.0
2 B 2017-04-01 96.50000 56.5
3 C 2017-10-01 95.66667 55.0
Here is the data.table approach. We can perform the calculations inside .() in the j argument, and set the grouping with the by argument.
dt[, .(date = last(date), value1 = mean(value1), value2 = mean(value2)), by = category]
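To make the roles of j and by concrete, here is a minimal, self-contained sketch. The table toy and its values are hypothetical and made up for illustration; they are not the asker's data.

```r
library(data.table)

# Hypothetical toy data, not the table from the question
toy <- data.table(
  category = c("A", "A", "B", "B", "B"),
  date     = as.Date(c("2017-01-01", "2017-02-01",
                       "2017-03-01", "2017-04-01", "2017-05-01")),
  value1   = c(90, 98, 95, 97, 99),
  value2   = c(50, 60, 55, 57, 59)
)

# j (the second argument) builds one summary row per group;
# by (the third argument) defines the groups
res <- toy[, .(date = last(date),
               value1 = mean(value1),
               value2 = mean(value2)),
           by = category]
print(res)
```

Note that last(date) returns the date in the final row of each group as the rows are currently ordered, whereas max(date) returns the latest date. The two agree only when each group is already sorted by date.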
Here is a more concise way, suggested by Frank in a comment on this post. With this approach the mean function only has to be written once: .SD holds the columns to be summarised, and .SDcols specifies which columns those are.
dt[, c(.(date = last(date)), lapply(.SD, mean)), by = category, .SDcols = value1:value2]
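When the value columns are not adjacent, or there are too many to spell out as a range, .SDcols also accepts a character vector of column names. The sketch below assumes the value columns share the prefix "value"; the table toy and the helper variable value_cols are hypothetical.

```r
library(data.table)

# Hypothetical toy data, not the table from the question
toy <- data.table(
  category = c("A", "A", "B", "B", "B"),
  date     = as.Date(c("2017-01-01", "2017-02-01",
                       "2017-03-01", "2017-04-01", "2017-05-01")),
  value1   = c(90, 98, 95, 97, 99),
  value2   = c(50, 60, 55, 57, 59)
)

# Collect the names of all columns that start with "value"
value_cols <- grep("^value", names(toy), value = TRUE)

# .SD contains only the columns named in .SDcols, so mean is written once;
# list() is used here in place of its alias .()
res <- toy[, c(list(date = last(date)), lapply(.SD, mean)),
           by = category, .SDcols = value_cols]
print(res)
```

Selecting by name like this keeps the call unchanged if more value columns are added later.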
And if you want to use dplyr, you can use Z.Lin's approach. However, if there are lots of value columns, such as value1 to value10, you can do the following.
dt %>%
  group_by(category) %>%
  # a plain if/else picks one branch per column; funs() is deprecated since dplyr 0.8.0
  summarise_all(~ if (is.numeric(.)) mean(.) else last(.))
This code calculates the mean only for numeric columns; for any other column, it returns the value in the last row of each group.
One final reminder: summarise_each has been deprecated. Please use summarise_all, summarise_if, or summarise_at instead.
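As a rough illustration of how two of these replacements differ, here is a small sketch on hypothetical data (the tibble toy is made up for this example). summarise_at takes an explicit column selection, while summarise_if picks columns with a predicate.

```r
library(dplyr)

# Hypothetical toy data, not the table from the question
toy <- tibble(
  category = c("A", "A", "B", "B", "B"),
  date     = as.Date(c("2017-01-01", "2017-02-01",
                       "2017-03-01", "2017-04-01", "2017-05-01")),
  value1   = c(90, 98, 95, 97, 99),
  value2   = c(50, 60, 55, 57, 59)
)

# summarise_at: average the columns named in vars()
by_at <- toy %>%
  group_by(category) %>%
  summarise_at(vars(value1:value2), mean)

# summarise_if: average every column for which the predicate is TRUE
by_if <- toy %>%
  group_by(category) %>%
  summarise_if(is.numeric, mean)

print(by_at)
print(by_if)
```

Both calls drop the date column, because only the selected columns are summarised. Note also that in dplyr 1.0.0 and later these scoped verbs are themselves superseded by across(), although they still work.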