Filter dataframe by maximum values in each group [duplicate]


长发绾君心


2 answers

闹比i

2020-12-16 15:01
`aggregate` from base R should also work:
```
# One row per id with its maximum date; any other columns are dropped
aggregate(date ~ id, df, max)
```
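A minimal reproducible sketch of the `aggregate` call above. The data frame `df` here is hypothetical (the question's data is not shown on this page) and assumes numeric `id` and `date` columns. The result has one row per `id`, ordered by `id`.

```r
# Hypothetical sample data; the question's original df is not reproduced here
df <- data.frame(
  id   = c(1, 1, 3, 3, 2, 2),
  date = c(2012, 2011, 2014, 2013, 2014, 2012)
)

# Formula interface: group `date` by `id` and apply max() within each group
res <- aggregate(date ~ id, data = df, FUN = max)
print(res)
```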

爱一瞬间的悲伤

2020-12-16 15:20

Here's a simple and fast approach using the data.table package:

```
library(data.table)
# For each id, keep the row of .SD with the largest date (the first one on ties)
setDT(df)[, .SD[which.max(date)], id]
#    id date
# 1:  1 2012
# 2:  3 2014
# 3:  2 2014
```
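For comparison, a rough base R analogue of `.SD[which.max(date)]`. This swaps in `split()`/`lapply()` and is not part of the original answer. The sample `df`, including its `value` column and the tie in `id` 2, is hypothetical. Because `which.max()` returns the first maximum, ties give a single row. Unlike data.table's `by`, which keeps groups in order of first appearance, `split()` returns the groups in sorted `id` order.

```r
# Hypothetical sample data with an extra column and a tie in id 2
df <- data.frame(
  id    = c(1, 1, 3, 3, 2, 2),
  date  = c(2012, 2011, 2014, 2013, 2014, 2014),
  value = c("a", "b", "c", "d", "e", "f")
)

# Split the rows by id, keep the row with the largest date in each piece,
# then stack the pieces back together. which.max() picks the FIRST maximum,
# so the tie in id 2 yields only one row
pieces <- lapply(split(df, df$id), function(g) g[which.max(g$date), ])
res <- do.call(rbind, pieces)
print(res)
```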

Or (possibly a bit faster, because the `by` column is keyed):

```
# setkey() sorts df by id by reference, so the groups come out in id order
setkey(setDT(df), id)[, .SD[which.max(date)], id]
```

Or, using the OP's idea via the data.table package:

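```
# Sort by id, then by descending date (setorder() works by reference),
# and keep the first row of each id
unique(setorder(setDT(df), id, -date), by = "id")

# Equivalent, using duplicated()
setorder(setDT(df), id, -date)[!duplicated(id)]
```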
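The same sort-then-deduplicate idea can be sketched in base R, assuming a hypothetical `df` with numeric `id` and `date` columns. Unlike `setorder()`, which reorders `df` by reference, `order()` leaves `df` untouched and the sorted result is a copy.

```r
# Hypothetical sample data
df <- data.frame(
  id   = c(1, 1, 3, 3, 2, 2),
  date = c(2012, 2011, 2014, 2013, 2014, 2012)
)

# Sort by id ascending, then date descending (negation works for numeric columns)
sorted <- df[order(df$id, -df$date), ]
# After sorting, the first row of each id holds its maximum date
res <- sorted[!duplicated(sorted$id), ]
print(res)
```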

Or a base R solution:

```
with(df, tapply(date, id, function(x) x[which.max(x)]))
##    1    2    3 
## 2012 2014 2014
```
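`tapply` returns a named array rather than a data frame. A small sketch of turning it back into one, assuming a hypothetical `df` with numeric `id` and `date` columns. For a numeric column without missing values, `max` gives the same result as `function(x) x[which.max(x)]`.

```r
# Hypothetical sample data
df <- data.frame(
  id   = c(1, 1, 3, 3, 2, 2),
  date = c(2012, 2011, 2014, 2013, 2014, 2012)
)

# Named 1-d array: names are the ids, values are the per-id maxima
m <- with(df, tapply(date, id, max))

# Rebuild a data frame; names(m) are character, so convert them back to numbers
res <- data.frame(id = as.numeric(names(m)), date = as.vector(m))
print(res)
```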

Another way, using dplyr:

```
library(dplyr)
df %>%
  group_by(id) %>%
  filter(date == max(date)) # Will keep all existing columns but allow multiple rows in case of ties
# Source: local data table [3 x 2]
# Groups: id
# 
#   id date
# 1  1 2012
# 2  2 2014
# 3  3 2014
```
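A base R sketch of what `filter(date == max(date))` does per group, using `ave()` to compute each row's group maximum. This is a swapped-in technique, not dplyr's implementation. The `df` is hypothetical and includes a tie in `id` 2 to show that both tied rows are kept, along with every column.

```r
# Hypothetical sample data with an extra column and a tie in id 2
df <- data.frame(
  id    = c(1, 1, 3, 3, 2, 2),
  date  = c(2012, 2011, 2014, 2013, 2014, 2014),
  value = c("a", "b", "c", "d", "e", "f")
)

# ave() returns, for every row, the maximum date within that row's id group
group_max <- ave(df$date, df$id, FUN = max)

# Keep every row equal to its group's maximum: all columns and ties are kept
res <- df[df$date == group_max, ]
print(res)
```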

```
df %>%
  group_by(id) %>%
  slice(which.max(date)) # Will keep all columns but won't return multiple rows in case of ties
```

```
df %>%
  group_by(id) %>%
  summarise(date = max(date)) # Will remove all other columns and won't return multiple rows in case of ties
```
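Since `summarise` drops the other columns, one base R way to get them back is to join the per-group maxima onto the original rows with `merge()`. A sketch with a hypothetical `df`; joining on both `id` and `date` keeps tied rows, like the `filter` version.

```r
# Hypothetical sample data with an extra column and a tie in id 2
df <- data.frame(
  id    = c(1, 1, 3, 3, 2, 2),
  date  = c(2012, 2011, 2014, 2013, 2014, 2014),
  value = c("a", "b", "c", "d", "e", "f")
)

# One row per id with its maximum date (other columns dropped)
maxes <- aggregate(date ~ id, data = df, FUN = max)

# Inner join on id and date: only rows matching their group's maximum survive,
# and the remaining columns of df come back
res <- merge(maxes, df, by = c("id", "date"))
print(res)
```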
