Question
I would like to select the youngest person within each combination of gender and group.
Here is my initial data:
data1
ID Age Gender Group
1 A01 25 m a
2 A02 35 f b
3 B03 45 m b
4 C99 50 m b
5 F05 60 f a
6 X05 65 f a
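For anyone who wants to run the code in this thread, the printed data can be rebuilt roughly as follows. This is a sketch: the column types (character for ID, Gender and Group, numeric for Age) are an assumption, since the original post only shows the printed output.

```r
# Rebuild the sample data shown above.
# Column types are assumed; the post only shows the printed data frame.
data1 <- data.frame(
  ID     = c("A01", "A02", "B03", "C99", "F05", "X05"),
  Age    = c(25, 35, 45, 50, 60, 65),
  Gender = c("m", "f", "m", "m", "f", "f"),
  Group  = c("a", "b", "b", "b", "a", "a"),
  stringsAsFactors = FALSE
)

print(data1)
```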
I would like to get this:
Gender Group Age ID
m a 25 A01
f a 60 F05
m b 45 B03
f b 35 A02
So I tried the aggregate function, but I don't know how to attach the ID to the result:
aggregate(Age~Gender+Group,data1,min)
Gender Group Age
m a 25
f a 60
m b 45
f b 35
Answer 1:
We can use data.table. First convert the 'data.frame' to a 'data.table' with setDT(data1). To get the row with the minimum 'Age' in each group, use which.min to find the index of that row within each 'Gender', 'Group' combination, then subset the group's rows with it (.SD[which.min(Age)]). Note that which.min returns only the first minimum, so ties yield a single row per group.
library(data.table)
setDT(data1)[, .SD[which.min(Age)], by = .(Gender, Group)]
Another option is to order by 'Gender', 'Group', 'Age' and then keep the first row of each 'Gender', 'Group' combination using unique.
unique(setDT(data1)[order(Gender,Group,Age)],
by = c('Gender', 'Group'))
The same approach works with dplyr: group by 'Gender', 'Group' and use slice with which.min to keep the row that has the minimum 'Age' in each group.
library(dplyr)
data1 %>%
  group_by(Gender, Group) %>%
  slice(which.min(Age))
Or we can arrange by 'Gender', 'Group', 'Age' and then take the first row of each group:
data1 %>%
  arrange(Gender, Group, Age) %>%
  group_by(Gender, Group) %>%
  slice(1L)
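If staying with the aggregate call from the question is preferred, one possible base R route is to merge the aggregated minimum ages back onto the original data so that the matching ID comes along. This is a sketch and not part of the original answer; the data frame construction and the names youngest and result are illustrative assumptions.

```r
# Sample data rebuilt from the question (column types are assumed).
data1 <- data.frame(
  ID     = c("A01", "A02", "B03", "C99", "F05", "X05"),
  Age    = c(25, 35, 45, 50, 60, 65),
  Gender = c("m", "f", "m", "m", "f", "f"),
  Group  = c("a", "b", "b", "b", "a", "a"),
  stringsAsFactors = FALSE
)

# Step 1: minimum Age per Gender/Group, exactly as in the question.
youngest <- aggregate(Age ~ Gender + Group, data = data1, FUN = min)

# Step 2: join back on Gender, Group and Age to recover the ID column.
result <- merge(youngest, data1, by = c("Gender", "Group", "Age"))

print(result)
```

Unlike which.min, this join keeps every row that shares the minimum age within a group, so ties produce more than one row per group.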
Source: https://stackoverflow.com/questions/33950457/selecting-other-row-element-after-aggregate-in-r