Question
Assuming I have a data frame:
t <- data.frame(d1 = c(694, 695, 696, 2243, 2244, 2651, 2652),
                d2 = c(1.80950881, 1.80951007, 1.80951052, 1.46499982,
                       1.46500087, 1.14381419, 1.14381319))
    d1       d2
1  694 1.809509
2  695 1.809510
3  696 1.809511
4 2243 1.465000
5 2244 1.465001
6 2651 1.143814
7 2652 1.143813
I'd like to group the rows by the real-valued column d2, treating values that are very close but not exactly equal as belonging to the same group. In this example, after aggregation, I'd like to obtain the following data set:
    d1       d2
1  694 1.809509
2 2243 1.465000
3 2652 1.143813
that is, the row with the minimum d2 value from each group.
Using the aggregate function, my first attempt:
aggregate(t, by=list(t$d2), FUN=min)
   Group.1   d1       d2
1 1.143813 2652 1.143813
2 1.143814 2651 1.143814
3 1.465000 2243 1.465000
4 1.465001 2244 1.465001
5 1.809509  694 1.809509
6 1.809510  695 1.809510
7 1.809511  696 1.809511
is far from reaching my goal.
How can I tell aggregate to group not by exact equality, but by equality within a given error tolerance?
Answer 1:
Here is an approach with tidyverse:
library(tidyverse)
t %>%
  group_by(round(d2, 1)) %>%   # group by d2 rounded to 1 decimal place
  filter(d2 == min(d2)) %>%    # keep the row with the minimum d2 in each group
  ungroup() %>%                # ungroup so the grouping column can be removed
  select(-3)                   # drop the grouping column (third column)
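One caveat with grouping by rounded values, shown here as a small sketch: two values that are very close can still end up in different groups when they fall on opposite sides of a rounding boundary. The vector x and the tolerance 1e-3 below are hypothetical and chosen only for illustration; they are not part of the original data.

```r
# Hypothetical values that differ by only 2e-4 (not from the question's data)
x <- c(1.2499, 1.2501)

binned <- round(x, 1)        # 1.2 1.3: the two values land in different bins
close  <- diff(x) < 1e-3     # TRUE: yet they are within a 1e-3 tolerance

binned
close
```

For the toy data in the question this does not happen, since the three clusters are far from any one-decimal rounding boundary.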
Answer 2:
This works with your toy data. I don't know whether it will work with your real data; you might have to round to more or fewer digits.
aggregate(t, by=list(round(t$d2,4)), FUN=min)
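Note that aggregate applies FUN to each column separately, so with FUN=min the d1 and d2 values in a result row can come from different input rows (for the last group it returns d1 = 2651 alongside the d2 of row 2652). As a possible alternative, here is a minimal base R sketch that uses an explicit tolerance instead of rounding: sort by d2 and start a new group whenever the gap between neighbouring values exceeds the tolerance. The names tol, s, grp and res, and the value 1e-4, are assumptions made for this illustration, not part of the original answers.

```r
t <- data.frame(d1 = c(694, 695, 696, 2243, 2244, 2651, 2652),
                d2 = c(1.80950881, 1.80951007, 1.80951052, 1.46499982,
                       1.46500087, 1.14381419, 1.14381319))

tol <- 1e-4                                # assumed tolerance; adjust for real data
s   <- t[order(t$d2), ]                    # sort so that close values are adjacent
grp <- cumsum(c(TRUE, diff(s$d2) > tol))   # new group whenever the gap exceeds tol
res <- s[!duplicated(grp), ]               # first row per group = minimum d2 (sorted ascending)
res <- res[order(res$d1), ]                # restore ordering by d1
rownames(res) <- NULL
res                                        # d1: 694, 2243, 2652
```

Because groups are built from neighbouring gaps, a long chain of closely spaced values can form one group whose overall spread is larger than tol; whether that is acceptable depends on the real data.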
Source: https://stackoverflow.com/questions/48955848/r-how-to-aggregate-by-real-values-column-with-given-error-tolerance