I am trying to remove duplicate rows from a data frame, keeping the row with the max value in a different column.
So, for the data frame:
df<-data.frame (rbind(
Here's how I hope your data is really set up:
df <- data.frame(id = c(rep("a", 3), rep("b", 2), "r"),
                 val1 = c(2, 3, 3, 1, 2, 4),
                 val2 = c(3, 4, 5, 3, 6, 5))
You could do a split-unsplit:
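unsplit(lapply(split(df, df$id), function(x) {
  # Keep the row with the largest val2 in this id group.
  # which.max() returns the first maximum, so this also works when the
  # max sits in the first row of the group or the group has a single row.
  x[which.max(x$val2), ]
}), unique(df$id))  # unique() works whether id is a factor or character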
#   id val1 val2
# 3  a    3    5
# 5  b    2    6
# 6  r    4    5
You can also use Reduce(rbind, ...) or do.call(rbind, ...) in place of unsplit.
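For what it's worth, here is a rough sketch of the do.call(rbind, ...) variant. It assumes the same example df as above. The names pieces and res are just placeholders I made up, and picking the row with which.max() is my own choice here (it keeps the first row on ties).

```r
# Same example data as above
df <- data.frame(id = c(rep("a", 3), rep("b", 2), "r"),
                 val1 = c(2, 3, 3, 1, 2, 4),
                 val2 = c(3, 4, 5, 3, 6, 5))

# For each id group, keep the row with the largest val2
# (which.max returns the first maximum, so ties keep the earliest row)
pieces <- lapply(split(df, df$id), function(x) x[which.max(x$val2), ])

# Stack the one-row pieces back into a single data frame
res <- do.call(rbind, pieces)

# Reset the row names so the result is numbered 1..n
rownames(res) <- NULL
res
#   id val1 val2
# 1  a    3    5
# 2  b    2    6
# 3  r    4    5
```

Unlike unsplit, this does not need a grouping vector for the second step, so it should still work if you change the function to return more than one row per group.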