Remove duplicates based on 2nd column condition

野的像风 2020-12-02 00:17

I am trying to remove duplicate rows from a data frame, keeping the row with the max value in a different column.

So, for the data frame:

df<-data.frame (rbind(         


        
4 Answers
  •  长情又很酷
    2020-12-02 00:21

    Here's how I hope your data is really set up:

    # stringsAsFactors = TRUE keeps id a factor (the default before R 4.0.0)
    df <- data.frame(id = c(rep("a", 3), rep("b", 2), "r"),
                     val1 = c(2, 3, 3, 1, 2, 4), val2 = c(3, 4, 5, 3, 6, 5),
                     stringsAsFactors = TRUE)
    

    You could do a split-unsplit:

    > unsplit(lapply(split(df, df$id), function(x) {
          # keep the row with the largest val2 within each id group
          # (also correct when the max sits in the group's first row)
          x[which.max(x$val2), ]
      }), levels(factor(df$id)))
    #   id val1 val2
    # 3  a    3    5
    # 5  b    2    6
    # 6  r    4    5
    

    You can also use Reduce(rbind, ...) or do.call(rbind, ...) in place of unsplit to bind the per-group rows back together.
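
    As a rough sketch of the rbind variants, assuming the same sample data as above: keep_max is a hypothetical helper name introduced here for illustration, and which.max is used to pick the row, so ties go to the first maximum. Row names of the two results may differ from the unsplit output.

    ```r
    # Same sample data as above; id is left as a plain character column here.
    df <- data.frame(id = c(rep("a", 3), rep("b", 2), "r"),
                     val1 = c(2, 3, 3, 1, 2, 4),
                     val2 = c(3, 4, 5, 3, 6, 5))

    # Hypothetical helper: keep the row with the largest val2 in one group.
    keep_max <- function(x) x[which.max(x$val2), ]

    # One single-row data frame per id.
    groups <- lapply(split(df, df$id), keep_max)

    # Bind the per-group rows back into one data frame, two ways.
    res_docall <- do.call(rbind, groups)
    res_reduce <- Reduce(rbind, groups)
    ```

    do.call passes all groups to a single rbind call, while Reduce binds them pairwise, which tends to be slower when there are many groups.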
