Remove duplicates based on 2nd column condition

前端 未结 4 739
野的像风
野的像风 2020-12-02 00:17

I am trying to remove duplicate rows from a data frame based on the max value on a different column

So, for the data frame:

df<-data.frame (rbind(         


        
相关标签:
4条回答
  • 2020-12-02 00:21

    Here's how I hope your data is really set up

    df <- data.frame (id = c(rep("a", 3), rep("b", 2), "r"),
                      val1 = c(2, 3, 3, 1, 2, 4), val2 = c(3, 4, 5, 3, 6, 5))
    

    You could do a split-unsplit

    > unsplit(lapply(split(df, df$id), function(x) {
          if(nrow(x) > 1) {
              x[duplicated(x$id) & x$val2 == max(x$val2),]
          } else {
              x
          }
      }), levels(df$id))
    #   id val1 val2
    # 3  a    3    5
    # 5  b    2    6
    # 6  r    4    5
    

    You can also use Reduce(rbind, ...) or do.call(rbind, ...) in place of unsplit

    0 讨论(0)
  • 2020-12-02 00:22

    Using base R. Here, the columns are factors. Make sure to convert it to numeric

     df$val2 <- as.numeric(as.character(df$val2))
     df[with(df, ave(val2, id, FUN=max)==val2),]
     #  id val1 val2
     #3  a    3    5
     #5  b    2    6
     #6  r    4    5
    

    Or using dplyr

     library(dplyr)
     df %>% 
        group_by(id) %>% 
        filter(val2==max(val2))
     #   id val1 val2
     #1  a    3    5
     #2  b    2    6
     #3  r    4    5
    
    0 讨论(0)
  • 2020-12-02 00:40

    One possible way is to use data.table

    library(data.table)
    setDT(df)[, .SD[which.max(val2)], by = id]
    ##    id val1 val2
    ## 1:  a    3    5
    ## 2:  b    2    6
    ## 3:  r    4    5
    
    0 讨论(0)
  • 2020-12-02 00:44

    Another one

    df %>% group_by(id) %>%
             slice(which.max(val2))
    
      id   val1  val2
       a    3     5
       b    2     6
       r    4     5
    
    0 讨论(0)
提交回复
热议问题