R, conditionally remove duplicate rows

天涯浪人 2020-12-09 20:17

I have a dataframe in R containing the columns ID.A, ID.B and DISTANCE, where distance represents the distance between ID.A and ID.B. For each value (1->n) of ID.A, there ma

3 Answers
  • 2020-12-09 20:55

    You can use the plyr package for this. For example, suppose your data look like this:

    d <- data.frame(id.a=c(1,1,1,2,2,3,3,3,3),
                    id.b=c(1,2,3,1,2,1,2,3,4),
                    dist=c(12,10,15,20,18,16,17,25,9))
    
      id.a id.b dist
    1    1    1   12
    2    1    2   10
    3    1    3   15
    4    2    1   20
    5    2    2   18
    6    3    1   16
    7    3    2   17
    8    3    3   25
    9    3    4    9
    

    You can then use the ddply function to keep, for each id.a, the rows with the smallest dist:

    library(plyr)
    ddply(d, "id.a", function(df) df[df$dist == min(df$dist), ])
    

    Which gives:

      id.a id.b dist
    1    1    2   10
    2    2    2   18
    3    3    4    9
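
    One point worth checking: the `==` comparison keeps every row that ties for the group minimum, so a group can return more than one row. The following is a minimal base R sketch of the same filter, assuming a made-up data frame `d_ties` (hypothetical, not from the question) that contains a tie. It uses `ave()` to compute the per-group minimum and `duplicated()` to fall back to one row per group.

    ```r
    # Hypothetical data: group 1 has two rows tied at dist = 10.
    d_ties <- data.frame(id.a = c(1, 1, 2, 2),
                         id.b = c(1, 2, 1, 2),
                         dist = c(10, 10, 20, 18))

    # Per-row group minimum, aligned with the original row order.
    group_min <- ave(d_ties$dist, d_ties$id.a, FUN = min)

    # Same filter as the ddply call above: all tied rows are kept.
    all_ties <- d_ties[d_ties$dist == group_min, ]

    # If exactly one row per id.a is wanted, keep the first tied row.
    first_only <- all_ties[!duplicated(all_ties$id.a), ]
    ```

    Whether ties should be kept or collapsed depends on the data, so treat the last step as optional.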
    
  • 2020-12-09 20:57

    One possibility:

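    # Sort by ID.A, then by DISTANCE within each ID.A
    myDF <- myDF[order(myDF$ID.A, myDF$DISTANCE), ]

    # Keep only the first (smallest-distance) row for each ID.A
    newdata <- myDF[!duplicated(myDF$ID.A), ]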
    

    Which gives:

        ID.A ID.B DISTANCE
    1    1    3      1.0
    2    2    6      8.0
    5    3    2      0.4
    6    4    8      7.0
    7    5    2     11.0
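
    The approach works because, once the rows are sorted by distance within each ID, the first row of every group is the closest one, and `duplicated()` flags every later row with the same ID. As a rough, self-contained illustration, here is the same idea applied to the sample data frame `d` from the first answer (its lowercase column names `id.a`, `id.b`, `dist` stand in for `ID.A`, `ID.B`, `DISTANCE`):

    ```r
    # Sample data copied from the first answer.
    d <- data.frame(id.a = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                    id.b = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
                    dist = c(12, 10, 15, 20, 18, 16, 17, 25, 9))

    # Sort so the smallest dist comes first within each id.a.
    d_sorted <- d[order(d$id.a, d$dist), ]

    # duplicated() is FALSE only for the first row of each id.a.
    closest <- d_sorted[!duplicated(d_sorted$id.a), ]
    ```

    Here `closest` holds one row per `id.a`, with `id.b` equal to 2, 2 and 4. When distances tie, the row that sorts first is the one kept.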
    
  • 2020-12-09 21:15

    You can also do this easily in base R. If dat is your data frame:

    do.call(rbind, 
            by(dat, INDICES=list(dat$ID.A), 
               FUN=function(x) head(x[order(x$DISTANCE), ], 1)))
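
    To make the pieces concrete: `by()` splits the data frame by ID, the function sorts each piece and keeps its first row, and `do.call(rbind, ...)` stacks the one-row pieces back together. A small sketch under the assumption that the data look like the sample `d` from the first answer (so `id.a` and `dist` replace `ID.A` and `DISTANCE`):

    ```r
    # Sample data copied from the first answer.
    d <- data.frame(id.a = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                    id.b = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
                    dist = c(12, 10, 15, 20, 18, 16, 17, 25, 9))

    # One single-row data frame per id.a: the row with the smallest dist.
    pieces <- by(d, INDICES = list(d$id.a),
                 FUN = function(x) head(x[order(x$dist), ], 1))

    # Stack the pieces into one data frame.
    res <- do.call(rbind, pieces)
    ```

    The row names of `res` come from the group labels rather than the original row numbers, which may matter if later code indexes by row name.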
    