Unique on a dataframe with only selected columns

逝去的感伤 2020-11-27 13:13

I have a dataframe with more than 100 columns, and I would like to find the unique rows by comparing only two of the columns. I'm hoping this is an easy one, but I can't get it working.

4 answers
  •  执念已碎
    2020-11-27 13:28

    OK, if it doesn't matter which value you keep from the columns you are not comparing, this should be pretty easy:

    > dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))
    > # Keep only rows whose (id, id2) pair has not appeared in an earlier row
    > dat[!duplicated(dat[, c("id", "id2")]), ]
      id id2 somevalue
    1  1   1         x
    3  3   4         z
    

    Inside the `duplicated()` call, I'm passing only the columns of `dat` that should not contain duplicates. Because `duplicated()` flags every repeat after the first occurrence, this always keeps the first row of each group (here, the row with `somevalue` "x" for the pair id = 1, id2 = 1).
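    If you would rather keep the last row of each group instead of the first, `duplicated()` also accepts `fromLast = TRUE`. The sketch below reuses the `dat` example in base R; the names `key_cols`, `first_kept`, and `last_kept` are illustrative only:

```r
# Sample data from the answer above
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))

# Columns to compare when deciding whether two rows are duplicates (illustrative name)
key_cols <- c("id", "id2")

# Default scan is top to bottom, so the first row of each (id, id2) group is kept
first_kept <- dat[!duplicated(dat[, key_cols]), ]

# fromLast = TRUE scans bottom to top, so the last row of each group is kept instead
last_kept <- dat[!duplicated(dat[, key_cols], fromLast = TRUE), ]

print(first_kept)
print(last_kept)
```

    With this data, `first_kept` keeps rows 1 and 3 ("x" and "z"), while `last_kept` keeps rows 2 and 3 ("y" and "z").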
