Get rid of rows with duplicate attributes in R

Submitted by 做~自己de王妃 on 2019-12-31 10:41:50

Question


I have a big dataframe with columns such as:

ID, time, OS, IP

Each row of that dataframe corresponds to one entry. For some IDs, the dataframe contains several entries (rows). I would like to get rid of those extra rows (the other attributes will obviously differ for the same ID). Put differently: I want exactly one entry (row) for each ID.

When I use unique on the ID column, I only get the levels (that is, each unique ID), but I want to keep the other attributes as well. I have also tried apply(x,2,unique(data$ID)), but that does not work either.


Answer 1:


subset(data,!duplicated(data$ID))

This should do the trick: it keeps the first row found for each ID.
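
As a minimal sketch of how this behaves, the example below uses a made-up data frame (the values are illustrative assumptions, not the asker's data). `duplicated()` flags every repeat of an ID after its first occurrence, and passing `fromLast = TRUE` flags repeats counting from the end, so the last row per ID is kept instead.

```r
# Hypothetical sample data; only the column names come from the question.
data <- data.frame(
  ID   = c("a", "b", "b", "c"),
  time = c(10, 20, 30, 40),
  OS   = c("Linux", "Linux", "Windows", "Mac"),
  stringsAsFactors = FALSE
)

# Keep the first row seen for each ID (same result as the subset() call).
first_rows <- data[!duplicated(data$ID), ]

# Keep the last row seen for each ID instead.
last_rows <- data[!duplicated(data$ID, fromLast = TRUE), ]
```

Which row counts as "first" depends only on the current row order, so sorting the data frame beforehand is one way to control which entry survives.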




Answer 2:


If you want to keep one row for each ID, but there is different data on each row, then you need to decide on some logic to discard the additional rows. For instance:

df <- data.frame(ID=c(1, 2, 2, 3), time=1:4, OS="Linux")
df
  ID time    OS
1  1    1 Linux
2  2    2 Linux
3  2    3 Linux
4  3    4 Linux

Now I will keep the maximum time value and the last OS value:

library(plyr)
unique(ddply(df, .(ID), function(x) data.frame(ID=x[,"ID"], time=max(x$time), OS=tail(x$OS,1))))
  ID time    OS
1  1    1 Linux
2  2    3 Linux
4  3    4 Linux
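
If plyr is not available, a similar result can be sketched in base R by sorting and then dropping duplicates. Note that this is a different rule from the one above: it keeps the whole row that has the largest time for each ID, rather than combining the maximum time with the last OS value. On this example data the two happen to agree.

```r
df <- data.frame(ID = c(1, 2, 2, 3), time = 1:4, OS = "Linux")

# Sort so that, within each ID, the row with the largest time comes first.
sorted <- df[order(df$ID, -df$time), ]

# Keep the first row per ID, which is now the one with the maximum time.
latest <- sorted[!duplicated(sorted$ID), ]
```

Here `latest` holds one row per ID, and the original row names are preserved, so they show which input rows were kept.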


Source: https://stackoverflow.com/questions/2759394/get-rid-of-rows-with-duplicate-attributes-in-r
