Removing duplicate units from a data frame

天大地大妈咪最大 submitted on 2019-11-29 12:52:06
mnel

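You can pass a data.frame to duplicated(). It flags each row that repeats an earlier row across all of the columns you give it.

In your case, pass only the first three columns of test (UNIT, DATE and OUT1), so that rows count as duplicates whenever those three values match:

test2 <- test[!duplicated(test[, 1:3]), ]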
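To make the base R approach concrete, here is a minimal sketch on a small made-up test data frame. The original data isn't shown, so the fourth column OUT2 and all of the values are hypothetical; only UNIT, DATE and OUT1 come from the thread.

```r
# Hypothetical example data: the real "test" data frame is not shown,
# so OUT2 and all values below are made up for illustration
test <- data.frame(
  UNIT = c("A", "A", "B", "B"),
  DATE = as.Date(c("2012-01-01", "2012-01-01", "2012-01-01", "2012-01-02")),
  OUT1 = c(1, 1, 2, 2),
  OUT2 = c(10, 20, 30, 40)
)

# Row 2 repeats row 1 on UNIT, DATE and OUT1 (only OUT2 differs)
dup <- duplicated(test[, 1:3])
print(dup)

# Keep the first occurrence of each UNIT/DATE/OUT1 combination;
# all four columns are retained in the result
test2 <- test[!duplicated(test[, 1:3]), ]
print(test2)
```

Only row 2 is flagged, so test2 keeps rows 1, 3 and 4, including their OUT2 values.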

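If you are working with large data and want to use data.table, set the key to the first three columns (the ones that define a duplicate) and call unique() on them. In data.table 1.9.8 and later, unique() compares all columns by default rather than just the key, so pass the key columns explicitly with by:

library(data.table)
DT <- data.table(test)
# set the key to the columns that define a duplicate
setkey(DT, UNIT, DATE, OUT1)
# keep the first row for each UNIT/DATE/OUT1 combination
DTU <- unique(DT, by = key(DT))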
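The difference between unique(DT) and unique(DT, by = ...) is easy to miss, so here is a small sketch that shows both. It assumes data.table 1.9.8 or later (where unique() defaults to comparing all columns), and the data is the same hypothetical example as before, with a made-up OUT2 column.

```r
library(data.table)

# Hypothetical example data (OUT2 and all values are made up)
test <- data.frame(
  UNIT = c("A", "A", "B", "B"),
  DATE = as.Date(c("2012-01-01", "2012-01-01", "2012-01-01", "2012-01-02")),
  OUT1 = c(1, 1, 2, 2),
  OUT2 = c(10, 20, 30, 40)
)
DT <- data.table(test)
setkey(DT, UNIT, DATE, OUT1)

# Default: every column is compared, so the two "A" rows survive
# because their OUT2 values differ
all_cols <- unique(DT)

# Compare only the key columns: the second "A" row is dropped
by_key <- unique(DT, by = key(DT))

print(nrow(all_cols))
print(nrow(by_key))
```

Since by also accepts a character vector of column names, unique(DT, by = c("UNIT", "DATE", "OUT1")) gives the same rows without setting a key first (and without the reordering that setkey() performs).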

For more details on duplicates in data.table, see the question "Filtering out duplicated/non-unique rows in data.table".

Thanks! Looks like we can do:

test2 <- test[!duplicated(test[,c("OUT1","DATE","UNIT")]),]

and it works as well. So we can use the column names instead of 1:3, and their order doesn't matter.

You can use distinct() from the dplyr package:

library(dplyr)
test %>%
  distinct(UNIT, DATE, OUT1)

Or without the %>% pipe:

distinct(test, UNIT, DATE, OUT1)
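One difference worth knowing: called with column names, distinct() returns only those columns, while the base R and data.table versions keep every column. Adding .keep_all = TRUE keeps the remaining columns from the first row of each combination. A minimal sketch, again on hypothetical data with a made-up OUT2 column:

```r
library(dplyr)

# Hypothetical example data (OUT2 and all values are made up)
test <- data.frame(
  UNIT = c("A", "A", "B", "B"),
  DATE = as.Date(c("2012-01-01", "2012-01-01", "2012-01-01", "2012-01-02")),
  OUT1 = c(1, 1, 2, 2),
  OUT2 = c(10, 20, 30, 40)
)

# Only UNIT, DATE and OUT1 are returned
just_keys <- distinct(test, UNIT, DATE, OUT1)

# .keep_all = TRUE also returns OUT2, taken from the first row
# of each UNIT/DATE/OUT1 combination
all_cols <- test %>%
  distinct(UNIT, DATE, OUT1, .keep_all = TRUE)

print(just_keys)
print(all_cols)
```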