R: identify duplicate rows and remove the old entry (by date)

[亡魂溺海] submitted on 2019-12-25 08:34:01

Question


I have a dataframe of the following form:

   ID    value    modified
1  AA    30       2016-11-03
2  AB    40       2016-11-04
3  AD    50       2016-11-05
4  AA    60       2016-11-06
5  AB    20       2016-11-07

I want to find all rows whose ID is duplicated and, for each such ID, remove every row except the one with the most recent modification date. The output should be:

   ID    value    modified
1  AD    50       2016-11-05
2  AA    60       2016-11-06
3  AB    20       2016-11-07

The code I am trying is as follows:

ID<-c('AA','AB','AD','AA','AB')
value<-c(30,40,50,60,20)
modified<-c('2016-11-03','2016-11-04','2016-11-05','2016-11-06','2016-11-07')
df<-data.frame(ID=ID,value=value,modified=modified)
df
  ID value   modified
1 AA    30 2016-11-03
2 AB    40 2016-11-04
3 AD    50 2016-11-05
4 AA    60 2016-11-06
5 AB    20 2016-11-07

df[!duplicated(df$ID),]
  ID value   modified
1 AA    30 2016-11-03
2 AB    40 2016-11-04
3 AD    50 2016-11-05

But this keeps the first (oldest) occurrence of each ID, which is not my desired output. How can I remove the old entries instead? Thank you in advance for any clues or hints.


Answer 1:


You can use the dplyr package as follows:

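library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

# For each ID, keep only the row(s) with the latest modification date
df %<>% group_by(ID) %>% filter(modified == max(modified))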

And in case you want the result in a new variable:

library(dplyr)

df2 <- df %>% group_by(ID) %>% filter(modified == max(modified))
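The comparison above works on the date strings as they are, which is safe only because the ISO format `YYYY-MM-DD` sorts alphabetically in date order. As a hedged variation rather than the answer's exact method, the sketch below first parses the column with `as.Date()` and then uses `slice_max()`, which assumes dplyr 1.0.0 or later. The name `latest` and the choice `with_ties = FALSE` (keep a single row per ID even if two rows share the newest date) are assumptions made for illustration.

```r
library(dplyr)

# Sample data from the question; stringsAsFactors = FALSE keeps the columns
# as character vectors on R versions before 4.0 as well.
df <- data.frame(
  ID = c("AA", "AB", "AD", "AA", "AB"),
  value = c(30, 40, 50, 60, 20),
  modified = c("2016-11-03", "2016-11-04", "2016-11-05",
               "2016-11-06", "2016-11-07"),
  stringsAsFactors = FALSE
)

latest <- df %>%
  mutate(modified = as.Date(modified)) %>%               # real dates, not strings
  group_by(ID) %>%
  slice_max(modified, n = 1, with_ties = FALSE) %>%      # newest row per ID
  ungroup()                                              # drop the grouping

latest
```

With the sample data this should leave one row per ID: AA with value 60, AB with value 20, and AD with value 50. Calling `ungroup()` at the end avoids carrying the grouping into later operations.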



Answer 2:


You can solve the problem with base R by first sorting the data frame by date:

df <- df[order(df[["modified"]], decreasing = TRUE), ]

Then you can get the final result with your !duplicated solution:

df[!duplicated(df$ID), ]
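The two steps can be put together into a self-contained sketch. It rebuilds the sample data from the question and, as an added assumption, converts `modified` to `Date` so that the ordering is chronological whatever the string format is. The names `newest_first` and `latest`, and the final re-sorting step, are illustrative and not part of the original answer.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the columns
# as character vectors on R versions before 4.0 as well.
df <- data.frame(
  ID = c("AA", "AB", "AD", "AA", "AB"),
  value = c(30, 40, 50, 60, 20),
  modified = c("2016-11-03", "2016-11-04", "2016-11-05",
               "2016-11-06", "2016-11-07"),
  stringsAsFactors = FALSE
)

# Parse the dates so that ordering is chronological, not alphabetical.
df$modified <- as.Date(df$modified)

# Newest rows first, so duplicated() flags the older rows of each ID.
newest_first <- df[order(df$modified, decreasing = TRUE), ]
latest <- newest_first[!duplicated(newest_first$ID), ]

# Optional: restore chronological order and reset the row names.
latest <- latest[order(latest$modified), ]
rownames(latest) <- NULL
latest
```

With the sample data this leaves three rows: AD (50, 2016-11-05), AA (60, 2016-11-06) and AB (20, 2016-11-07). The sort matters because `duplicated()` always keeps the first occurrence it encounters, so putting the newest rows first makes the older ones the duplicates.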


Source: https://stackoverflow.com/questions/40877964/r-identify-duplicate-rows-and-remove-the-old-entryby-date
