Deleting rows that are duplicated in one column based on the conditions of another column

Submitted by 无人久伴 on 2019-11-28 06:49:29

Let's say your data is in df. Order by Date, then by decreasing Depth, and keep only the first (deepest) row for each Date:

df = df[order(df[,'Date'], -df[,'Depth']),]  # sort by Date, deepest first
df = df[!duplicated(df$Date),]               # keep the first (deepest) row per Date
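As a quick check, here is the approach above on a tiny made-up data set (the sample values are hypothetical, not from the question):

```r
# Tiny made-up example
df <- data.frame(
  Date  = c("2020-01-01", "2020-01-01", "2020-01-02"),
  Depth = c(5, 10, 3)
)
df <- df[order(df[, "Date"], -df[, "Depth"]), ]  # deepest first within each Date
df <- df[!duplicated(df$Date), ]                 # keep the first row per Date
df
#         Date Depth
# 2 2020-01-01    10
# 3 2020-01-02     3
```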
David Arenburg

Introducing a data.table solution, which is likely the fastest way to solve this (assuming data is your data set):

library(data.table)
unique(setDT(data)[order(Date, -Depth)], by = "Date")

Just another way:

setDT(data)[data[, .I[which.max(Depth)], by=Date]$V1]
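The `.I[which.max(Depth)]` idiom can look cryptic at first; this sketch (with made-up values) shows the intermediate row-index step:

```r
library(data.table)
# Made-up example data
data <- data.table(Date = c("d1", "d1", "d2"), Depth = c(5, 10, 3))
# .I holds the original row numbers; which.max(Depth) picks the
# deepest row within each Date group
idx <- data[, .I[which.max(Depth)], by = Date]$V1
data[idx]  # one row per Date, with the maximum Depth
```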

This might not be the fastest approach if your data frame is large, but it is a fairly straightforward one. Note that it may change the order of your data frame, so you may need to reorder by, e.g., Date afterwards. Instead of deleting rows, we split the data by Date, pick the row with the maximum Depth in each chunk, and finally join the results back into a data frame:

data = split(data, data$Date)
data = lapply(data, function(x) x[which.max(x$Depth), , drop=FALSE])
data = do.call("rbind", data)
# First find the maximum Depth for each Date
maxvals = aggregate(Depth ~ Date, data = df, FUN = max)
# Now use apply to find the matching rows and pull them out
out = df[apply(maxvals, 1, FUN = function(x) which(paste(df$Date, df$Depth) == paste(x[1], x[2]))), ]

Does that work for you?

You might also use dplyr's arrange() instead of base R's order() (I find it more intuitive):

library(dplyr)
df <- arrange(df, Date, -Depth)   # sort by Date, deepest first
df <- df[!duplicated(df$Date),]   # keep the first row per Date
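If you are on dplyr 1.0 or later, a slightly more idiomatic variant (my own suggestion, not from the original answer) avoids duplicated() entirely by using slice_max():

```r
library(dplyr)
df <- df %>%
  group_by(Date) %>%
  slice_max(Depth, n = 1, with_ties = FALSE) %>%  # keep the deepest row per Date
  ungroup()
```

with_ties = FALSE guarantees exactly one row per Date even when several rows share the maximum Depth.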