How to make a unique in R by column A and keep the row with maximum value in column B

Question

I have a data.frame with 17 columns. Several rows share the same value in column 2, and I want to keep only one row per value: the one with the maximum value in column 17.

For example:

A    B
'a'  1
'a'  2
'a'  3
'b'  5
'b'  200

Would return
A    B
'a'  3
'b'  200

(plus the rest of the columns)

So far I've been using the unique function, but I think it either keeps one row at random or keeps just the first one that appears.

** UPDATE ** The real data has 376000 rows. I've tried the data.table and ddply suggestions, but they take forever. Any idea which approach is the most efficient?

Answer 1:

A solution using package data.table:

set.seed(42)
dat <- data.frame(A=c('a','a','a','b','b'),B=c(1,2,3,5,200),C=rnorm(5))
library(data.table)

dat <- as.data.table(dat)
dat[,.SD[which.max(B)],by=A]

   A   B         C
1: a   3 0.3631284
2: b 200 0.4042683
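On a large table, subsetting .SD once per group can be slow. A variant that is often faster is to collect the row numbers of the per-group maxima with .I and subset the table once. The following is a minimal sketch on the same toy data, not a benchmarked result; the names idx and res are only illustrative.

```r
library(data.table)

set.seed(42)
dat <- data.table(A = c('a','a','a','b','b'), B = c(1,2,3,5,200), C = rnorm(5))

# .I holds the row numbers of the full table; within each group of A,
# pick the row number where B is largest (first one if there are ties).
idx <- dat[, .I[which.max(B)], by = A]$V1

# Subset the table once using the collected row numbers.
res <- dat[idx]
res
```

All other columns come along automatically because whole rows are selected.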

Answer 2:

A not-so-elegant solution using base R functions:

> ind <- with(dat, tapply(B, A, which.max)) # Using @Roland's data; position of max B within each group
> mysplit <- split(dat, dat$A)
> do.call(rbind, lapply(seq_along(mysplit), function(i) mysplit[[i]][ind[i],]))
  A   B         C
3 a   3 0.3631284
5 b 200 0.4042683
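Another base R option, which avoids splitting the data frame, is to sort so that the largest B comes first within each A and then drop the repeated A values. This is a sketch under the assumption that B is numeric (so it can be negated for descending order); sorted and res are illustrative names.

```r
set.seed(42)
dat <- data.frame(A = c('a','a','a','b','b'), B = c(1,2,3,5,200), C = rnorm(5))

# Sort by A, then by B in descending order within each A.
sorted <- dat[order(dat$A, -dat$B), ]

# duplicated() marks every repeat after the first occurrence of each A,
# so negating it keeps exactly the row with the largest B per group.
res <- sorted[!duplicated(sorted$A), ]
res
```

This relies on the same "keep the first occurrence" behaviour that unique has, which is why the sort has to come first.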

Source: https://stackoverflow.com/questions/14335733/how-to-make-a-unique-in-r-by-column-a-and-keep-the-row-with-maximum-value-in-col

Tags

unique
