get rows of unique values by group

Question

I have a data.table and want to keep only those rows where the value of a variable x is unique within each group of another variable y, i.e. the first row of every (y, x) combination.

I can get the unique values of x, grouped by y, as a separate result like this:

dt[,unique(x),by=y]

But I want to select the matching rows of the original dataset. I don't want a new data.table, because I also need the other variables.
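To illustrate why the grouped call does not meet this need, here is a small sketch on the example data from this question. It only shows that `dt[, unique(x), by = y]` returns a new table whose unnamed result column is called `V1` and in which `z` no longer appears; the name `res` is just for illustration.

```r
library(data.table)

dt <- data.table(y = rep(letters[1:2], each = 3), x = c(1, 2, 2, 3, 2, 1), z = 1:6)

# Grouping by y and returning unique(x) builds a new table:
# the unnamed result column is named V1 and the other columns (z) are dropped.
res <- dt[, unique(x), by = y]
print(res)
```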

So, what do I have to add to my code to get the rows in dt for which the above is true?

library(data.table)
dt <- data.table(y = rep(letters[1:2], each = 3), x = c(1, 2, 2, 3, 2, 1), z = 1:6)

   y x z
1: a 1 1
2: a 2 2
3: a 2 3
4: b 3 4
5: b 2 5
6: b 1 6

What I want:

   y x z
1: a 1 1
2: a 2 2
3: b 3 4
4: b 2 5
5: b 1 6

Answer 1:

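data.table handles duplicated() a bit differently. Here's an approach I've seen on this site before. Note that since data.table 1.9.8, duplicated() and unique() on a data.table compare all columns by default instead of only the key columns, so the key has to be passed explicitly via by:

dt <- data.table(y=rep(letters[1:2],each=3),x=c(1,2,2,3,2,1),z=1:6) 
setkey(dt, "y", "x")
key(dt)
# [1] "y" "x"
!duplicated(dt, by = key(dt))
# [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
dt[!duplicated(dt, by = key(dt))]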
#    y x z
# 1: a 1 1
# 2: a 2 2
# 3: b 1 6
# 4: b 2 5
# 5: b 3 4
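The difference between the default and an explicit `by` can be seen in a short sketch, assuming data.table 1.9.8 or later (where the default compares all columns). Because every row has a distinct `z`, no row counts as a duplicate unless the comparison is restricted to `y` and `x`; the names `all_cols` and `key_cols` are only illustrative.

```r
library(data.table)

dt <- data.table(y = rep(letters[1:2], each = 3), x = c(1, 2, 2, 3, 2, 1), z = 1:6)
setkey(dt, y, x)  # sorts dt by y, then x

# Default: all columns, including z, are compared, so nothing is flagged.
all_cols <- duplicated(dt)
# Restricting the comparison to the key columns flags the second (a, 2) row.
key_cols <- duplicated(dt, by = key(dt))

all_cols
key_cols
dt[!key_cols]
```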

Answer 2:

The idiomatic data.table way is:

library(data.table)
unique(dt, by = c("y", "x"))
#    y x z
# 1: a 1 1
# 2: a 2 2
# 3: b 3 4
# 4: b 2 5
# 5: b 1 6
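If the last row of each (y, x) combination should be kept instead of the first, `unique()` for data.tables also accepts `fromLast = TRUE`. The sketch below uses the question's data; the kept rows stay in their original order, so for the repeated `(a, 2)` pair the row with `z = 3` survives. The name `last_rows` is only illustrative.

```r
library(data.table)

dt <- data.table(y = rep(letters[1:2], each = 3), x = c(1, 2, 2, 3, 2, 1), z = 1:6)

# fromLast = TRUE keeps the last occurrence of each (y, x) combination;
# surviving rows keep their original relative order.
last_rows <- unique(dt, by = c("y", "x"), fromLast = TRUE)
last_rows
```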

Answer 3:

A simpler data.table solution is to take the first row of each group:

dt[, head(.SD, 1), by = .(y, x)]
#    y x z
# 1: a 1 1
# 2: a 2 2
# 3: b 3 4
# 4: b 2 5
# 5: b 1 6
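A common variant of the same "first row per group" idea collects the row numbers with `.I` and subsets the table once, which avoids building `.SD` for every group and is often faster on large tables. This is a sketch on the question's data; `idx` and `first_rows` are illustrative names, and `V1` is the default name data.table gives the unnamed `.I[1]` result.

```r
library(data.table)

dt <- data.table(y = rep(letters[1:2], each = 3), x = c(1, 2, 2, 3, 2, 1), z = 1:6)

# .I holds the row numbers of dt; .I[1] is the first row number within each (y, x) group.
idx <- dt[, .I[1], by = .(y, x)]$V1
# Subsetting by those row numbers keeps every column of the original table.
first_rows <- dt[idx]
first_rows
```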

Answer 4:

With dplyr (this example uses its own data rather than the question's dt):

library(dplyr)
col1 = c(1,1,3,3,5,6,7,8,9)
col2 = c("cust1", "cust1", "cust3", "cust4", "cust5", "cust5", "cust5", "cust5", "cust6")
df1 = data.frame(col1, col2)
df1

distinct(select(df1, col1, col2))
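Since the question needs the other variables too, note that `distinct()` called on only some columns returns just those columns unless `.keep_all = TRUE` is set. A sketch applying this to the question's data as a plain data.frame (the names `df` and `first_rows` are illustrative), keeping the first row of each (y, x) combination:

```r
library(dplyr)

df <- data.frame(y = rep(letters[1:2], each = 3), x = c(1, 2, 2, 3, 2, 1), z = 1:6)

# Deduplicate on y and x only; .keep_all = TRUE retains z from the
# first row of each (y, x) combination.
first_rows <- distinct(df, y, x, .keep_all = TRUE)
first_rows
```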

Source: https://stackoverflow.com/questions/18481930/get-rows-of-unique-values-by-group

Tags

data.table