Given a data.table, how do I find the number of unique keys it contains?
library(data.table)
z <- data.table(id=c(1,2,1,3), key="id")
length(uni
Maybe this:
sum(Negate(duplicated)(z$id))
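For comparison, here is a minimal sketch of a few ways to get the same count on the example table. It assumes a data.table version that ships uniqueN() (1.9.6 or later, as far as I know). The names n1 to n4 are just illustrative.

```r
library(data.table)

z <- data.table(id = c(1, 2, 1, 3), key = "id")

# Negate(duplicated)(x) is just !duplicated(x), so this is the same count
n1 <- sum(!duplicated(z$id))

# Base R: materialise the distinct values, then count them
n2 <- length(unique(z$id))

# data.table helper that counts distinct values directly
n3 <- uniqueN(z$id)

# Count distinct rows with respect to the key column(s) only
n4 <- nrow(unique(z, by = key(z)))

c(n1, n2, n3, n4)
```

All four should agree on this table. Which one is fastest on large data is something to measure rather than assume.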
Because z is keyed by id, z$id is stored sorted, so duplicated can work faster on it:
bigVec <- sample(1:100000, 30000000, replace=TRUE)
system.time( sum(Negate(duplicated)(bigVec)) )
user system elapsed
8.161 0.475 8.690
bigVec <- sort(bigVec)
system.time( sum(Negate(duplicated)(bigVec)) )
user system elapsed
0.00 2.09 2.10
But I just checked, and length(unique()) is faster on sorted vectors as well...
So maybe some kind of check for whether the vector is sorted is going on (which can be done in linear time). To me this doesn't look quadratic:
system.time( length(unique(bigVec)) )
user system elapsed
0.000 0.583 0.664
bigVec <- sort(sample(1:100000, 20000000, replace=TRUE))
system.time( length(unique(bigVec)) )
user system elapsed
0.000 1.290 1.242
bigVec <- sort(sample(1:100000, 30000000, replace=TRUE))
system.time( length(unique(bigVec)) )
user system elapsed
0.000 1.655 1.715
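To make the "linear time" remark concrete, here is a rough sketch of how a sortedness check plus a single pass could count distinct values. This is only an illustration of the idea, not a claim about what unique() or duplicated() do internally. count_unique_sorted is a hypothetical helper (not part of base R or data.table), and it assumes the input has no NA values.

```r
# Hypothetical helper: counts distinct values in an already sorted vector.
# Assumes x contains no NA values.
count_unique_sorted <- function(x) {
  n <- length(x)
  if (n == 0L) return(0L)
  # is.unsorted() is a base R function; it scans the vector once
  stopifnot(!is.unsorted(x))
  # In a sorted vector, a new distinct value starts wherever two
  # adjacent elements differ, so one pass over the neighbours is enough
  1L + sum(x[-1L] != x[-n])
}

set.seed(1)
v <- sort(sample(1:1000, 50000, replace = TRUE))

# Should match the base R result
count_unique_sorted(v) == length(unique(v))
```

Both the check and the count touch each element a constant number of times, so the cost grows in proportion to the vector length.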