How many unique keys does my data.table have?

春和景丽 2021-01-25 04:24

Given a data.table, how do I find the number of unique keys it contains?

library(data.table)
z <- data.table(id = c(1, 2, 1, 3), key = "id")
length(uni         


        
2 Answers
  •  南方客 (original poster)
    2021-01-25 05:17

    Maybe this:

    sum(Negate(duplicated)(z$id))
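
If only the count is needed, data.table's own uniqueN() may be worth a look as well. Its documentation describes it as equivalent to length(unique(x)) for an atomic vector and to nrow(unique(x)) for a data.table. A minimal sketch on the toy table from the question (the names n_keys and n_key_rows are just for illustration, and this assumes a data.table version that ships uniqueN):

```r
library(data.table)

# Same toy table as in the question, keyed on id
z <- data.table(id = c(1, 2, 1, 3), key = "id")

# Number of distinct values in the key column
n_keys <- uniqueN(z$id)

# Number of distinct key combinations; also covers multi-column keys
n_key_rows <- uniqueN(z, by = key(z))

print(n_keys)      # 3
print(n_key_rows)  # 3
```

The second form is probably the safer one when the key spans more than one column, since it counts key combinations rather than values of a single column.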
    

    Because z is keyed on id, z$id is stored in sorted order, and duplicated() can run faster on sorted input:

    bigVec <- sample(1:100000, 30000000, replace=TRUE)
    system.time( sum(Negate(duplicated)(bigVec)) )
       user  system elapsed 
      8.161   0.475   8.690 
    
    bigVec <- sort(bigVec)
    system.time( sum(Negate(duplicated)(bigVec)) )
       user  system elapsed 
       0.00    2.09    2.10 
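
To illustrate why sorted input is the easy case: once a vector is sorted, equal values sit next to each other, so the distinct count is one plus the number of positions where neighbours differ, which takes a single pass. The sketch below is only an illustration of that idea, not a claim about what duplicated() does internally. count_sorted_unique is a hypothetical helper, and it assumes a sorted vector without NA values.

```r
# Count distinct values in an already sorted vector by counting run starts.
# count_sorted_unique is a hypothetical helper, not a base R function.
count_sorted_unique <- function(x) {
  stopifnot(!is.unsorted(x))         # only valid for sorted, NA-free input
  if (length(x) == 0L) return(0L)
  # a new distinct value starts wherever two neighbours differ
  1L + sum(x[-1L] != x[-length(x)])
}

set.seed(1)
v <- sort(sample(1:1000, 50000, replace = TRUE))

res_runs   <- count_sorted_unique(v)
res_dup    <- sum(!duplicated(v))    # same as sum(Negate(duplicated)(v))
res_unique <- length(unique(v))

# All three approaches should agree on sorted input
print(c(res_runs, res_dup, res_unique))
```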
    

    But I just checked, and length(unique()) is also faster on sorted vectors.

    So maybe there is some kind of check for whether the vector is sorted (which can be done in linear time). To me, this doesn't look quadratic:

    system.time( length(unique(bigVec)) )
       user  system elapsed 
      0.000   0.583   0.664 
    
    bigVec <- sort(sample(1:100000, 20000000, replace=TRUE))
    system.time( length(unique(bigVec)) )
       user  system elapsed 
      0.000   1.290   1.242 
    
    bigVec <- sort(sample(1:100000, 30000000, replace=TRUE))
    system.time( length(unique(bigVec)) )
       user  system elapsed 
      0.000   1.655   1.715 
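
A rough way to test the "not quadratic" impression is to time the same call at sizes that double and look at how the elapsed time grows: roughly doubling suggests linear scaling, while quadratic scaling would roughly quadruple it. The sketch below assumes much smaller sizes than the runs above so that it finishes quickly. The sizes and variable names are arbitrary, and timings this short are noisy, so treat the result as indicative only. It also shows is.unsorted(), the base R function for testing sortedness.

```r
set.seed(42)
sizes <- c(1e5, 2e5, 4e5)   # arbitrary sizes, kept small so this runs quickly

# Elapsed time of length(unique(x)) on a sorted vector of each size
timings <- sapply(sizes, function(n) {
  x <- sort(sample(1:100000, n, replace = TRUE))
  system.time(length(unique(x)))[["elapsed"]]
})

# is.unsorted() reports whether a vector is out of order
x <- sort(sample(1:100000, 1e5, replace = TRUE))
sorted_flag <- !is.unsorted(x)

print(data.frame(n = sizes, elapsed = timings))
print(sorted_flag)
```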
    
