Why is running “unique” faster on a data frame than a matrix in R?

旧巷少年郎 2020-12-29 23:53

I've begun to believe that data frames hold no advantages over matrices, except for notational convenience. However, I noticed this oddity when running unique

3 Answers
  •  误落风尘
    2020-12-30 00:23

    1. I'm not sure, but my guess is that because a matrix is one contiguous vector, R first copies it into column vectors (like a data.frame), since paste needs a list of vectors. Note that both are slow because both use paste.

    2. Perhaps because unique.data.table is already many times faster. Please upgrade to data.table v1.6.7 by downloading it from the R-Forge repository; that version has the fix to unique that you raised in this question. data.table doesn't use paste to compute unique rows.

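    library(data.table)  # provides as.data.table() and the data.table method for unique()

    # 100,000 rows x 10 columns of 1s and 2s
    a = matrix(sample(2,10^6,replace = TRUE), ncol = 10)
    b = as.data.frame(a)
    system.time(u1<-unique(a))   # matrix
       user  system elapsed 
       2.98    0.00    2.99 
    system.time(u2<-unique(b))   # data.frame
       user  system elapsed 
       0.99    0.00    0.99 
    c = as.data.table(b)
    system.time(u3<-unique(c))   # data.table
       user  system elapsed 
       0.03    0.02    0.05  # 60 times faster than u1, 20 times faster than u2
    identical(as.data.table(u2),u3)   # same unique rows either way
    [1] TRUE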
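
The paste-based mechanism guessed at in point 1 can be sketched roughly as follows. This is only an illustration, not the actual source of unique for matrices or data frames (which differs between R versions), and the helper names row_keys_by_row and row_keys_by_col are hypothetical. It builds one string key per row in two ways: a row-by-row loop over the matrix, and a single vectorized paste over the column vectors, which is the form that needs a list of vectors.

```r
# Small example in the same shape as the benchmark above (values 1 or 2, 10 columns).
set.seed(1)
a <- matrix(sample(2, 10^4, replace = TRUE), ncol = 10)
b <- as.data.frame(a)

# Hypothetical helper: one paste() call per row (an R-level loop over rows).
row_keys_by_row <- function(m) {
  apply(m, 1, function(r) paste(r, collapse = "\r"))
}

# Hypothetical helper: one vectorized paste() call over all columns.
# A data.frame is already a list of column vectors, so it can be passed directly.
row_keys_by_col <- function(df) {
  do.call(paste, c(unname(as.list(df)), sep = "\r"))
}

k1 <- row_keys_by_row(a)
k2 <- row_keys_by_col(b)

# Keep the first occurrence of each key, i.e. the unique rows.
u_mat <- a[!duplicated(k1), , drop = FALSE]
u_df  <- b[!duplicated(k2), , drop = FALSE]
```

Wrapping each helper in system.time on a larger matrix is one way to check whether the row-by-row loop accounts for the gap on a given R version.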
    
