Why is running “unique” faster on a data frame than a matrix in R?

旧巷少年郎 2020-12-29 23:53

I've begun to believe that data frames hold no advantages over matrices, except for notational convenience. However, I noticed this oddity when running unique

3 answers

误落风尘 (original poster)

2020-12-30 00:23
1. I'm not sure, but my guess is this: a matrix is one contiguous vector, so R first copies it into separate column vectors (as in a data.frame), because paste needs a list of vectors. Note that both are slow, because both use paste.
2. Perhaps because unique.data.table is already many times faster. Please upgrade to v1.6.7 by downloading it from the R-Forge repository, since that version has the fix to unique that you raised in this question. data.table doesn't use paste to do unique.
```
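library(data.table)  # needed for as.data.table() and unique() on a data.table

a = matrix(sample(2, 10^6, replace = TRUE), ncol = 10)  # 100,000 x 10 matrix of 1s and 2s
b = as.data.frame(a)

system.time(u1 <- unique(a))  # matrix method
#    user  system elapsed
#    2.98    0.00    2.99
system.time(u2 <- unique(b))  # data.frame method
#    user  system elapsed
#    0.99    0.00    0.99

c = as.data.table(b)
system.time(u3 <- unique(c))  # data.table method
#    user  system elapsed
#    0.03    0.02    0.05  # 60 times faster than u1, 20 times faster than u2

identical(as.data.table(u2), u3)
# [1] TRUE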
```
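To make the guess in point 1 more concrete, here is a minimal sketch of the paste-based idea. It is not R's actual source, and the names m, df, keys_by_row, keys_by_col, u_row and u_col are made up for illustration. It only shows that a row key can be built either with one paste call per row (what a matrix forces you into unless it is split into columns first) or with a single vectorised paste over a list of column vectors (what a data.frame already is).

```r
# Illustrative only: a paste-based way to find unique rows.
# This is an assumption about the general approach, not base R's real code.
set.seed(1)
m  <- matrix(sample(2, 150, replace = TRUE), ncol = 3)  # 50 x 3, many duplicate rows
df <- as.data.frame(m)

# Matrix-style: one paste() call per row, via apply().
keys_by_row <- apply(m, 1, function(r) paste(r, collapse = "\r"))

# Data-frame-style: a single vectorised paste() over the list of columns.
keys_by_col <- do.call(paste, c(df, sep = "\r"))

# Keep the first occurrence of each key.
u_row <- m[!duplicated(keys_by_row), , drop = FALSE]
u_col <- df[!duplicated(keys_by_col), , drop = FALSE]
```

Both routes build the same keys, so they keep the same rows. The difference is only in how many paste calls are made, which is one plausible reason the column-wise route is cheaper.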