I've begun to believe that data frames hold no advantages over matrices, except for notational convenience. However, I noticed this oddity when running unique
In this implementation, unique.matrix is the same as unique.array:
> identical(unique.array, unique.matrix)
[1] TRUE
unique.array has to handle multi-dimensional arrays, which requires additional processing to ‘collapse’ the extra dimensions (those extra calls to paste()), which are not needed in the 2-dimensional case. The key section of code is:
collapse <- (ndim > 1L) && (prod(dx[-MARGIN]) > 1L)
temp <- if (collapse)
    apply(x, MARGIN, function(x) paste(x, collapse = "\r"))
unique.data.frame is optimised for the 2D case; unique.matrix is not. It could be, as you suggest; it just isn't in the current implementation.
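To make the collapse step concrete, here is a minimal sketch of what it produces for a small matrix. The matrix m and the names keys and dedup are made up for illustration; only the apply()/paste() call is taken from the snippet quoted above, with MARGIN set to 1 (rows).

```r
# Hypothetical 3 x 2 matrix whose first and third rows are equal
m <- matrix(c(1, 2, 1,
              3, 4, 3), nrow = 3)

# Paste each row into a single string, as in the quoted snippet
keys <- apply(m, 1, function(x) paste(x, collapse = "\r"))

# Keep the rows whose string key has not been seen before
dedup <- m[!duplicated(keys), , drop = FALSE]
```

Every row goes through paste() before any comparison happens, which is the extra work a method specialised for the 2D case would not need.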
Note that in all cases (unique.{array,matrix,data.frame}) where there is more than one dimension, it is the string representation that is compared for uniqueness. For floating-point numbers this means 15 significant digits, so
NROW(unique(a <- matrix(rep(c(1, 1+4e-15), 2), nrow = 2)))
is 1, while
NROW(unique(a <- matrix(rep(c(1, 1+5e-15), 2), nrow = 2)))
and
NROW(unique(a <- matrix(rep(c(1, 1+4e-15), 1), nrow = 2)))
are both 2. Are you sure unique is what you want?
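The effect of comparing strings rather than numbers can be seen without calling unique on a matrix at all. The sketch below is only an approximation: it uses sprintf("%.15g", ...) as a stand-in for the 15-significant-digit string form described above, which is an assumption, since the exact text paste() produces (and whether unique.matrix compares strings at all) may depend on the R version.

```r
# Three values that are all distinct as doubles
x <- c(1, 1 + 4e-15, 1 + 5e-15)

# Compared numerically, as happens for a plain vector
n_numeric <- length(unique(x))

# Rounded to 15 significant digits (assumed stand-in for the pasted form)
keys <- sprintf("%.15g", x)

# Compared as strings, some of the values become indistinguishable
n_string <- length(unique(keys))
```

Comparing n_numeric with n_string shows how many values are lost to the rounding, which is the same mechanism behind the differing NROW results above.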