Question
I read somewhere that when creating a matrix, R stores the elements in a vector in column-major order, along with additional information about the matrix dimensions.
If I call matrix(1:6, nrow = 3, ncol = 2), R internally stores the values as the vector 1:6.
However, if we set byrow = TRUE, does that mean R stores the values as c(1, 3, 5, 2, 4, 6)?
Answer 1:
The byrow = TRUE argument tells R that it needs to rearrange the input into column-major order. So yes, matrix(1:6, nrow = 3, byrow = TRUE) stores the values as 1 3 5 2 4 6 internally: it reorders them before creating the matrix.
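As a quick check of the storage order for the 3x2 example from the question, one option is to drop the dimensions with as.vector() and look at what is left. This is only a small sketch; the names m_row and m_col are introduced here for illustration.

```r
# Build the same input two ways: filled by row and filled by column (the default).
m_row <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)
m_col <- matrix(1:6, nrow = 3, ncol = 2)

# as.vector() drops the dim attribute and returns the elements in stored order.
as.vector(m_row)
# [1] 1 3 5 2 4 6
as.vector(m_col)
# [1] 1 2 3 4 5 6

# Single-index subsetting also walks the stored (column-major) order.
m_row[2]
# [1] 3
```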
We can verify this in a couple of ways. First, we can compare two matrices with the same values, one created with byrow = TRUE and one without, and see that they are identical:
by_col = matrix(1L:4L, 2)
by_row = matrix(c(1L, 3L, 2L, 4L), 2, byrow = TRUE)
identical(by_col, by_row)
# [1] TRUE
We can also examine the structure of the "by-row" matrix and see that nothing in the data structure keeps track of the fact that it was created with byrow = TRUE:
# notice the order is 1 2 3 4, not the input order 1 3 2 4
str(by_row)
# int [1:2, 1:2] 1 2 3 4
dput(by_row)
# structure(1:4, .Dim = c(2L, 2L))
With a big enough matrix for the difference to matter, we can observe the extra processing time needed to create a matrix by row:
microbenchmark::microbenchmark(
  by_col = matrix(1:1e6, nrow = 1000),
  by_row = matrix(1:1e6, nrow = 1000, byrow = TRUE),
  times = 100
)
# Unit: milliseconds
#    expr       min        lq      mean    median        uq      max neval
#  by_col  2.071366  2.214147  5.943154  4.474175  5.512274 92.49424   100
#  by_row 10.513797 11.112386 15.700628 13.850260 14.485675 98.94681   100
On a 1000x1000 matrix, creating the matrix with byrow = TRUE takes about 3x longer (comparing median times), because R needs to rearrange the data into column-major order.
Finally, if you want to be really ambitious, you can look through the C source code for creating matrices and see how the byrow argument is used internally. Here are the relevant lines. My C isn't great, but it looks to me like byrow = TRUE just does a bit of extra processing, reordering the input into column-major order, before doing the same thing as byrow = FALSE.
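To make the reordering concrete, here is a rough R sketch of the index mapping involved. It is not R's actual C implementation: fill_by_row is a hypothetical helper written for illustration, and it assumes the input length equals nr * nc (no recycling).

```r
# Hypothetical helper: copy row-wise input into column-major storage order.
# Illustrates the idea only; this is not how R's C code is written.
fill_by_row <- function(x, nr, nc) {
  stopifnot(length(x) == nr * nc)  # simplification: no recycling of x
  out <- vector(typeof(x), nr * nc)
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      # Element (i, j) is at position (i - 1) * nc + j in the row-wise input
      # and at position i + (j - 1) * nr in column-major storage.
      out[i + (j - 1) * nr] <- x[(i - 1) * nc + j]
    }
  }
  dim(out) <- c(nr, nc)  # attach the dimensions, turning the vector into a matrix
  out
}

identical(fill_by_row(1:6, 3, 2), matrix(1:6, nrow = 3, byrow = TRUE))
# [1] TRUE

# The same result can be had by filling the transposed shape by column and transposing.
identical(t(matrix(1:6, nrow = 2)), matrix(1:6, nrow = 3, byrow = TRUE))
# [1] TRUE
```

The double loop is slow in R and is only meant to show which input element ends up in which storage slot.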
Source: https://stackoverflow.com/questions/51255759/byrow-true-argument-in-matrix-function-for-r