byrow = TRUE argument in the matrix() function in R

无人久伴 submitted on 2021-01-27 15:16:56

Question


I read somewhere that when you create a matrix, R stores its elements in a vector in column-major order, along with additional information about the matrix dimensions.
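That description can be checked from the R console. In the sketch below (variable names are illustrative), attaching a `dim` attribute to a plain vector produces exactly the same object that `matrix()` builds, and the matrix carries no other attributes:

```r
# A plain integer vector
v <- 1:6

# Attaching a dim attribute turns it into a 3 x 2 matrix,
# filled in column-major order
dim(v) <- c(3L, 2L)

m <- matrix(1:6, nrow = 3, ncol = 2)

identical(v, m)   # TRUE: same data, same dim attribute
attributes(m)     # only $dim, which is c(3L, 2L)
as.vector(m)      # the underlying vector: 1 2 3 4 5 6
```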

For example, for

matrix(1:6, nrow = 3, ncol = 2)

R internally stores the values as the vector 1:6.

However, if we set byrow = TRUE, does that mean R stores the values as c(1, 3, 5, 2, 4, 6)?


Answer 1:


Setting byrow = TRUE tells R that it needs to rearrange the input into column-major order. So yes, matrix(1:6, nrow = 3, byrow = TRUE) stores the values internally as 1 3 5 2 4 6: R reorders them before creating the matrix.
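A quick way to see the stored order is to drop the `dim` attribute with `as.vector()`. This is a minimal check using the same 1:6 input as the question; `m_col` and `m_row` are illustrative names:

```r
# Same input, filled two different ways
m_col <- matrix(1:6, nrow = 3)                # byrow = FALSE (the default)
m_row <- matrix(1:6, nrow = 3, byrow = TRUE)

m_row
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
# [3,]    5    6

# as.vector() drops the dim attribute and exposes the storage order
as.vector(m_col)  # 1 2 3 4 5 6
as.vector(m_row)  # 1 3 5 2 4 6
```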

We can verify this in a couple of ways. First, we can compare two matrices with the same values, one created with byrow = TRUE and one without, and see that they are identical:

by_col = matrix(1L:4L, 2)
by_row = matrix(c(1L, 3L, 2L, 4L), 2, byrow = TRUE)
identical(by_col, by_row)
# [1] TRUE

We can also examine the structure of the "by-row" matrix and see that nothing in the data structure keeps track of the fact that it was created with byrow = TRUE:

# notice the order is 1 2 3 4, not the input order 1 3 2 4
str(by_row)
# int [1:2, 1:2] 1 2 3 4
dput(by_row)
# structure(1:4, .Dim = c(2L, 2L))

With a big enough matrix for the difference to matter, we can observe the extra processing time needed to create a matrix by row:

microbenchmark::microbenchmark(
  by_col = matrix(1:1e6, nrow = 1000),
  by_row = matrix(1:1e6, nrow = 1000, byrow = TRUE),
  times = 100
)
# Unit: milliseconds
#    expr       min        lq      mean    median        uq      max neval
#  by_col  2.071366  2.214147  5.943154  4.474175  5.512274 92.49424   100
#  by_row 10.513797 11.112386 15.700628 13.850260 14.485675 98.94681   100

On a 1000 x 1000 matrix, creating the matrix with byrow = TRUE takes about 3x longer (comparing medians), because R needs to rearrange the data into column-major order.
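One way to picture that extra work: filling by row gives the same result as filling by column into the transposed shape and then transposing. The sketch below only demonstrates this equivalence; it is not a claim about how R implements `byrow` internally, and the variable names are illustrative:

```r
x  <- 1:12
nr <- 3
nc <- length(x) / nr   # 4 columns

filled_by_row <- matrix(x, nrow = nr, byrow = TRUE)

# Fill an nc x nr matrix column-wise, then transpose it to nr x nc
via_transpose <- t(matrix(x, nrow = nc))

identical(filled_by_row, via_transpose)  # TRUE
```

Either way, every element has to be moved to a new position, which is consistent with the extra time seen in the benchmark.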

Finally, if you want to be really ambitious, you can look through the C source code for creating matrices and see how the byrow argument is used internally. Here are the relevant lines. My C isn't great, but it looks to me like byrow = TRUE just does a bit of extra processing, reordering the input into column-major order, before doing the same thing as byrow = FALSE.
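The reordering step can be sketched in plain R. `reorder_byrow()` below is a hypothetical helper written for illustration, not R's actual C code, and it assumes the input length equals nrow * ncol (the real `matrix()` also handles recycling). Element [i, j] sits at position (i - 1) * ncol + j in the row-major input and at position (j - 1) * nrow + i in column-major storage:

```r
# Hypothetical helper: rearrange a row-major vector into column-major order.
# Assumes length(x) == nrow * ncol (no recycling).
reorder_byrow <- function(x, nrow, ncol) {
  out <- vector(typeof(x), length(x))
  for (i in seq_len(nrow)) {
    for (j in seq_len(ncol)) {
      src <- (i - 1) * ncol + j   # position of [i, j] in the row-major input
      dst <- (j - 1) * nrow + i   # position of [i, j] in column-major storage
      out[dst] <- x[src]
    }
  }
  out
}

x <- 1:6
manual <- reorder_byrow(x, nrow = 3, ncol = 2)
manual  # 1 3 5 2 4 6

# Adding a dim attribute to the reordered vector reproduces byrow = TRUE
identical(matrix(manual, nrow = 3), matrix(x, nrow = 3, byrow = TRUE))  # TRUE
```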



Source: https://stackoverflow.com/questions/51255759/byrow-true-argument-in-matrix-function-for-r
