Assign a matrix to a subset of a data.table

Posted by 放荡痞女 on 2020-01-14 08:05:30

Question


I would like to assign a matrix to a multi-column subset of a data.table but the matrix ends up getting treated as a column vector. For example,

library(data.table)
dt1 <- data.table(a1=rnorm(5), a2=rnorm(5), a3=rnorm(5))
m1 <- matrix(rnorm(10), ncol=2)
dt1[,c("a1","a2")] <- m1

Warning messages:
1: In `[<-.data.table`(`*tmp*`, , c("a1", "a2"), value = c(-0.308851784175091,  :
  2 column matrix RHS of := will be treated as one vector
2: In `[<-.data.table`(`*tmp*`, , c("a1", "a2"), value = c(-0.308851784175091,  :
  Supplied 10 items to be assigned to 5 items of column 'a1' (5 unused)
3: In `[<-.data.table`(`*tmp*`, , c("a1", "a2"), value = c(-0.308851784175091,  :
  2 column matrix RHS of := will be treated as one vector
4: In `[<-.data.table`(`*tmp*`, , c("a1", "a2"), value = c(-0.308851784175091,  :
  Supplied 10 items to be assigned to 5 items of column 'a2' (5 unused)

The problem can be solved by first converting m1 to another data.table object, but I'm curious what the reasoning is for this behaviour. The above syntax would work if dt1 were a data.frame; what is the architectural rationale for not having it work with data.table?


Answer 1:


A data.frame is not a matrix, nor is a data.table. Both data.frame and data.table objects are lists of columns, whereas a matrix is a single atomic vector with a dim attribute. The two are stored very differently; the indexing syntax looks similar only because the differences are handled under the hood.
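The storage difference can be checked directly. The following is a minimal sketch; the objects m, df and dt are illustrative and not part of the original question.

```r
library(data.table)

m  <- matrix(1:6, ncol = 2)
df <- data.frame(a = 1:3, b = 4:6)
dt <- data.table(a = 1:3, b = 4:6)

# A matrix is one atomic vector plus a dim attribute.
typeof(m)    # "integer"
is.list(m)   # FALSE
dim(m)       # 3 2

# data.frame and data.table are both lists with one element per column.
is.list(df)  # TRUE
is.list(dt)  # TRUE
length(dt)   # 2, the number of columns
```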

[<-.data.frame splits a matrix-valued value into a list with one element for each column.

(The line is value <- split(value, col(value)).)
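To see what that split does on its own, here is a small sketch using an illustrative integer matrix (the values are chosen for readability and are not from the question):

```r
m1 <- matrix(1:10, ncol = 2)

# col(m1) gives the column index of every cell, so splitting on it
# groups the cells by column and returns one vector per column.
cols <- split(m1, col(m1))
str(cols)

# The result is a plain list, the same shape a data.frame uses internally.
is.list(cols)
length(cols)
```

Each list element holds one column of the matrix, which is why the data.frame method can then assign column by column.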

Note also that [<-.data.frame will copy the entire data.frame in the process of assigning something to a subset of columns.

data.table attempts to avoid this copying. For that reason [<-.data.table should be avoided, as all <- replacement methods in R make copies.
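By contrast, := updates a data.table by reference. A minimal sketch, using data.table's address() helper and an illustrative table dt, shows that the object is not reallocated when an existing column is overwritten:

```r
library(data.table)

dt <- data.table(a1 = 1:5, a2 = 6:10)
before <- address(dt)

# := modifies the column in place; dt is never reassigned with <-.
dt[, a1 := a1 * 2L]

after <- address(dt)
identical(before, after)  # same memory address before and after
```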

Within [<-.data.table, [<-.data.frame will be called if i is a matrix, but not if only value is.

data.table usually likes you to be explicit in ensuring that the types of data match when assigning. This helps avoid any coercion and related copying.

You could perhaps put in a feature request to ensure compatibility, but given that your usage is far outside what is recommended, the package authors may simply ask you to use the data.table conventions and approaches.




Answer 2:


dt1[,c("a1","a2")] <- as.data.table(m1)

gives a simple solution but does make a copy.

@Simon O'Hanlon provides a solution in the data.table way:

dt1[ , `:=`( a1 = m1[,1] , a2 = m1[,2] ) ]

and in my opinion an even better data.table solution is provided by @Frank:

dt1[,c("a1","a2") := as.data.table(m1)]
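For completeness, here is a self-contained sketch of that last approach with a check that each matrix column lands in the intended data.table column. The variable cols, the seed, and the a3_before copy are added for illustration only; they are not part of the original answers.

```r
library(data.table)

set.seed(1)  # only so the example is reproducible
dt1 <- data.table(a1 = rnorm(5), a2 = rnorm(5), a3 = rnorm(5))
m1  <- matrix(rnorm(10), ncol = 2)
a3_before <- copy(dt1$a3)

# Convert the matrix to a list of columns first, then assign by reference.
cols <- c("a1", "a2")
dt1[, (cols) := as.data.table(m1)]

# Each target column now holds the matching matrix column; a3 is untouched.
all.equal(dt1$a1, m1[, 1])
all.equal(dt1$a2, m1[, 2])
all.equal(dt1$a3, a3_before)
```

Wrapping cols in parentheses on the left of := tells data.table to use the names stored in the variable rather than create a column literally called cols.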


Source: https://stackoverflow.com/questions/19918946/assign-a-matrix-to-a-subset-of-a-data-table
