fast melt large 2d matrix to 3 column data.table

╄→гoц情女王★ posted on 2021-01-24 05:41:14

Question


I have a large matrix num [1:62410, 1:48010]

I want this in a long format data.table

e.g.

   Var1 Var2     value
1:    1    1 -4227.786
2:    2    1 -4211.908
3:    3    1 -4197.034
4:    4    1 -4183.645
5:    5    1 -4171.692
6:    6    1 -4161.634

Minimal example:

library(data.table)
m = matrix(1:5, nrow = 1000, ncol = 1000)
x = data.table(reshape2::melt(m))

Ideally, I'd also like the columns to be named x, y and value in the same step.

Previously I've been using data.table(melt(mymatrix)). Judging by the warnings, reshape2::melt is deprecated, so this is probably not optimal in terms of speed. What would be the best "data.table" way of solving this?

The following do not answer my question: "Fast melted data.table operations" and "Proper/fastest way to reshape a data.table".

Other answers refer to the deprecated reshape2 package.


Answer 1:


Here's an example:

# example matrix
m = matrix(1:12, nrow = 4)

# load data table
library(data.table)

We can extract the values and the row and column indices directly, which should be pretty fast because it relies only on rep() and on flattening the matrix in column-major order:

dt = data.table(
  row = rep(seq_len(nrow(m)), ncol(m)), 
  col = rep(seq_len(ncol(m)), each = nrow(m)), 
  value = c(m)
)

The result is:

    row col value
 1:   1   1     1
 2:   2   1     2
 3:   3   1     3
 4:   4   1     4
 5:   1   2     5
 6:   2   2     6
 7:   3   2     7
 8:   4   2     8
 9:   1   3     9
10:   2   3    10
11:   3   3    11
12:   4   3    12
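The question also asked for the columns to be named x, y and value. A minimal sketch of the same approach wrapped in a function follows; the helper name melt_matrix is hypothetical and not part of data.table.

```r
library(data.table)

# Hypothetical helper (the name is an assumption, not a data.table function):
# melt a 2D matrix into a long data.table with columns x (row index),
# y (column index) and value, in column-major order.
melt_matrix <- function(m) {
  stopifnot(is.matrix(m))
  data.table(
    x = rep(seq_len(nrow(m)), times = ncol(m)),  # row index cycles fastest
    y = rep(seq_len(ncol(m)), each = nrow(m)),   # column index changes slowest
    value = as.vector(m)                         # flatten column by column
  )
}

m <- matrix(1:12, nrow = 4)
long <- melt_matrix(m)
```

Note that the matrix in the question has 62410 * 48010 (about 3.0e9) cells. As far as I know, that is more than the 2^31 - 1 rows a single data.table can hold, so the full matrix would probably have to be melted in blocks of columns.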



Answer 2:


There is an as.data.table method for arrays that will do the trick for you. It does not melt a 2D matrix, so first add a third dimension of length 1:

dim(m) <- c(dim(m), 1L)
as.data.table(m)

In future, when posting questions on SO, please provide a minimal example.

I have now looked at the source of this method and see that it may not be very efficient, because it materializes all NA values and then removes them.
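To get from the array method to the three columns the question asks for, the extra index column has to be dropped and the remaining columns renamed. A minimal sketch, assuming the default column names V1, V2, V3 and value that as.data.table produces for an array without dimnames:

```r
library(data.table)

m <- matrix(1:12, nrow = 4)

# Work on a copy so the original matrix keeps its 2D shape.
a <- m
dim(a) <- c(dim(a), 1L)

dt <- as.data.table(a)   # assumed columns: V1, V2, V3, value
dt[, V3 := NULL]         # the added third dimension carries no information
setnames(dt, c("V1", "V2"), c("x", "y"))
```

Here x is the row index and y the column index of the original matrix.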




Answer 3:


A while ago I ran into the same problem as @BetaScoo8 and asked a similar question (see here). As pointed out by @jangorecki, as.data.table "melts" arrays but not matrices (2D).

# 2D matrix
> AR <- array(1:12, dim = c(3,4))
> DT <- as.data.table(AR)
> print(DT) # Note: no "value" column, matrix str is preserved!
   V1 V2 V3 V4
1:  1  4  7 10
2:  2  5  8 11
3:  3  6  9 12

# 3D array
> AR <- array(1:24, dim=c(3,4,2))
> DT <- as.data.table(AR)
> print(DT)
    V1 V2 V3 value
 1:  1  1  1     1
 2:  1  1  2    13
 3:  1  2  1     4
 4:  1  2  2    16
[...]
21:  3  3  1     9
22:  3  3  2    21
23:  3  4  1    12
24:  3  4  2    24
    V1 V2 V3 value
> 

So I have written a function that converts either a matrix or an array to a data.table in the same fashion. Maybe it is of help to others.

# Melt matrix or array to data.table 
array2dataTable <- function(x) {
  
  # if is matrix, add third dimension (as.data.table does not melt matrices)
  x.is.matrix <- FALSE
  if (length(dim(x))==2) {
    x.is.matrix <- TRUE
    cat("\nNote: x is a matrix, converting it to array with 3rd dim==1 ..")
    dim(x) <- c(dim(x), 1L)
  }
  # add dimnames
  if (is.null(dimnames(x))) {
    cat("\nNote: Array has no dimnames, using seq of integers ..\n")
    dimnames(x) <- lapply(dim(x), function(X) as.character(seq.int(1, X)))
  }
  DT <- as.data.table(x, na.rm = TRUE)
  if (x.is.matrix==TRUE) DT[,V3:=NULL] # remove third column if converting from 2D matrix
  str(DT) # str() prints the structure itself; wrapping it in print() would also print NULL
  return(DT)
}

Happy to get feedback if you notice any issue with this. Thanks!



Source: https://stackoverflow.com/questions/62213639/fast-melt-large-2d-matrix-to-3-column-data-table
