Question
I have a large numeric matrix, num [1:62410, 1:48010].
I want to convert it to a long-format data.table,
e.g.:
Var1 Var2 value
1: 1 1 -4227.786
2: 2 1 -4211.908
3: 3 1 -4197.034
4: 4 1 -4183.645
5: 5 1 -4171.692
6: 6 1 -4161.634
Minimal example:
m = matrix(1:5, nrow = 1000, ncol = 1000)
x = data.table(reshape2::melt(m))
Ideally I'd also like to name the columns x, y and value in the same step.
Previously I've been using data.table(melt(mymatrix)). But judging by the warnings that reshape2::melt is deprecated, this is probably not optimal in terms of speed. What would be the best "data.table" way of solving this?
The following do not answer my question: "Fast melted data.table operations" and "Proper/fastest way to reshape a data.table". Other answers refer to the deprecated reshape2 package.
Answer 1:
Here's an example:
# example matrix
m = matrix(1:12, nrow = 4)
# load data table
library(data.table)
We can extract the data, row and column info directly and it should be pretty fast:
dt = data.table(
row = rep(seq_len(nrow(m)), ncol(m)),
col = rep(seq_len(ncol(m)), each = nrow(m)),
value = c(m)
)
The result is:
row col value
1: 1 1 1
2: 2 1 2
3: 3 1 3
4: 4 1 4
5: 1 2 5
6: 2 2 6
7: 3 2 7
8: 4 2 8
9: 1 3 9
10: 2 3 10
11: 3 3 11
12: 4 3 12
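The question also asks for the columns to be named x, y and value. A minimal sketch of the same approach wrapped in a helper is shown below; the name melt_matrix and its names argument are hypothetical and used here only for illustration, they are not part of data.table.

```r
library(data.table)

# Hypothetical helper (not part of data.table): melt a 2D matrix into a
# long data.table with caller-chosen column names.
melt_matrix <- function(m, names = c("x", "y", "value")) {
  stopifnot(is.matrix(m), length(names) == 3L)
  dt <- data.table(
    rep(seq_len(nrow(m)), times = ncol(m)), # row index, cycling fastest
    rep(seq_len(ncol(m)), each = nrow(m)),  # column index, one block per column
    as.vector(m)                            # values in column-major order
  )
  setnames(dt, names) # rename the default V1, V2, V3 by reference
  dt[]
}

m <- matrix(1:12, nrow = 4)
long <- melt_matrix(m)
```

Because R stores matrices in column-major order, as.vector(m) lines up with the two index vectors without any reordering.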
Answer 2:
There is an as.data.table method for arrays which will do the trick for you.
dim(m) <- c(dim(m), 1L)
as.data.table(m)
In future, when posting questions on SO, please provide a minimal example.
I have now looked at the source and see that it may not be very efficient, because it materializes all NA values and then removes them.
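As a sketch of how the two lines above could be combined with the column names requested in the question, assuming a data.table version that ships the array method and that it names the index columns V1, V2, V3 when the array has no dimnames:

```r
library(data.table)

m <- matrix(1:12, nrow = 4)

# Work on a copy so the original matrix keeps its 2D shape
a <- m
dim(a) <- c(dim(m), 1L)  # add a dummy third dimension of length 1

long <- as.data.table(a) # array method: one row per cell
long[, V3 := NULL]       # drop the dummy dimension (assumes it is named V3)
setnames(long, c("V1", "V2"), c("x", "y"))
```

The rows may come back sorted by the index columns rather than in the matrix's column-major order, so it is safer to look cells up by x and y than by row position.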
Answer 3:
A while ago I ran into the same problem as @BetaScoo8 and asked a similar question (see here). As pointed out by @jangorecki, as.data.table "melts" arrays but not matrices (2D).
# 2D matrix
> AR <- array(1:12, dim = c(3,4))
> DT <- as.data.table(AR)
> print(DT) # Note: no "value" column, matrix str is preserved!
V1 V2 V3 V4
1: 1 4 7 10
2: 2 5 8 11
3: 3 6 9 12
# 3D array
> AR <- array(1:24, dim=c(3,4,2))
> DT <- as.data.table(AR)
> print(DT)
V1 V2 V3 value
1: 1 1 1 1
2: 1 1 2 13
3: 1 2 1 4
4: 1 2 2 16
[...]
21: 3 3 1 9
22: 3 3 2 21
23: 3 4 1 12
24: 3 4 2 24
V1 V2 V3 value
>
So I have written a function to flexibly convert either a matrix or an array to a data.table in the same fashion. Maybe it is of help to others.
# Melt matrix or array to data.table
library(data.table)
array2dataTable <- function(x) {
  # if x is a matrix, add a third dimension (as.data.table does not melt matrices)
  x.is.matrix <- FALSE
  if (length(dim(x)) == 2) {
    x.is.matrix <- TRUE
    cat("\nNote: x is a matrix, converting it to array with 3rd dim==1 ..")
    dim(x) <- c(dim(x), 1L)
  }
  # add dimnames
  if (is.null(dimnames(x))) {
    cat("\nNote: Array has no dimnames, using seq of integers ..\n")
    dimnames(x) <- lapply(dim(x), function(X) as.character(seq.int(1, X)))
  }
  DT <- as.data.table(x, na.rm = TRUE)
  if (x.is.matrix) DT[, V3 := NULL] # remove third column if converting from 2D matrix
  str(DT) # str() prints directly; wrapping it in print() would also print NULL
  return(DT)
}
Happy to get feedback if you notice any issue with this. Thanks!
Source: https://stackoverflow.com/questions/62213639/fast-melt-large-2d-matrix-to-3-column-data-table