sparseMatrix with numerical and categorical data

Posted by 半世苍凉 on 2019-12-25 06:23:19

Question


I am trying to create a sparse matrix from numerical and categorical data, to be used as input to cv.glmnet. When only numerical data is involved, I can create a sparseMatrix using the following syntax:

sparseMatrix(i=c(1,3,5,2), j=c(1,1,1,2), x=c(1,2,4,3), dims=c(5,2))

For categorical variables, the following approach seems to work:

sparse.model.matrix(~-1+automobile, data.frame(automobile=c("sedan","suv","minivan","truck","sedan")))

My VERY sparse instance has 1,000,000 observations and 10,000 variables, and I do not have enough memory to create the full matrix first. The only way I can think of to create a sparseMatrix is to handle the categorical variables manually, creating the indicator columns myself and converting the data to (i, j, x) format. I am hoping that somebody can suggest a better approach.


Answer 1:


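This may or may not work, but you could try creating the model matrix for each variable separately and then binding the results together column-wise.

# cbind dispatches to the sparse methods in R >= 3.2.0; older code used Matrix::cBind
# lapply always returns a list, which is what do.call expects
do.call(cbind,
        lapply(names(df), function(x) sparse.model.matrix(~ ., df[x])[, -1, drop = FALSE]))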

Note that you probably want to create the intercept column and then remove it, rather than specifying -1 in the formula as you've done above. With -1, the first factor keeps all of its levels while each later factor still loses one, so the result depends on the ordering of the variables.
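The difference between the two approaches can be seen on a small example. This is only a sketch: the data frame df and its columns automobile and fuel are made up for illustration and are not from the question.

```r
library(Matrix)

# Hypothetical toy data: one factor with 3 levels, one with 2 levels.
df <- data.frame(
  automobile = factor(c("sedan", "suv", "minivan", "suv")),
  fuel       = factor(c("gas", "diesel", "gas", "gas"))
)

# Intercept dropped in the formula: all 3 automobile levels are kept,
# but fuel is still coded with contrasts (1 column for 2 levels).
m_no_int <- sparse.model.matrix(~ . - 1, df)
colnames(m_no_int)

# One model matrix per variable, each built with an intercept that is
# then removed: every factor loses its reference level, whatever the order.
blocks <- lapply(names(df), function(v) {
  sparse.model.matrix(~ ., df[v])[, -1, drop = FALSE]
})
m_sep <- do.call(cbind, blocks)
colnames(m_sep)
```

Swapping the column order of df changes which factor keeps all of its levels in m_no_int, while m_sep only has its columns reordered.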




Answer 2:


Sparse matrices, like dense matrices, support assignment to individual positions by passing a two-column matrix of (row, column) indices as the single argument to "[":

require(Matrix)
M <- Matrix(0, 10, 10)
dfrm <- data.frame(rows=sample(1:10,5), cols=sample(1:10,5), vals=rnorm(5))
dfrm
#---------
  rows cols       vals
1    3    9 -0.1419332
2    4    3  1.4806194
3    6    7 -0.5653500
4    5    1 -1.0127539
5    1    2 -0.5047298
#--------

M[with(dfrm, cbind(rows, cols))] <- dfrm$vals
M
#---------------
10 x 10 sparse Matrix of class "dgCMatrix"

 [1,]  .        -0.5047298 .        . . .  .       .  .         .
 [2,]  .         .         .        . . .  .       .  .         .
 [3,]  .         .         .        . . .  .       . -0.1419332 .
 [4,]  .         .         1.480619 . . .  .       .  .         .
 [5,] -1.012754  .         .        . . .  .       .  .         .
 [6,]  .         .         .        . . . -0.56535 .  .         .
 [7,]  .         .         .        . . .  .       .  .         .
 [8,]  .         .         .        . . .  .       .  .         .
 [9,]  .         .         .        . . .  .       .  .         .
[10,]  .         .         .        . . .  .       .  .         .
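For the mixed numerical and categorical case in the question, the same (row, column, value) idea can be applied with sparseMatrix() directly, building one sparse block per variable without ever creating a dense matrix. The following is a minimal sketch under assumptions: the data frame df and the column names price and automobile are hypothetical, the factor is assumed to contain no NA values, and every level is kept (no reference level is dropped).

```r
library(Matrix)

# Hypothetical toy data: one numeric and one categorical variable.
df <- data.frame(
  price      = c(0, 2.5, 0, 4, 0),
  automobile = factor(c("sedan", "suv", "minivan", "truck", "sedan"))
)
n <- nrow(df)

# Numeric variable: store only the non-zero entries as triplets.
nz <- which(df$price != 0)
num_block <- sparseMatrix(
  i = nz, j = rep(1L, length(nz)), x = df$price[nz],
  dims = c(n, 1L), dimnames = list(NULL, "price")
)

# Categorical variable: the integer code of each level is the column index,
# so each row contributes exactly one entry equal to 1.
f <- df$automobile
cat_block <- sparseMatrix(
  i = seq_len(n), j = as.integer(f), x = rep(1, n),
  dims = c(n, nlevels(f)),
  dimnames = list(NULL, paste0("automobile", levels(f)))
)

# Bind the per-variable blocks column-wise; the result stays sparse.
X <- cbind(num_block, cat_block)
X
```

With many variables, the blocks could be collected in a list and combined with do.call(cbind, ...), so that only one variable's triplets are held in memory in addition to the sparse result.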


Source: https://stackoverflow.com/questions/29479198/sparsematrix-with-numerical-and-categorical-data
