R - convert BIG table into matrix by column names

Posted by 荒凉一梦 on 2019-12-03 15:47:24

I would use the sparseMatrix function from the Matrix package. The typical usage is sparseMatrix(i, j, x), where i, j, and x are three vectors of the same length: respectively, the row indices, column indices, and values of the non-zero elements in the matrix. Here is an example where I have tried to match the variable names and dimensions to your specifications:

num.pages <- 220000
num.recos <- 230000
N         <- 1500000

df <- data.frame(page_id = sample.int(num.pages, N, replace=TRUE),
                 reco    = sample.int(num.recos, N, replace=TRUE),
                 value   = runif(N))
head(df)
#   page_id   reco     value
# 1   33688  48648 0.3141030
# 2   78750 188489 0.5591290
# 3  158870  13157 0.2249552
# 4   38492  56856 0.1664589
# 5   70338 138006 0.7575681
# 6  160827  68844 0.8375410

library("Matrix")
mat <- sparseMatrix(i = df$page_id,
                    j = df$reco,
                    x = df$value,
                    dims = c(num.pages, num.recos))
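
The example above assumes page_id and reco are already integers in 1..n. If they are arbitrary labels instead, one possible approach is to map them to consecutive indices with factor() and keep the labels as dimnames. This is only a sketch on made-up toy data (the labels "p10", "rA", and so on are hypothetical, not from the question):

```r
library(Matrix)

# Hypothetical toy data: IDs are arbitrary labels, not 1..n indices.
# The pair ("p10", "rA") appears twice on purpose.
toy <- data.frame(page_id = c("p10", "p42", "p10", "p7", "p10"),
                  reco    = c("rA",  "rB",  "rC",  "rA", "rA"),
                  value   = c(0.5,   1.2,   0.3,   2.0,  0.25))

# Map each distinct label to a consecutive integer index
pages <- factor(toy$page_id)
recos <- factor(toy$reco)

toy.mat <- sparseMatrix(i = as.integer(pages),
                        j = as.integer(recos),
                        x = toy$value,
                        dims = c(nlevels(pages), nlevels(recos)),
                        dimnames = list(levels(pages), levels(recos)))

toy.mat["p10", "rC"]   # look up a cell by its labels
toy.mat["p10", "rA"]   # duplicated pair: the two values are summed
```

Note that sparseMatrix sums the values of repeated (i, j) pairs by default, so if duplicates in the table should not be added together, they need to be removed or aggregated beforehand.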

The simplest way to do this in base R is with matrix indexing, like this:

# make up data
num.pages <- 100
num.recos <- 100
N <- 300
set.seed(5)
df <- data.frame(page_id = sample.int(num.pages, N, replace=TRUE),
                 reco    = sample.int(num.recos, N, replace=TRUE),
                 value   = runif(N))

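# now get the desired matrix (cells with no entry stay NA)
out <- matrix(nrow=num.pages, ncol=num.recos)
# index with a two-column (row, col) matrix to assign all values at once
out[cbind(df$page_id, df$reco)] <- df$value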

However, in this case your resulting matrix would be roughly 220k x 230k (about 5 x 10^10 cells), which would take far more memory than you have as a dense matrix, so you need to use a package designed specifically for sparse matrices, as @flodel describes.
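
To make the memory argument concrete, here is a rough back-of-the-envelope calculation. It assumes 8 bytes per double and, for the sparse case, a compressed column layout that stores one double and one 4-byte row index per non-zero value plus one 4-byte pointer per column; real objects carry a little extra overhead, so treat the numbers as estimates only:

```r
num.pages <- 220000
num.recos <- 230000
N         <- 1500000   # number of non-zero entries

# Dense matrix: every cell stores an 8-byte double
dense.gb <- num.pages * num.recos * 8 / 1e9

# Sparse (compressed column) estimate:
# N values (8 bytes) + N row indices (4 bytes) + (ncol + 1) column pointers (4 bytes)
sparse.mb <- (N * 8 + N * 4 + (num.recos + 1) * 4) / 1e6

dense.gb    # about 405 GB
sparse.mb   # about 19 MB
```

On a real object, object.size(mat) reports the actual footprint of the sparse matrix.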
