Suppose you have a data frame with a large number of columns (1000 factors, each with 15 levels). You'd like to create a dummy variable data set, but since it would be too sp
Thanks for clarifying your question. Try this.
Here is sample data with two columns that have three and two levels respectively:
set.seed(123)
n <- 6
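# stringsAsFactors = TRUE is needed in R >= 4.0.0, where strings stay character by default
df <- data.frame(x = sample(c("A", "B", "C"), n, TRUE),
                 y = sample(c("D", "E"), n, TRUE),
                 stringsAsFactors = TRUE)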
# x y
# 1 A E
# 2 C E
# 3 B E
# 4 C D
# 5 C E
# 6 A D
library(Matrix)
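# One sparse indicator block per factor; dims keeps columns for unobserved levels
spm <- lapply(df, function(j) sparseMatrix(i = seq_along(j),
                                           j = as.integer(j), x = 1,
                                           dims = c(length(j), nlevels(j))))
# cBind() is deprecated in Matrix; base cbind() handles sparse matrices in R >= 3.2.0
do.call(cbind, spm)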
# 6 x 5 sparse Matrix of class "dgCMatrix"
#
# [1,] 1 . . . 1
# [2,] . . 1 . 1
# [3,] . 1 . . 1
# [4,] . . 1 1 .
# [5,] . . 1 . 1
# [6,] 1 . . 1 .
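To see what each per-column call is doing, here is a minimal sketch for a single factor. The vector f is made up for illustration; passing dimnames is my addition so the columns can be told apart, and it is not part of the code above. Each row r gets exactly one 1, placed in the column given by the integer code of f[r].

```r
library(Matrix)

# Hypothetical single factor, for illustration only
f <- factor(c("A", "C", "B", "C", "C", "A"))

# i = row number, j = integer level code, x = 1 (recycled to every entry)
m <- sparseMatrix(i = seq_along(f),
                  j = as.integer(f),
                  x = 1,
                  dims = c(length(f), nlevels(f)),
                  dimnames = list(NULL, levels(f)))
m
```

Because only the positions of the ones are stored, memory grows with the number of rows rather than with rows times levels.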
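Edit: @user20650 pointed out that binding the per-column matrices with do.call() was sluggish or failed outright on large data. So here is a more complex but much faster and more efficient approach, which builds the row and column indices for the whole matrix in a single call: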
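n <- nrow(df)
nlevels <- sapply(df, nlevels)   # number of levels in each factor
i <- rep(seq_len(n), ncol(df))   # row index, repeated once per column of df
# column index: level code plus the number of columns used by earlier factors
j <- unlist(lapply(df, as.integer)) +
     rep(cumsum(c(0, head(nlevels, -1))), each = n)
x <- 1
# dims keeps trailing columns for levels that never occur in the data
sparseMatrix(i = i, j = j, x = x, dims = c(n, sum(nlevels)))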
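The key step is the offset added to j: the first factor occupies columns 1 to k1, so the second factor's level codes are shifted by k1, the third's by k1 + k2, and so on. Below is a rough sketch that wraps the same idea in a function and labels the columns. The name one_hot_sparse and the "column_level" labels are my own choices, not anything from Matrix, and it assumes every column is a factor with no NA values.

```r
library(Matrix)

# Hypothetical helper; assumes all columns of `df` are factors without NAs
one_hot_sparse <- function(df) {
  n <- nrow(df)
  nlev <- vapply(df, nlevels, integer(1))
  # Number of output columns already taken by the factors to the left
  offset <- unname(cumsum(c(0L, head(nlev, -1))))
  i <- rep(seq_len(n), times = ncol(df))
  j <- unlist(lapply(df, as.integer), use.names = FALSE) +
    rep(offset, each = n)
  # Labels such as "x_A", built from the column name and the level
  labels <- unlist(Map(function(nm, col) paste(nm, levels(col), sep = "_"),
                       names(df), df), use.names = FALSE)
  sparseMatrix(i = i, j = j, x = 1,
               dims = c(n, sum(nlev)),
               dimnames = list(NULL, labels))
}

# Same values as the sample data shown earlier, typed in by hand
df <- data.frame(x = factor(c("A", "C", "B", "C", "C", "A")),
                 y = factor(c("E", "E", "E", "D", "E", "D")))
m <- one_hot_sparse(df)
m
```

Every row of the result should contain exactly ncol(df) ones, which makes rowSums(m) a cheap sanity check on large inputs.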