I have a fun challenge: I'm trying to construct a binary matrix from an integer vector. The binary matrix should contain as many rows as the length of the vector, and as many
You can, of course, also just use table:
> table(sequence(length(playv)), playv)
    playv
     0 1 2 3 4 5
  1  0 1 0 0 0 0
  2  0 0 1 0 0 0
  3  0 0 0 1 0 0
  4  0 0 0 0 0 1
  5  0 1 0 0 0 0
  6  0 0 0 0 0 1
  7  0 0 0 0 0 1
  8  0 0 0 1 0 0
  9  0 0 0 1 0 0
  10 1 0 0 0 0 0
  11 0 1 0 0 0 0
  12 0 1 0 0 0 0
  13 0 0 0 0 1 0
  14 0 0 1 0 0 0
  15 0 0 0 0 1 0
  16 0 0 1 0 0 0
  17 0 0 0 0 1 0
  18 0 0 0 0 0 1
  19 0 0 1 0 0 0
  20 0 0 0 0 1 0
If speed is a concern, I would suggest a manual approach. First, identify the unique values in your vector. Second, create an empty matrix to fill in. Third, use matrix indexing to identify the positions that should be filled in as 1.
Like this:
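f3 <- function(vec) {
  # Step 1: the sorted unique values become the columns
  U <- sort(unique(vec))
  # Step 2: an all-zero matrix, one row per element of vec
  M <- matrix(0, nrow = length(vec),
              ncol = length(U),
              dimnames = list(NULL, U))
  # Step 3: matrix indexing with (row, column) pairs sets one cell per row
  M[cbind(seq_along(vec), match(vec, U))] <- 1
  M
}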
Usage would be f3(playv).
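To make the matrix-indexing step more concrete, here is a minimal sketch of the same three steps on a tiny vector. The input x is a made-up example (it is not from the question), and the intermediate name idx is only there for illustration.

```r
# Hypothetical small input, chosen only for illustration.
x <- c(2, 0, 2, 5)

# Step 1: sorted unique values (0, 2, 5) define the columns.
U <- sort(unique(x))

# Step 2: empty matrix with one row per element of x.
M <- matrix(0, nrow = length(x), ncol = length(U),
            dimnames = list(NULL, U))

# Step 3: a two-column index matrix holds one (row, column) pair per element.
# Here the rows are 1, 2, 3, 4 and the columns are 2, 1, 2, 3.
idx <- cbind(seq_along(x), match(x, U))
M[idx] <- 1

M
```

Because each element of x contributes exactly one (row, column) pair, every row of M ends up with a single 1. f3 wraps these same steps in a function.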
Adding f3 to the benchmarks, we get:
library(microbenchmark)
microbenchmark(f1(v), f2(v), f3(v), times = 10)
# Unit: milliseconds
# expr min lq median uq max neval
# f1(v) 2104.4808 3151.4308 3314.8173 3344.6696 4023.5246 10
# f2(v) 3956.5678 4782.7863 5994.4448 6320.1901 6646.0405 10
# f3(v) 486.4406 574.1133 746.9112 927.3407 987.9121 10
set.seed(1)
playv <- sample(0:5,20,replace=TRUE)
playv <- as.character(playv)
results <- model.matrix(~playv-1)
You may rename the columns in results.
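One possible way to do the renaming, as a sketch: model.matrix prefixes each column name with the variable name from the formula (here playv), so stripping that prefix with sub leaves just the level labels. The regular expression assumes the variable is called playv, as in the snippet above.

```r
set.seed(1)
playv <- as.character(sample(0:5, 20, replace = TRUE))
results <- model.matrix(~ playv - 1)

# Remove the leading "playv" that model.matrix() adds to each column name.
colnames(results) <- sub("^playv", "", colnames(results))
colnames(results)
```

The extra attributes that model.matrix attaches (assign and contrasts) are left untouched by this; only the column names change.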
I like the solution provided by Ananda Mahto and compared it to model.matrix. Here is the code:
library(microbenchmark)
set.seed(1)
v <- sample(1:10, 1e6, replace = TRUE)
f1 <- function(vec) {
  vec <- as.character(vec)
  model.matrix(~ vec - 1)
}
f2 <- function(vec) {
  table(sequence(length(vec)), vec)
}
microbenchmark(f1(v), f2(v), times = 10)
model.matrix was a little bit faster than table:
Unit: seconds
expr min lq median uq max neval
f1(v) 2.890084 3.147535 3.296186 3.377536 3.667843 10
f2(v) 4.824832 5.625541 5.757534 5.918329 5.966332 10