Original data frame:
v1 = sample(letters[1:3], 10, replace=TRUE)
v2 = sample(letters[1:3], 10, replace=TRUE)
df = data.frame(v1,v2)
df
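I recently came across another way. When you call one of the built-in contrast functions for unordered factors (contr.treatment, contr.sum, contr.helmert) with contrasts = FALSE, it returns full indicator coding, i.e. one-hot encoding. (contr.poly, the default for ordered factors, is an exception.) For example, contr.sum(5, contrasts = FALSE) returns the 5 x 5 identity matrix: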
  1 2 3 4 5
1 1 0 0 0 0
2 0 1 0 0 0
3 0 0 1 0 0
4 0 0 0 1 0
5 0 0 0 0 1
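A quick base R check of which contrast functions behave this way, as a sketch: contr.treatment, contr.sum and contr.helmert should give an identity matrix when contrasts = FALSE, while contr.poly (the default for ordered factors) should not. is_identity is a hypothetical helper written only for this check.

```r
# Hypothetical helper: TRUE if m is a square identity matrix (dimnames ignored)
is_identity <- function(m) {
  m <- as.matrix(m)
  nrow(m) == ncol(m) && all(m == diag(nrow(m)))
}

n <- 4
# Unordered-factor contrast functions: full indicator coding with contrasts = FALSE
treatment_id <- is_identity(contr.treatment(n, contrasts = FALSE))
sum_id       <- is_identity(contr.sum(n, contrasts = FALSE))
helmert_id   <- is_identity(contr.helmert(n, contrasts = FALSE))
# Polynomial contrasts keep polynomial columns, so this is not an identity matrix
poly_id      <- is_identity(contr.poly(n, contrasts = FALSE))

print(c(treatment = treatment_id, sum = sum_id,
        helmert = helmert_id, poly = poly_id))
```

This is why the code below sets the ordered-factor slot of the contrasts option as well: leaving it at contr.poly would not give one-hot columns for ordered factors.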
To get this behavior for all of your factors, you can create a new contrast function and set it as the default. For example,
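# Ignores its 'contrasts' argument and always returns full indicator
# (identity) coding for the factor's levels
contr.onehot = function (n, contrasts, sparse = FALSE) {
  contr.sum(n = n, contrasts = FALSE, sparse = sparse)
}
# 1st entry: unordered factors, 2nd: ordered factors.
# options() returns the old setting, so options(old) restores it later.
old = options(contrasts = c("contr.onehot", "contr.onehot"))
model.matrix(~ . - 1, data = df)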
This produces the following (your rows will differ, since df is sampled at random):
   v1a v1b v1c v2a v2b v2c
1    0   0   1   0   0   1
2    0   1   0   1   0   0
3    0   0   1   0   1   0
4    1   0   0   0   1   0
5    0   1   0   0   1   0
6    0   1   0   0   0   1
7    1   0   0   0   1   0
8    0   1   0   0   1   0
9    0   1   0   1   0   0
10   0   0   1   0   0   1
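If you would rather not change the global contrasts option (it also affects later lm() and glm() calls), a commonly used alternative, which is a different technique from the one above, is to pass identity contrast matrices through model.matrix's contrasts.arg argument. The sketch below uses a small fixed data frame, df2, with illustrative values instead of the random df, so the result is reproducible.

```r
# Small, fixed example data (illustrative values, not the random df above).
# factor() is explicit because data.frame() keeps strings as character in R >= 4.0.
df2 <- data.frame(
  v1 = factor(c("a", "b", "c", "a")),
  v2 = factor(c("b", "b", "a", "c"))
)

# contrasts(x, contrasts = FALSE) returns the identity (indicator) matrix for
# each factor's levels; model.matrix() uses these instead of the option defaults.
onehot <- model.matrix(
  ~ . - 1,
  data = df2,
  contrasts.arg = lapply(df2, contrasts, contrasts = FALSE)
)
print(onehot)
```

Because the contrasts are attached only for this call, nothing else in the session is affected.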