Recode categorical factor with N categories into N binary columns

野性不改 2020-12-14 09:12

Original data frame:

v1 = sample(letters[1:3], 10, replace=TRUE)
v2 = sample(letters[1:3], 10, replace=TRUE)
df = data.frame(v1,v2)
df
         


        
7 answers
  •  臣服心动
    2020-12-14 09:22

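    I recently came across another way. When you call one of the contrast functions with contrasts set to FALSE, it returns a one-hot (indicator) encoding. This holds for contr.sum, contr.treatment, contr.helmert and contr.SAS, but not for contr.poly, whose full matrix includes a constant polynomial column. For example, contr.sum(5, contrasts = FALSE) gives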

      1 2 3 4 5
    1 1 0 0 0 0
    2 0 1 0 0 0
    3 0 0 1 0 0
    4 0 0 0 1 0
    5 0 0 0 0 1
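
    As a quick check of which built-in contrast functions behave this way, here is a minimal sketch in base R. The helper name is_onehot is made up for illustration and is not part of any package; it simply compares each result with the identity matrix.

    ```r
    # Hypothetical helper: TRUE if f(n, contrasts = FALSE) is the n x n identity.
    is_onehot <- function(f, n = 5) {
      m <- f(n, contrasts = FALSE)
      all(unname(m) == diag(n))
    }

    checks <- c(
      sum       = is_onehot(contr.sum),
      treatment = is_onehot(contr.treatment),
      helmert   = is_onehot(contr.helmert),
      poly      = is_onehot(contr.poly)
    )
    print(checks)
    ```

    The first three entries should come out TRUE. contr.poly should come out FALSE, because with contrasts = FALSE it keeps a constant column rather than returning an indicator matrix.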
    

    To get this behavior for all of your factors, you can create a new contrast function and set it as the default. For example,

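    # Contrast function that ignores the `contrasts` argument and always
    # returns the full indicator (one-hot) matrix.
    contr.onehot = function (n, contrasts, sparse = FALSE) {
      contr.sum(n = n, contrasts = FALSE, sparse = sparse)
    }
    
    # Make it the default for both unordered and ordered factors.
    options(contrasts = c(unordered = "contr.onehot", ordered = "contr.onehot"))
    model.matrix(~ . - 1, data = df)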
    

    This results in

       v1a v1b v1c v2a v2b v2c
    1    0   0   1   0   0   1
    2    0   1   0   1   0   0
    3    0   0   1   0   1   0
    4    1   0   0   0   1   0
    5    0   1   0   0   1   0
    6    0   1   0   0   0   1
    7    1   0   0   0   1   0
    8    0   1   0   0   1   0
    9    0   1   0   1   0   0
    10   0   0   1   0   0   1
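
    Changing options(contrasts = ...) is global and also affects later calls such as lm(). If that is a concern, a possible alternative is to pass full indicator matrices per column through the contrasts.arg argument of model.matrix. The sketch below assumes the columns are factors; the seed and the explicit levels are additions for reproducibility and are not part of the original question.

    ```r
    set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
    lv <- letters[1:3]
    df <- data.frame(
      v1 = factor(sample(lv, 10, replace = TRUE), levels = lv),
      v2 = factor(sample(lv, 10, replace = TRUE), levels = lv)
    )

    # One full (identity) contrast matrix per factor column.
    full <- lapply(df, contrasts, contrasts = FALSE)

    # Global options are left untouched; only this call uses one-hot coding.
    mm <- model.matrix(~ . - 1, data = df, contrasts.arg = full)
    print(colnames(mm))
    ```

    Each row of mm should contain exactly one 1 per original factor, so every row sums to 2 here.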
    
