Recode categorical factor with N categories into N binary columns

野性不改 2020-12-14 09:12

Original data frame:

v1 = sample(letters[1:3], 10, replace=TRUE)
v2 = sample(letters[1:3], 10, replace=TRUE)
df = data.frame(v1,v2)
df
         


        
7 answers
  •  南方客 (original poster)
    2020-12-14 09:16

    I just saw a closed question redirected here, and nobody has mentioned the dummies package yet:

    You can recode your variables using the dummy.data.frame() function, which is built on top of model.matrix() but has easier syntax and some good options, and returns a data frame:

    > dummy.data.frame(df, sep="_")
       v1_a v1_b v1_c v2_a v2_b v2_c
    1     0    1    0    0    0    1
    2     1    0    0    1    0    0
    3     0    0    1    0    0    1
    4     0    1    0    1    0    0
    5     0    0    1    0    0    1
    6     0    0    1    0    1    0
    7     1    0    0    1    0    0
    8     1    0    0    0    1    0
    9     1    0    0    0    0    1
    10    1    0    0    0    1    0
    
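Since dummy.data.frame() is described above as built on top of model.matrix(), here is a rough base-R sketch of the same idea for readers without the package. It is not the package's actual implementation: the helper name encode_column and the fixed four-row data are assumptions made for illustration only.

```r
# Small fixed data set so the result is reproducible (illustrative values,
# not the random sample from the question).
df <- data.frame(
  v1 = c("b", "a", "c", "b"),
  v2 = c("c", "a", "c", "a"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: build one indicator column per level of a column.
# The "- 1" in the formula removes the intercept, so model.matrix() keeps
# all N levels instead of the usual N - 1 contrast columns.
encode_column <- function(name, data, sep = "_") {
  f <- factor(data[[name]])
  m <- model.matrix(~ x - 1, data = data.frame(x = f))
  colnames(m) <- paste(name, levels(f), sep = sep)
  m
}

# Encode every column and bind the resulting matrices side by side.
encoded <- as.data.frame(
  do.call(cbind, lapply(names(df), encode_column, data = df))
)
encoded
```

Each row of encoded contains exactly one 1 per original column. Note that v2 yields only two indicator columns here, because only two levels occur in the illustrative data.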

    Some nice aspects of this function: you can easily specify the delimiter for the new column names (sep=) and omit non-encoded variables (all=F), and it comes with its own option, dummy.classes, that lets you specify which classes of column should be encoded.

    You can also use the dummy() function to encode just one column.
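For a single column, the same encoding can also be approximated in base R by comparing the column against each of its levels. This is only a sketch under assumptions: it does not call dummy(), and the vector v1 below holds illustrative values rather than the question's random sample.

```r
# Illustrative single column (assumed values).
v1 <- factor(c("b", "a", "c", "b"))

# For each level, mark the rows that carry it: 1 if equal, 0 otherwise.
# sapply() returns a matrix with one column per level.
ind <- sapply(levels(v1), function(lv) as.integer(v1 == lv))

# Name the columns in the same "variable_level" style as the output above.
colnames(ind) <- paste("v1", levels(v1), sep = "_")
ind
```

The result is a matrix with one row per observation and one column per level, which can be attached to the original data with cbind() if needed.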
