How to one hot encode several categorical variables in R

再見小時候 2020-12-01 06:55

I'm working on a prediction problem and building a decision tree in R. I have several categorical variables and I'd like to one-hot encode them consistently in my tra

5 Answers
  •  长情又很酷
    2020-12-01 07:26

    I recommend using the dummyVars function in the caret package:

    library(caret)

    customers <- data.frame(
      id=c(10, 20, 30, 40, 50),
      gender=c('male', 'female', 'female', 'male', 'female'),
      mood=c('happy', 'sad', 'happy', 'sad','happy'),
      outcome=c(1, 1, 0, 0, 0),
      stringsAsFactors=TRUE)  # R >= 4.0 no longer converts strings to factors by default
    customers
    #   id gender  mood outcome
    # 1 10   male happy       1
    # 2 20 female   sad       1
    # 3 30 female happy       0
    # 4 40   male   sad       0
    # 5 50 female happy       0

    # dummify the data: one indicator column per factor level
    dmy <- dummyVars(~ ., data = customers)
    trsf <- data.frame(predict(dmy, newdata = customers))
    trsf
    #   id gender.female gender.male mood.happy mood.sad outcome
    # 1 10             0           1          1        0       1
    # 2 20             1           0          0        1       1
    # 3 30             1           0          1        0       0
    # 4 40             0           1          0        1       0
    # 5 50             1           0          1        0       0
    

    example source

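    Apply the same procedure to both the training and validation sets: build the dummyVars object once on the training data, then call predict() with that same object on each set so that both end up with the same columns.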
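
    One way to keep the encoding consistent across sets is to build the dummyVars object on the training rows only and reuse it for the validation rows. The following is a minimal sketch, assuming a hypothetical split of the customers data (rows 1-3 for training, rows 4-5 for validation). The names train, valid, train_x and valid_x are illustrative and not part of the original answer.

```r
library(caret)

# Same example data as above; factors are created before splitting
# so both subsets carry the full set of levels.
customers <- data.frame(
  id      = c(10, 20, 30, 40, 50),
  gender  = factor(c('male', 'female', 'female', 'male', 'female')),
  mood    = factor(c('happy', 'sad', 'happy', 'sad', 'happy')),
  outcome = c(1, 1, 0, 0, 0))

# Hypothetical split, for illustration only.
train <- customers[1:3, ]
valid <- customers[4:5, ]

# Learn the encoding from the training rows only.
dmy <- dummyVars(~ gender + mood, data = train)

# Reuse the same object for both sets.
train_x <- data.frame(predict(dmy, newdata = train))
valid_x <- data.frame(predict(dmy, newdata = valid))

# Both encoded sets should expose the same columns in the same order.
identical(names(train_x), names(valid_x))
```

    Restricting the formula to the categorical predictors keeps id and outcome out of the encoded matrix; they can be bound back on afterwards with cbind() if needed.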
