R change categorical data to dummy variables

Submitted by 纵然是瞬间 on 2019-12-12 04:58:00

Question


I have a multivariate data frame and want to convert its categorical columns to dummy variables. I used model.matrix, but it does not work as expected. Please refer to the example below:

age = c(1:15)                                                             # numeric
sex = as.factor(c(rep(0,7),rep(1,8)))                                     # factor
bloodtype = as.factor(c(rep('A',2),rep('B',8),rep('O',1),rep('AB',4)))    # factor
bodyweight = c(11:25)                                                     # numeric

wholedata = data.frame(cbind(age,sex,bloodtype,bodyweight))

model.matrix(~.,data=wholedata)[,-1]

I did not use model.matrix(~age+sex+bloodtype+bodyweight)[,-1] because this is just a toy example. The real data could have tens or hundreds more columns, so I do not think typing out every variable name is a good idea.

Thanks


Answer 1:


It's the cbind that's messing things up. It converts your factors to their integer codes, which model.matrix then treats as ordinary numeric columns instead of expanding them into dummy variables.

If you just do wholedata = data.frame(age,sex,bloodtype,bodyweight) there should be no problem.

cbind returns a matrix, and every element of a matrix must have the same type. In this example the factors are converted to integers (which is the underlying representation of a factor in the first place), so the resulting matrix is of type integer.

Try

wholedata = cbind(age,sex,bloodtype,bodyweight)
is.integer(wholedata) ## TRUE
is.factor(wholedata[,2]) ## FALSE

wholedata = data.frame(age,sex,bloodtype,bodyweight)
is.integer(wholedata) ## FALSE
is.factor(wholedata[,2]) ## TRUE
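
Putting it together, here is a minimal sketch of the whole fix using the toy data from the question. It assumes R's default treatment contrasts, so each factor with k levels becomes k - 1 dummy columns; the name `dummies` is only chosen for illustration.

```r
# Toy data from the question
age        <- 1:15
sex        <- factor(c(rep(0, 7), rep(1, 8)))
bloodtype  <- factor(c(rep("A", 2), rep("B", 8), rep("O", 1), rep("AB", 4)))
bodyweight <- 11:25

# Build the data frame directly, without cbind, so the factors survive
wholedata <- data.frame(age, sex, bloodtype, bodyweight)

# Check which columns are still factors (handy when there are many columns)
sapply(wholedata, is.factor)

# Expand every factor into dummy columns; [, -1] drops the intercept column
dummies <- model.matrix(~ ., data = wholedata)[, -1]
colnames(dummies)
# "age" "sex1" "bloodtypeAB" "bloodtypeB" "bloodtypeO" "bodyweight"
```

The `~ .` formula picks up every column in the data frame, so no variable names need to be typed out, which should carry over to data with many more columns.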


Source: https://stackoverflow.com/questions/25412897/r-change-categorical-data-to-dummy-variables
