R: Expanding an R factor into dummy columns for every factor level

Submitted by 南笙酒味 on 2019-12-20 03:08:13

Question


I have a fairly large data frame in R with two columns. I am trying to turn the Code column (a factor with 858 levels) into dummy variables. The problem is that RStudio crashes every time I try.

> str(d)
'data.frame':   649226 obs. of  2 variables:
 $ User: int  210 210 210 210 269 317 317 317 317 326 ...
 $ Code      : Factor w/ 858 levels "AA02","AA03",..: 164 494 538 626 464 496 435 464 475 163 ... 

The User column is not unique: several rows can share the same User. It doesn't matter whether the result keeps the same number of rows, or whether the rows for each User are merged into a single row whose Code columns hold the counts.

I found a couple of solutions that work for a smaller dataset, but not for mine.

  • Tried model.matrix, but RStudio just crashes:

    m <- model.matrix( ~ Code, data = d)
    

    Found in "Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level".

  • Tried a for loop with ifelse, but the code ran for 4 hours and then I noticed that RStudio had crashed:

    for (t in unique(d$Code)) {
      d[paste("Code", t, sep = "")] <- ifelse(d$Code == t, 1, 0)
    }
    

    Found in "Create new dummy variable columns from categorical variable".

It would be great if you could recommend a method that is fast and works for data of this size.

Thanks!
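
A rough size estimate may explain why both attempts above run out of memory: each produces 858 dense numeric columns for 649226 rows. The sketch below computes that size and then shows, on made-up toy data, a sparse alternative using sparse.model.matrix from the Matrix package. This is not something the question tried; the `toy` data frame and its values are hypothetical.

```r
library(Matrix)

# Dense size for the data in the question:
# 649226 rows x 858 columns x 8 bytes per double
dense_bytes <- 649226 * 858 * 8
dense_gb <- dense_bytes / 1e9   # roughly 4.5 GB, before any intermediate copies

# Hypothetical toy data with the same two columns as `d`
toy <- data.frame(
  User = c(210, 210, 269, 317, 317),
  Code = factor(c("AA02", "AA03", "AA02", "AB01", "AA03"))
)

# "- 1" drops the intercept, so every factor level gets its own 0/1 column.
# Only the non-zero entries are stored (one per row).
X <- sparse.model.matrix(~ Code - 1, data = toy)
```

Here X has one row per row of `toy` and one column per level of Code, so the row count stays the same as in the input.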


Answer 1:


This worked for me perfectly:

library(reshape2)
m <- acast(data = d, User ~ Code)

The only issue was that it produced NAs instead of 0s, but that is easy to fix:

m[is.na(m)] <- 0
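
If a dense User by Code matrix is still too large, a sparse count table is one possible alternative. The sketch below swaps in base R's xtabs with sparse = TRUE (which relies on the Matrix package) in place of acast; it is not the method used in this answer, and the `toy_d` data frame is hypothetical.

```r
library(Matrix)

# Hypothetical toy data with the same two columns as `d`
toy_d <- data.frame(
  User = c(210, 210, 210, 269, 317),
  Code = factor(c("AA02", "AA03", "AA02", "AA03", "AB01"))
)

# One row per User, one column per Code; each cell counts how often the
# pair occurs. Absent pairs are implicit zeros, so no NA replacement is needed.
m_sparse <- xtabs(~ User + Code, data = toy_d, sparse = TRUE)

# Optional: turn the counts into 0/1 indicators
ind <- (m_sparse > 0) * 1
```

Rows are labelled by User and columns by Code, so a single count can be looked up as m_sparse["210", "AA02"].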


Source: https://stackoverflow.com/questions/22286466/r-expanding-an-r-factor-into-dummy-columns-for-every-factor-level
