Split a string column into several dummy variables

醉梦人生 2020-12-01 22:13

As a relatively inexperienced user of the data.table package in R, I've been trying to process one text column into a large number of indicator columns (dummy variables), w

6 answers
  •  星月不相逢
    2020-12-01 22:32

    Here is an approach using rapply and table. There is probably a faster alternative to table here, but this is still somewhat faster than myFunc.modified from @ricardo's answer (a median of about 310 ms versus 399 ms in the benchmark below).

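    library(data.table)

    # dt (with a character column messy_string) is assumed to come from the
    # question; take a copy with enough column pointers pre-allocated.
    dtr <- alloc.col(copy(dt), 1000L)

    rapplyFun <- function() {
      # split each string on the literal "$"
      ll <- strsplit(dtr[, messy_string], '\\$')
      # count the tokens found in each row
      Vals <- rapply(ll, classes = 'character', f = table, how = 'replace')
      # every distinct token becomes a column name
      Names <- unique(rapply(Vals, names))

      # create all indicator columns at once, initialised to 0
      dtr[, (Names) := 0L]
      # fill in the counts row by row, by reference
      for (ii in seq_along(Vals)) {
        for (jj in names(Vals[[ii]])) {
          set(dtr, i = ii, j = jj, value = Vals[[ii]][jj])
        }
      }
    }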
    
    
    library(microbenchmark)
    microbenchmark(myFunc.modified(), rapplyFun(), times = 5)
    # Unit: milliseconds
    #             expr      min       lq   median       uq      max neval
    # myFunc.modified() 395.1719 396.8706 399.3218 400.6353 401.1700     5
    # rapplyFun()       308.9103 309.5763 309.9368 310.2971 310.3463     5
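
To make the strsplit/table step concrete, here is a small base-R sketch on made-up data. The vector messy and the matrix dummies are hypothetical names, and the sketch builds a plain matrix instead of updating a data.table by reference, so it illustrates the idea and is not the benchmarked code.

```r
# Hypothetical toy input: "$"-separated tokens, one string per row
messy <- c("a$b$a", "b$c", "a")

# Split on the literal "$" (escaped because strsplit takes a regex)
tokens <- strsplit(messy, "\\$")

# Per-row token counts, analogous to the rapply(..., f = table) step
counts <- lapply(tokens, table)

# Every distinct token becomes one column
cols <- sort(unique(unlist(tokens)))

# Start from an all-zero integer matrix and fill in the counts row by row
dummies <- matrix(0L, nrow = length(messy), ncol = length(cols),
                  dimnames = list(NULL, cols))
for (i in seq_along(counts)) {
  dummies[i, names(counts[[i]])] <- as.integer(counts[[i]])
}

print(dummies)
```

As in the answer above, the cells hold counts, so a token that appears twice in a row gives 2. If strict 0/1 indicators are wanted, (dummies > 0) + 0L converts the matrix.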
    
