As a relatively inexperienced user of the data.table package in R, I've been trying to process one text column into a large number of indicator columns (dummy variables), w
Here is an approach using rapply and table.
There is probably a faster alternative to table here, but this version is still faster than myFunc.modified from @ricardo's answer: in the benchmark below, the median time drops from about 399 ms to about 310 ms.
library(data.table)
# a copy of dt with enough spare column pointers for the new indicator columns
dtr <- alloc.col(copy(dt), 1000L)
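rapplyFun <- function(){
  # split each string on the literal '$' delimiter
  ll <- strsplit(dtr[, messy_string], '\\$')
  # count tokens within each row: a list holding one table per row
  Vals <- rapply(ll, classes = 'character', f = table, how = 'replace')
  # every distinct token becomes an indicator column, initialised to 0
  Names <- unique(rapply(Vals, names))
  dtr[, (Names) := 0L]
  # write only the non-zero counts, cell by cell, by reference
  for(ii in seq_along(Vals)){
    for(jj in names(Vals[[ii]])){
      set(dtr, i = ii, j = jj, value = as.integer(Vals[[ii]][jj]))
    }
  }
}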
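To make the intermediate objects in rapplyFun concrete, here is a small base-R sketch of the same steps on a hypothetical three-string input. It assumes a plain integer matrix as a stand-in for dtr and ordinary subassignment as a stand-in for set(), so it runs without data.table; the input x and the matrix m are illustrative only.

```r
# Hypothetical toy input: three '$'-delimited strings
x <- c("a$b$a", "b", "c$a")

# Same steps as rapplyFun(): split, count per row, collect distinct tokens
ll <- strsplit(x, "\\$")
Vals <- rapply(ll, classes = "character", f = table, how = "replace")
Names <- unique(rapply(Vals, names))   # "a" "b" "c", in order of first appearance

# Stand-in for dtr (assumption): an integer matrix of zeros, one column per token
m <- matrix(0L, nrow = length(Vals), ncol = length(Names),
            dimnames = list(NULL, Names))

# Same double loop: only tokens present in row ii are visited
for (ii in seq_along(Vals)) {
  for (jj in names(Vals[[ii]])) {
    m[ii, jj] <- as.integer(Vals[[ii]][jj])
  }
}
print(m)
```

Because the loop only touches tokens that occur in a given row, the amount of work grows with the total number of tokens rather than with rows times columns; the zeros come from the initial fill.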
library(microbenchmark)
microbenchmark(myFunc.modified(), rapplyFun(), times = 5)
# Unit: milliseconds
#               expr      min       lq   median       uq      max neval
#  myFunc.modified() 395.1719 396.8706 399.3218 400.6353 401.1700     5
#        rapplyFun() 308.9103 309.5763 309.9368 310.2971 310.3463     5
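On the point that something faster than per-row table may exist: one candidate is a single cross-tabulation of row index against token. This is a swapped-in technique, sketched in base R and not benchmarked against the timings above. Calling table() on two parallel vectors builds the whole count matrix in one call, which replaces both the per-row table() calls and the per-cell set() loop. The helper name count_tokens_crosstab is hypothetical.

```r
# Hypothetical helper: per-row token counts via one cross-tabulation
count_tokens_crosstab <- function(x, sep = "\\$") {
  ll <- strsplit(x, sep)
  # one entry per token, tagged with the index of the row it came from;
  # explicit levels keep rows that produced no tokens as all-zero rows
  row_id <- factor(rep(seq_along(ll), lengths(ll)), levels = seq_along(ll))
  counts <- table(row_id, unlist(ll))
  # drop the table class; columns are the distinct tokens in sorted order
  matrix(as.integer(counts), nrow = nrow(counts),
         dimnames = list(NULL, colnames(counts)))
}

m <- count_tokens_crosstab(c("a$b$a", "b", "c$a"))
print(m)
```

One difference from rapplyFun is that the columns come out in sorted (locale-dependent) token order rather than in order of first appearance. With data.table, the resulting columns could then be added to dtr in a single := call, but that step is not shown or timed here.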