Memory issue with foreach loop in R on Windows 8 (64-bit) (doParallel package)

夕颜 2021-02-05 22:32

I'm trying to move from a serial to parallel approach to accomplish some multivariate time series analysis tasks on a large data.table. The table contains data fo

3 Answers
  •  猫巷女王i
    2021-02-05 23:09

    Holding everything in memory is one of those (aargh, annoying) things that R programmers have to learn to deal with. It's pretty easy to imagine your code example as either memory-bound or CPU-bound, and you'll need to figure that out before trying to apply workarounds.

    Assuming the memory is being consumed by your dataset (dt.all) and not during the actual model run, you may be able to release enough memory inside each worker for the processes to parallelize:

    library(doParallel)   # also loads foreach and parallel
    library(data.table)
    registerDoParallel(cores = detectCores() - 1)

    results <- foreach(g = unique(dt.all$grp), .packages = "data.table",
                       .combine = "rbind") %dopar% {
        dt.sub <- dt.all[grp == g]  # 'g' avoids masking the data.table column 'grp'
        rm(dt.all)                  # drop this worker's copy of the full table...
        gc()                        # ...and prompt R to return the freed memory
        f_lm(dt.sub, g)
    }
    

    However, this assumes that your working set (dt.sub) is small enough that several of them fit in memory at once, and it isn't hard to imagine a problem set too large for that. Also, and this is really annoying, all the workers are going to fire up at the same time, each holding the full data before it can release anything, and kill your machine anyway, so you might need to stagger their start-up by a couple of seconds to let earlier children load up and release memory, as sketched below.
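    A minimal sketch of that staggering idea, assuming the same dt.all and f_lm names as above; n.workers and the ~2-second delay per slot are illustrative choices, not anything doParallel requires:

    grps <- unique(dt.all$grp)
    n.workers <- getDoParWorkers()  # how many workers are registered

    results <- foreach(i = seq_along(grps), .packages = "data.table",
                       .combine = "rbind") %dopar% {
        # stagger start-up: tasks in the same wave sleep for different
        # amounts, so the workers don't all build dt.sub at the same moment
        Sys.sleep(((i - 1) %% n.workers) * 2)
        dt.sub <- dt.all[grp == grps[i]]
        rm(dt.all)
        gc()
        f_lm(dt.sub, grps[i])
    }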

    Though desperately stupid and brute-force, I have handled this exact problem by writing the subsets out to disk as individual data files and then using a batch script to run my computations in parallel, roughly along these lines:
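    A rough sketch of that file-based approach, again assuming the dt.all and f_lm names from above; the chunks/ directory, the file names, and the worker script are all illustrative:

    # split.R -- write one .rds file per group
    library(data.table)
    dir.create("chunks", showWarnings = FALSE)
    for (g in unique(dt.all$grp)) {
        saveRDS(dt.all[grp == g], file.path("chunks", paste0(g, ".rds")))
    }

    # worker.R -- fit one chunk; the batch script launches one Rscript
    # process per file, e.g. on Windows:  start Rscript worker.R chunks\A.rds
    args <- commandArgs(trailingOnly = TRUE)
    dt.sub <- readRDS(args[1])
    res <- f_lm(dt.sub, dt.sub$grp[1])
    saveRDS(res, sub("\\.rds$", "_result.rds", args[1]))

    Since each Rscript process only ever loads its own chunk, total memory use is bounded by however many processes the batch script starts at once.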
