parallel regression in R (maybe with snowfall)

混江龙づ霸主 submitted on 2020-12-06 15:47:25

Question


I'm trying to run R in parallel to run a regression. I'm trying to use the snowfall library (but am open to any approach). Currently, I'm running the following regression which is taking an extremely long time to run. Can someone show me how to do this?

 sales_day_region_ctgry_lm <- lm(log(sales_out+1) ~ factor(region_out) +
                                   date_vector_out + factor(date_vector_out) +
                                   factor(category_out) + mean_temp_out)

I've started down the following path:

library(snowfall)
sfInit(parallel = TRUE, cpus = 4, type = "SOCK")

wrapper <- function() {
  return(lm(log(sales_out+1) ~ factor(region_out) + date_vector_out +
              factor(date_vector_out) + factor(category_out) + mean_temp_out))
}

output_lm <- sfLapply(*no idea what to do here*, wrapper)
sfStop()
summary(output_lm)

But this approach is riddled with errors.
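For reference, sfLapply() works like lapply(): it calls the function once for each element of its first argument, on the workers, so a single lm() call gives it nothing to split. The minimal sketch below shows that calling convention in the "several small models" setting, fitting one model per region. The data frame sales_df and its simulated columns are hypothetical stand-ins, and this is not a parallel version of the one big model above.

```r
library(snowfall)

## Hypothetical simulated data standing in for the real sales data.
set.seed(1)
n <- 2000
sales_df <- data.frame(
  sales_out     = rexp(n, 1 / 100),
  region_out    = sample(c("north", "south", "east", "west"), n, replace = TRUE),
  mean_temp_out = rnorm(n, 15, 5)
)

sfInit(parallel = TRUE, cpus = 2, type = "SOCK")
sfExport("sales_df")  # copy the data frame into every worker's workspace

## sfLapply() calls fit_region() once per element of its first argument,
## so we iterate over region names and fit one model per region.
fit_region <- function(r) {
  lm(log(sales_out + 1) ~ mean_temp_out,
     data = sales_df[sales_df$region_out == r, ])
}
regions <- sort(unique(sales_df$region_out))
region_fits <- sfLapply(regions, fit_region)
sfStop()

names(region_fits) <- regions
sapply(region_fits, coef)  # one column of coefficients per region
```

sfExport() is what makes sales_df visible on the workers; socket workers start with an empty workspace, so without it fit_region() would fail there.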

Thanks!


Answer 1:


The partools package offers an easy, off-the-shelf implementation of parallelised linear regression via its calm() function. (The "ca" prefix stands for "chunk averaging".)
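To make the idea concrete, here is a minimal sketch of chunk averaging written with the base parallel package rather than partools: the rows are split into one chunk per worker, the same lm() is fitted on each chunk, and the coefficient vectors are averaged. The data frame d and the columns x1, x2, y are hypothetical, and the plain average shown here is a simplification, not partools' actual code.

```r
library(parallel)

## Hypothetical simulated data: y depends linearly on x1 and x2.
set.seed(42)
n <- 10000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 0.5 * d$x2 + rnorm(n)

cls <- makeCluster(2)

## Shuffle the rows, then deal them out into one chunk per worker.
d <- d[sample(nrow(d)), ]
chunks <- split(d, rep_len(seq_along(cls), nrow(d)))

## Fit the same model independently on every chunk...
chunk_coefs <- parLapply(cls, chunks, function(chunk) {
  coef(lm(y ~ x1 + x2, data = chunk))
})
stopCluster(cls)

## ...and average the coefficient vectors ("chunk averaging").
ca_coef <- Reduce(`+`, chunk_coefs) / length(chunk_coefs)
ca_coef
```

With this many rows per chunk, ca_coef should land very close to the coefficients of a single lm() on all of d.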

In your case -- leaving aside @Roland's correct comment about mixing up factor and continuous predictors -- the solution should be as simple as:

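library(partools)
#library(parallel) ## loads as dependency

cls <- makeCluster(4) ## Or, however many cores you want/have.
setclsinfo(cls)       ## Initialise partools' bookkeeping on the workers.

## Split the data frame into one chunk per worker; calm() fits the model on
## each chunk and averages the results. The data frame's name is passed as a string.
distribsplit(cls, "YOUR_DATA_HERE")

sales_day_region_ctgry_calm <- 
  calm(
    cls, 
    "log(sales_out+1) ~ factor(region_out) + date_vector_out + 
     factor(date_vector_out) + factor(category_out) + mean_temp_out, 
     data=YOUR_DATA_HERE"
    )

stopCluster(cls)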

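Note that the arguments to lm() (formula and data) are passed to calm() as a single quoted string. Note further that you may need to randomise the row order of your data first if it is sorted in any way (e.g. by date), because each worker fits the model on its own chunk of rows. See the partools vignette for more details.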
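Why the ordering matters can be seen in a small sketch, assuming the rows are cut into contiguous chunks (one per worker) in their stored order: with date-sorted data each chunk sees only some of the dates, so a term like factor(date_vector_out) would have different levels, and hence different coefficients, in each chunk. The data frame d is hypothetical, and this uses base R only, not partools' own splitting code.

```r
## Hypothetical data sorted by date, 100 rows per day for 30 days.
set.seed(7)
dates <- rep(seq(as.Date("2016-01-01"), by = "day", length.out = 30), each = 100)
d <- data.frame(date_vector_out = dates, sales_out = rexp(length(dates)))

## Cut row positions 1..n into 4 contiguous blocks of equal size.
chunk_id <- cut(seq_len(nrow(d)), 4, labels = FALSE)

ordered_chunks  <- split(d, chunk_id)                     # rows kept in date order
shuffled_chunks <- split(d[sample(nrow(d)), ], chunk_id)  # rows shuffled first

## How many distinct dates (i.e. factor levels) each chunk would see.
dates_per_chunk <- function(chunks) {
  sapply(chunks, function(ch) length(unique(ch$date_vector_out)))
}
dates_per_chunk(ordered_chunks)   # each chunk covers only about a quarter of the dates
dates_per_chunk(shuffled_chunks)  # each chunk covers all 30 dates
```

Shuffling with sample() before splitting is enough here to put every date into every chunk.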




Answer 2:


Since you're fitting one big model (as opposed to several small models), and you're using linear regression, a quick-and-easy way to get parallelism is to use a multithreaded BLAS. Something like Microsoft R Open (previously known as Revolution R Open), which ships with the multithreaded Intel MKL, should do the trick.*
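As a rough illustration of where a multithreaded BLAS does its work, the sketch below fits a linear model through the normal equations, using crossprod() and solve(), which hand dense matrix operations to the BLAS/LAPACK libraries R is linked against. This is a swapped-in technique, not what lm() does internally (lm() uses a QR decomposition, which is numerically more stable), and the simulated data are hypothetical. Whether a particular build (MRO with MKL, OpenBLAS, etc.) speeds it up depends on the installation; in R 3.4 and later, sessionInfo() reports which BLAS and LAPACK libraries are in use.

```r
## Hypothetical simulated data with a categorical and a continuous predictor.
set.seed(3)
n <- 5000
d <- data.frame(
  region = factor(sample(letters[1:5], n, replace = TRUE)),
  temp   = rnorm(n, 15, 5)
)
d$y <- 2 + as.integer(d$region) * 0.3 + 0.1 * d$temp + rnorm(n)

## Build the design matrix once; the factor becomes dummy columns.
X <- model.matrix(y ~ region + temp, data = d)

## X'X and X'y are dense matrix products computed by BLAS, and solve()
## calls LAPACK; these are the steps a multithreaded BLAS can spread over cores.
XtX  <- crossprod(X)
Xty  <- crossprod(X, d$y)
beta <- solve(XtX, Xty)

## Compare with lm(); for well-conditioned data the estimates agree closely.
fit <- lm(y ~ region + temp, data = d)
all.equal(drop(beta), coef(fit), check.attributes = FALSE)
```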

* disclosure: I work for Microsoft/Revolution.



Source: https://stackoverflow.com/questions/35932802/parallel-regression-in-r-maybe-with-snowfall
