Question
I have to run several regression models for a project. The workflow looks something like this:
glm(y ~ variables, data=data1)
glm(y ~ variables, data=data2)
glm(y ~ variables, data=data3)
glm(y ~ variables, data=data4)
I then run a different model on the same data:
lm(z ~ other_variables, data=data1)
lm(z ~ other_variables, data=data2)
lm(z ~ other_variables, data=data3)
lm(z ~ other_variables, data=data4)
Running these models takes something like 8 hours, so I want to parallelize this operation. I have 4 cores, so in theory this process could be sped up quite a lot. The problem is that data1 is much larger than the other datasets, which means that 3 of my cores will sit idle for hours until the next task can start.
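A toy calculation makes the idle-core cost concrete. The run times below are hypothetical, chosen only so that the glm on data1 dominates. greedy_makespan() is an illustrative helper, not a package function: it hands each task, in list order, to whichever worker has the least accumulated work, which is how a dynamic task queue behaves.

```r
# Hypothetical run times in hours (illustrative only, not measurements)
glm_hours <- c(data1 = 4, data2 = 1, data3 = 1, data4 = 1)
lm_hours  <- c(data1 = 2, data2 = 0.5, data3 = 0.5, data4 = 0.5)

# Hypothetical helper: each task goes to the worker with the least
# accumulated work, i.e. the first worker to become free
greedy_makespan <- function(durations, n_workers) {
  loads <- numeric(n_workers)
  for (d in durations) {
    i <- which.min(loads)
    loads[i] <- loads[i] + d
  }
  max(loads)  # wall-clock time until the last worker finishes
}

# Two separate parallel calls: the lm batch waits for the slowest glm
two_phase <- greedy_makespan(glm_hours, 4) + greedy_makespan(lm_hours, 4)

# One shared queue of all eight fits: free workers move on immediately
one_queue <- greedy_makespan(c(glm_hours, lm_hours), 4)

cat("two batches:", two_phase, "h; one queue:", one_queue, "h\n")
```

With these made-up numbers, running the glm and lm batches as two separate parallel calls takes 6 hours of wall-clock time, because the lm batch cannot start until the glm on data1 finishes. A single shared queue finishes in 4 hours, the length of that one slow glm.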
Most of the parallelization methods I've come across deal with applying a single function to several arguments, but is it possible to parallelize using several different functions at once?
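One common workaround for the "single function" limitation is to wrap each model fit in a zero-argument function (a closure) and collect all of them in one list. The function that gets applied is then always the same, function(job) job(), no matter which model a job fits. The sketch below runs the list sequentially with lapply() just to show the idea; the toy data frames, the column names (y, x1, x2, z, w) and the make_job() helper are hypothetical stand-ins for the real data1 to data4 and formulas.

```r
set.seed(1)
# Hypothetical toy data standing in for data1..data4 (data1 is the largest)
make_data <- function(n) {
  data.frame(x1 = rnorm(n), x2 = rnorm(n), w = rnorm(n),
             y = rbinom(n, 1, 0.5), z = rnorm(n))
}
datasets <- list(data1 = make_data(5000), data2 = make_data(500),
                 data3 = make_data(500), data4 = make_data(500))

# Hypothetical helper: bundle a fitting function, a formula and a data set
# into a closure that fits the model when called with no arguments
make_job <- function(fit_fun, formula, data) {
  force(fit_fun); force(formula); force(data)
  function() fit_fun(formula, data = data)
}

# One list holding both kinds of model; names become "glm.data1", ...
jobs <- c(
  glm = lapply(datasets, function(d) make_job(glm, y ~ x1 + x2, d)),
  lm  = lapply(datasets, function(d) make_job(lm, z ~ w, d))
)

# A single function applied to every element, whatever model it wraps
fits <- lapply(jobs, function(job) job())
```

Because jobs is an ordinary list, it can be handed to any of the parallel apply functions in place of lapply().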
At first, the workload should look like this:
Core1: glm(y ~ variables, data=data1)
Core2: glm(y ~ variables, data=data2)
Core3: glm(y ~ variables, data=data3)
Core4: glm(y ~ variables, data=data4)
But when cores 2-4 have finished computing their models, I want them to start working on the next regression model, even while core 1 continues to fit the glm model on data1:
Core1: glm(y ~ variables, data=data1)
Core2: lm(z ~ other_variables, data=data1)
Core3: lm(z ~ other_variables, data=data2)
Core4: lm(z ~ other_variables, data=data3)
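On Linux or macOS, one way to get the schedule described above is parallel::mclapply() with mc.preschedule = FALSE: one process is forked per list element, at most mc.cores run at a time, and a new one starts when a running one finishes. The sketch puts all eight fits in one list and orders it largest-data-first (a longest-processing-time-first heuristic, which assumes run time grows with row count). Forking is not available on Windows, where mc.cores must be 1. The toy data, column names and the make_job() helper are hypothetical.

```r
library(parallel)

set.seed(1)
# Hypothetical toy data standing in for data1..data4 (data1 is the largest)
make_data <- function(n) {
  data.frame(x1 = rnorm(n), x2 = rnorm(n), w = rnorm(n),
             y = rbinom(n, 1, 0.5), z = rnorm(n))
}
datasets <- list(data1 = make_data(5000), data2 = make_data(500),
                 data3 = make_data(500), data4 = make_data(500))

# Hypothetical helper: a zero-argument closure that fits one model
make_job <- function(fit_fun, formula, data) {
  force(fit_fun); force(formula); force(data)
  function() fit_fun(formula, data = data)
}

jobs <- c(
  glm = lapply(datasets, function(d) make_job(glm, y ~ x1 + x2, d)),
  lm  = lapply(datasets, function(d) make_job(lm, z ~ w, d))
)

# Start the biggest jobs first (assumes run time grows with row count)
sizes <- vapply(jobs, function(job) nrow(environment(job)$data), integer(1))
jobs  <- jobs[order(sizes, decreasing = TRUE)]

# mc.preschedule = FALSE: one fork per job, and the next job starts as soon
# as one of the running jobs finishes (forking is unavailable on Windows)
n_cores <- if (.Platform$OS.type == "windows") 1L else 4L
fits <- mclapply(jobs, function(job) job(),
                 mc.cores = n_cores, mc.preschedule = FALSE)
```

With the default mc.preschedule = TRUE, mclapply() fixes which elements each core will run before starting, so a core stuck on a slow fit cannot hand its remaining elements to idle cores.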
Is there a model/function which would allow me to do this?
So far I've used the parallel package and its parLapply function, but I run into the problems described above.
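Two things cause the idle time with parLapply(): running the glm and lm fits as separate calls means the second call waits for the slowest fit of the first, and by default parLapply() splits its input into one fixed chunk per worker before anything runs. The sketch below instead sends all eight fits through clusterApplyLB() from the same parallel package, which places the first jobs on the available workers and then sends each remaining job to whichever worker becomes free; parLapplyLB() is the load-balancing counterpart of parLapply(). It uses a socket (PSOCK) cluster, so it should also work on Windows. The toy data and the make_job() helper are hypothetical. Building the closures inside a function matters here: each closure then carries its own data to the worker instead of relying on the master's global environment.

```r
library(parallel)

set.seed(1)
# Hypothetical toy data standing in for data1..data4 (data1 is the largest)
make_data <- function(n) {
  data.frame(x1 = rnorm(n), x2 = rnorm(n), w = rnorm(n),
             y = rbinom(n, 1, 0.5), z = rnorm(n))
}
datasets <- list(data1 = make_data(5000), data2 = make_data(500),
                 data3 = make_data(500), data4 = make_data(500))

# Hypothetical helper: the closure's enclosing environment (fit_fun,
# formula, data) is serialized along with it to the socket workers
make_job <- function(fit_fun, formula, data) {
  force(fit_fun); force(formula); force(data)
  function() fit_fun(formula, data = data)
}

jobs <- c(
  glm = lapply(datasets, function(d) make_job(glm, y ~ x1 + x2, d)),
  lm  = lapply(datasets, function(d) make_job(lm, z ~ w, d))
)

cl <- makeCluster(4)  # PSOCK cluster by default
fits <- tryCatch(
  clusterApplyLB(cl, jobs, function(job) job()),  # next job -> first free worker
  finally = stopCluster(cl)                       # always shut the workers down
)
names(fits) <- names(jobs)  # restore names in case they were not kept
```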
Source: https://stackoverflow.com/questions/62324744/can-i-run-several-different-functions-in-parallel-at-once