Running CrossValidationCV in parallel

Submitted by 被刻印的时光 ゝ on 2019-12-22 14:51:22

Question


When I run GridSearchCV() or RandomizedSearchCV() in parallel (with n_jobs>1 or n_jobs=-1 set),
I get this message:

ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information

I put the code in a class in one .py file and call it from another .py file under if __name__ == '__main__', but it still shows this message.

It works fine when n_jobs=1.

import platform; print(platform.platform())
Windows-10-10.0.10586-SP0

import numpy; print("NumPy", numpy.__version__)
NumPy 1.13.1

import scipy; print("SciPy", scipy.__version__)
SciPy 0.19.1

import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.19.0


UPDATE

I tried this code, but it still gives me the same error:

import numpy as np
import pandas as pd  # needed for pd.read_csv below
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier


class Test():
    def __init__(self):
        attributes = [..]
        dataset = pd.read_csv("..")
        X = dataset[[..]]
        Y = dataset[...]
        model = DecisionTreeClassifier()
        model = RandomizedSearchCV(....)
        model.fit(X, Y)


if __name__ == '__main__':
    Test()

Answer 1:


joblib is known for this behaviour, and its documentation states the requirement explicitly:

Warning

Under Windows, it is important to protect the main loop of code to avoid recursive spawning of subprocesses when using joblib.Parallel. In other words, you should be writing code like this:

import ....

def function1(...):
    ...

def function2(...):
    ...

...
if __name__ == '__main__':
    # do stuff with imports and functions defined above
    ...

No code should run outside of the "if __name__ == '__main__'" blocks, only imports and definitions.

So, refactor your code to meet this well-defined requirement, and it will be able to benefit from joblib's parallelism.
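As a minimal sketch of that layout applied to the search from the question: only imports and definitions sit at module level, and the parallel fit is started from inside the guard of the script that is actually executed. The synthetic dataset, the parameter ranges and the helper name run_search are assumptions made for illustration; they are not taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier


def run_search(n_jobs=2):
    """Fit a small randomized search (hypothetical data and parameter ranges)."""
    # Synthetic data stands in for the CSV file used in the question.
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    search = RandomizedSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_distributions={
            "max_depth": [2, 3, 4, 5, None],
            "min_samples_split": [2, 5, 10],
        },
        n_iter=5,
        cv=3,
        n_jobs=n_jobs,  # values other than 1 make joblib start worker processes
        random_state=0,
    )
    search.fit(X, y)
    return search


if __name__ == "__main__":
    # Nothing above this guard runs work at import time, so worker processes
    # that re-import this module on Windows do not start the search again.
    search = run_search(n_jobs=2)
    print(search.best_params_)
```

If the search lives in a separate module, as in the question, the same rule applies to the entry-point script: the importing file should only call into that module from inside its own guard.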




Answer 2:


I imagine this won't be the most useful answer, but you could always parallelize the process manually. https://docs.python.org/2/library/multiprocessing.html
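A rough sketch of what manual parallelization could look like with the standard-library multiprocessing module. The evaluate function is a hypothetical stand-in with an invented scoring formula; in practice it would fit and cross-validate one model per candidate. Note that multiprocessing on Windows needs the same "if __name__ == '__main__'" guard, so this does not sidestep the requirement from the first answer.

```python
from multiprocessing import Pool


def evaluate(params):
    """Stand-in for fitting and scoring one candidate (invented formula)."""
    depth, min_split = params
    score = 1.0 / (1.0 + abs(depth - 4) + abs(min_split - 5))
    return params, score


def manual_search(candidates, processes=2):
    """Score every candidate in a worker pool and return the best (params, score)."""
    with Pool(processes=processes) as pool:
        results = pool.map(evaluate, candidates)
    return max(results, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical grid of (max_depth, min_samples_split) pairs.
    grid = [(d, s) for d in (2, 4, 6) for s in (2, 5, 10)]
    print(manual_search(grid))
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes.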



Source: https://stackoverflow.com/questions/48631907/running-crossvalidationcv-in-parallel
