问题
When I run a GridsearchCV() and a RandomizedsearchCV() methods in parallel ( having n_jobs>1 or n_jobs=-1 options set )
it shows this message:
ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if name == 'main'". Please see the joblib documentation on Parallel for more information" I put the code in a class in .py file and call it using if_name_=='main in other .py file but it still shows this message
It works good when n_jobs=1
import platform; print(platform.platform())
Windows-10-10.0.10586-SP0
import numpy; print("NumPy", numpy.__version__)
NumPy 1.13.1
import scipy; print("SciPy", scipy.__version__)
SciPy 0.19.1
import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.19.0
UPDATE
I tried this code but it still gives me the same error
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier
class Test():
def __init__(self):
attributes = [..]
dataset = pd.read_csv("..")
X=dataset[[..]]
Y=dataset[...]
model=DecisionTreeClassifier()
model = RandomizedSearchCV(....)
model.fit(X, Y)
if __name__ == '__main__':
Test()
回答1:
joblib is know for this behaviour and rather explicit in documenting:
Warning
Under Windows, it is important to protect the main loop of code to avoid recursive spawning of
subprocesseswhen usingjoblib.Parallel. In other words, you should be writing code like this:
import ....
def function1(...):
...
def function2(...):
...
...
if __name__ == '__main__':
# do stuff with imports and functions defined about
...
No code should run outside of the
“if __name__ == ‘__main__’”blocks, only imports and definitions.
So, refactor your code so as to meet this well-defined requirement and your code will start to benefit from the joblib-tools powers.
回答2:
I imagine this won't be the most useful answer, but you could always parallelize the process manually. https://docs.python.org/2/library/multiprocessing.html
来源:https://stackoverflow.com/questions/48631907/running-crossvalidationcv-in-parallel