Keras + scikit-learn wrapper appears to hang when GridSearchCV is used with n_jobs > 1

♀尐吖头ヾ submitted on 2019-12-06 04:54:42

Question


UPDATE: I have to re-write this question as after some investigation I realise that this is a different problem.

Context: running Keras in a grid search setting, using the KerasClassifier wrapper with scikit-learn. System: Ubuntu 16.04. Libraries: Anaconda distribution 5.1, Keras 2.0.9, scikit-learn 0.19.1, TensorFlow 1.3.0 or Theano 0.9.0, using CPUs only.

Code: I simply used the code here for testing: https://machinelearningmastery.com/use-keras-deep-learning-models-scikit-learn-python/, the second example 'Grid Search Deep Learning Model Parameters'. Pay attention to line 35, which reads:

grid = GridSearchCV(estimator=model, param_grid=param_grid)

Symptoms: When the grid search uses more than one job (which I take to mean CPUs), e.g., setting 'n_jobs' on the line above to '2', the line below:

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2)

will cause the code to hang indefinitely, with either TensorFlow or Theano, and there is no CPU usage (see attached screenshot, where 5 python processes were created but none is using any CPU).

By debugging, it appears to be the following line within 'sklearn.model_selection._search' that causes problems:

line 648: for parameters, (train, test) in product(candidate_params,
                                               cv.split(X, y, groups)))

The program hangs on this line and cannot continue.

I would really appreciate some insights as to what this means and why this could happen.

Thanks in advance


Answer 1:


Are you using a GPU? If so, you can't have multiple jobs each running a variation of the params in parallel, because they won't be able to share the GPU.

Here's a full example of how to use the Keras scikit-learn wrappers in a Pipeline with GridSearchCV: Pipeline with a Keras Model

If you really want to have multiple jobs in the GridSearchCV, you can try to limit the GPU fraction used by each job (e.g. if each job only allocates 0.5 of the available GPU memory, you can run 2 jobs simultaneously).

See these issues:

  • Limit the resource usage for tensorflow backend

  • GPU memory fraction does not work in keras 2.0.9 but it works in 2.0.8
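
The memory-fraction idea above could look roughly like the sketch below, assuming the TensorFlow 1.x backend mentioned in the question. The helper names per_job_gpu_fraction and limit_tf_memory and the 0.9 headroom default are illustrative assumptions, not part of Keras or TensorFlow. Note that the second issue listed above reports this setting not taking effect in Keras 2.0.9, so treat it as untested on that version.

```python
def per_job_gpu_fraction(n_jobs, usable=0.9):
    """Split an assumed usable share of GPU memory evenly across jobs.

    `usable` is a hypothetical headroom factor chosen for illustration.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return usable / n_jobs


def limit_tf_memory(fraction):
    """Cap this process's TensorFlow GPU allocation (TF 1.x API).

    The imports are local so that loading this module does not touch
    TensorFlow; the function is meant to be called from inside the
    model creation function, once per job.
    """
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    # Register the capped session as the one Keras will use.
    K.set_session(tf.Session(config=config))
```

With two jobs and no headroom, per_job_gpu_fraction(2, usable=1.0) gives the 0.5 figure used above; limit_tf_memory would then be called with that value at the top of the model creation function.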




Answer 2:


I know this is a late answer, but I dealt with this problem too, and it really slowed me down not being able to run what is essentially trivially parallelizable code. The issue is indeed with the TensorFlow session. If a session is created in the parent process before GridSearchCV.fit(), it will hang!

The solution for me was to keep all session/graph creation code restricted to the KerasClassifier class and the model creation function I passed to it.

Also, what Felipe said about the memory is true: you will want to restrict the memory usage of TF in either the model creation function or a subclass of KerasClassifier.

Related info:

  • Session hang issue with python multiprocessing
  • Keras + Tensorflow and Multiprocessing in Python
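
A minimal sketch of that arrangement, assuming the Keras 2.0.x scikit-learn wrapper from the question: the layer sizes, the 8 input features, and the parameter grid are placeholders loosely modelled on the linked tutorial, and run_grid_search is a hypothetical helper name. The point is only that the model, and therefore any session or graph, is built inside create_model, which the wrapper calls in each job, and that nothing in the parent process builds or runs a model before fit().

```python
def create_model(optimizer="rmsprop", init="glorot_uniform"):
    """Build and compile the network; invoked by KerasClassifier per fit."""
    # Local imports keep all backend work inside this function.
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    # input_dim=8 is an assumption about the dataset, not a requirement.
    model.add(Dense(12, input_dim=8, kernel_initializer=init, activation="relu"))
    model.add(Dense(8, kernel_initializer=init, activation="relu"))
    model.add(Dense(1, kernel_initializer=init, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer=optimizer,
                  metrics=["accuracy"])
    return model


def run_grid_search(X, y, n_jobs=2):
    """Run the search without creating a session in the parent beforehand."""
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    # Only the build function is handed over; no model is built here.
    model = KerasClassifier(build_fn=create_model, verbose=0)
    param_grid = {"epochs": [50, 100], "batch_size": [5, 10]}  # placeholder grid
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=n_jobs)
    return grid.fit(X, y)
```

Any per-job TensorFlow memory restriction would go at the top of create_model, before the model is built.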


Source: https://stackoverflow.com/questions/47527915/keras-scikit-learn-wrapper-appears-to-hang-when-gridsearchcv-with-n-jobs-1
