Increasing n_jobs has no effect on GridSearchCV

送分小仙女 submitted on 2019-11-30 22:20:24

Here are some possible causes of this behaviour:

  • Each additional worker adds overhead for starting it up and shutting it down. (GridSearchCV parallelizes through joblib, whose default backend runs workers as separate processes rather than threads.) I ran your code on my i7-7700HQ (4 cores, 8 hardware threads) and saw the following behaviour as n_jobs increased:
    • n_jobs=1 and n_jobs=2: time per fit (the time GridSearchCV needs to fully train and test one candidate model) was 2.9s (overall time ~2 min)
    • n_jobs=3: time per fit was 3.4s (overall time 1.4 min)
    • n_jobs=4: time per fit was 3.8s (overall time 58s)
    • n_jobs=5: time per fit was 4.2s (overall time 51s)
    • n_jobs=6: time per fit was 4.2s (overall time ~49s)
    • n_jobs=7: time per fit was 4.2s (overall time ~49s)
    • n_jobs=8: time per fit was 4.2s (overall time ~49s)
  • As you can see, the time per fit increased while the overall time decreased (although beyond n_jobs=4 the improvement was far from linear), and the overall time stayed constant for n_jobs>=6. This is because there is a cost incurred in initializing and releasing each worker. See this GitHub issue and this issue.

  • There may also be other bottlenecks, such as the data being too large to be sent to all workers at once, contention between workers for RAM (or other resources), the way data is passed to each worker, etc.

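  • I suggest reading about Amdahl's law, which states that there is a theoretical bound on the speedup achievable through parallelization, given by the formula

    $$S(N) = \frac{1}{(1 - p) + \frac{p}{N}}$$

    where p is the fraction of the work that can be parallelized and N is the number of workers. (Formula source: Amdahl's law, Wikipedia)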

  • Finally, it might be due to the data size and the complexity of the model you use for training as well.
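To relate the timings above to Amdahl's law, here is a rough sketch in Python. It assumes the approximate overall times quoted in this answer (reading "~2 min" as 120 s and "1.4 min" as 84 s), estimates the parallel fraction p from the n_jobs=4 run alone, and ignores per-worker startup overhead, so treat it as a back-of-the-envelope check rather than a fitted model. The helper names amdahl_speedup and estimate_parallel_fraction are hypothetical.

```python
# Approximate overall wall-clock times (seconds) from the timings above.
# Assumption: "~2 min" is read as 120 s and "1.4 min" as 84 s.
measured = {1: 120.0, 2: 120.0, 3: 84.0, 4: 58.0, 5: 51.0, 6: 49.0, 7: 49.0, 8: 49.0}


def amdahl_speedup(p: float, n: int) -> float:
    """Hypothetical helper: Amdahl's law speedup with n workers,
    where p is the parallelizable fraction of the work."""
    return 1.0 / ((1.0 - p) + p / n)


def estimate_parallel_fraction(speedup: float, n: int) -> float:
    """Hypothetical helper: solve S = 1 / ((1 - p) + p / n) for p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


baseline = measured[1]
# Estimate p from a single reference point (n_jobs=4); this ignores
# worker startup/shutdown overhead, which Amdahl's law does not model.
p = estimate_parallel_fraction(baseline / measured[4], 4)

for n, t in sorted(measured.items()):
    observed = baseline / t
    predicted = amdahl_speedup(p, n)
    print(f"n_jobs={n}: observed {observed:.2f}x, Amdahl {predicted:.2f}x")

print(f"estimated parallel fraction p = {p:.2f}")
# As n grows, p / n -> 0, so the speedup can never exceed 1 / (1 - p).
print(f"upper bound on speedup: {1.0 / (1.0 - p):.2f}x")
```

Using the n_jobs=4 run as the reference, p comes out at about 0.69, which caps the speedup at roughly 3.2x however many workers are added; the observed speedups level off at about 2.4x. The n_jobs=2 and n_jobs=3 runs fall below the prediction, a reminder that a single-parameter model does not capture the per-worker overhead discussed above.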

Here is a blog post explaining the same issue regarding multithreading.
