Python joblib Parallel on Windows not working even when "if __name__ == '__main__':" is added

Submitted by 别等时光非礼了梦想 on 2019-12-20 23:28:42

Question


I'm running parallel processing in Python on Windows. Here's my code:

from math import sqrt  # needed by f below
from joblib import Parallel, delayed

def f(x): 
    return sqrt(x)

if __name__ == '__main__':
    a = Parallel(n_jobs=2)(delayed(f)(i) for i in range(10))

Here's the error message:

Process PoolWorker-2:  
Process PoolWorker-1:  
Traceback (most recent call last):    
File "C:\Users\yoyo__000.BIGBLACK\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.4.3105.win-x86_64\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()   
File "C:\Users\yoyo__000.BIGBLACK\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.4.3105.win-x86_64\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)   
File "C:\Users\yoyo__000.BIGBLACK\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.4.3105.win-x86_64\lib\multiprocessing\pool.py", line 102, in worker
task = get()   
File "C:\Users\yoyo__000.BIGBLACK\AppData\Local\Enthought\Canopy\User\lib\site-packages\joblib\pool.py", line 363, in get
return recv()  
AttributeError: 'module' object has no attribute 'f'

Answer 1:


According to this site, the problem is Windows-specific:

Yes: under Linux we are forking, thus there is no need to pickle the function, and it works fine. Under Windows, the function needs to be pickleable, i.e. it needs to be imported from another file. This is actually good practice: making modules pushes for reuse.
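To illustrate what "pickleable" means here, the sketch below uses only the standard library (no joblib) and assumes Python 3. It shows that pickle stores a function as a module name plus a function name, not as code, so the receiving side must be able to look that name up. The module name demo_main is a hypothetical stand-in for the parent's __main__, and deleting the attribute only simulates a freshly started worker; it is not what joblib does internally.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the module in which f was defined
# (in the question, that module is the interactive __main__).
demo_main = types.ModuleType("demo_main")
sys.modules["demo_main"] = demo_main

def f(x):
    return x ** 0.5

# Make f look as if it had been defined at the top level of demo_main.
f.__module__ = "demo_main"
f.__qualname__ = "f"
demo_main.f = f

# The payload holds only the names "demo_main" and "f", not the function body.
payload = pickle.dumps(f)

# While the name can be resolved, unpickling returns the very same function.
restored = pickle.loads(payload)

# Simulate a new worker process whose copy of the module never defined f.
del demo_main.f
error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    error = exc

print(type(error).__name__, error)
```

The AttributeError raised in the last step is of the same kind as the one in the question: the worker receives a reference to f, looks it up in its own copy of the module, and finds nothing there.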

I've tried your code and it works flawlessly under Linux. Under Windows it runs OK if it is run from a script, like python script_with_your_code.py. But it fails when run in an interactive Python session. It worked for me when I saved the f function in a separate module and imported it into my interactive session.
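To see which behaviour applies on a given machine, the standard multiprocessing module can report how it starts worker processes. This is a small sketch assuming Python 3.4 or later; the helper name default_start_method is made up for illustration, and the tracebacks in this thread come from Python 2.7, which predates these functions.

```python
import multiprocessing

def default_start_method():
    """Return the start method multiprocessing would use in this process."""
    # allow_none=True avoids fixing the start method as a side effect.
    method = multiprocessing.get_start_method(allow_none=True)
    if method is None:
        # The first entry of this list is the platform default.
        method = multiprocessing.get_all_start_methods()[0]
    return method

if __name__ == '__main__':
    print(default_start_method())
```

With "spawn" (the only method available on Windows), each worker is a fresh interpreter that has to find f by importing it, which is why a function typed into an interactive session cannot be found. With "fork", the worker starts as a copy of the parent and already has f in memory.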

NOT WORKING:
Interactive session:

>>> from math import sqrt
>>> from joblib import Parallel, delayed

>>> def f(x):
...     return sqrt(x)

>>> if __name__ == '__main__':
...     a = Parallel(n_jobs=2)(delayed(f)(i) for i in range(10))
...
Process PoolWorker-1:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Python27\lib\multiprocessing\pool.py", line 102, in worker
    task = get()
  File "C:\Python27\lib\site-packages\joblib\pool.py", line 359, in get
    return recv()
AttributeError: 'module' object has no attribute 'f'


WORKING:
fun.py

from math import sqrt

def f(x):
    return sqrt(x)

Interactive session:

>>> from joblib import Parallel, delayed
>>> from fun import f

>>> if __name__ == '__main__':
...     a = Parallel(n_jobs=2)(delayed(f)(i) for i in range(10))
...
>>> a
[0.0, 1.0, 1.4142135623730951, 1.7320508075688772, 2.0, 2.23606797749979, 2.449489742783178, 2.6457513110645907, 2.8284271247461903, 3.0]


Source: https://stackoverflow.com/questions/35452694/python-joblib-parallel-on-windows-not-working-even-if-name-main
