Importing scipy breaks multiprocessing support in Python

囚心锁ツ 2020-12-08 11:47

I am running into a bizarre problem that I can't explain. I'm hoping someone out there can help!

I'm running Python 2.7.3 and Scipy v0.14.0 and am trying

1 answer
  • 2020-12-08 12:21

    After much digging around and posting an issue on the Scipy GitHub site, I've found a solution.

    Before I start, this is documented very well here - I'll just give an overview.

    This problem is not related to the version of Scipy or Numpy that I was using. It originates in the system BLAS libraries that Numpy and Scipy use for their linear algebra routines. You can tell which libraries Numpy is linked against by running

    python -c 'import numpy; numpy.show_config()'

    If you are using OpenBLAS on Linux, you may find that the CPU affinity mask gets set to 1, meaning that once these libraries are loaded in Python (via Numpy/Scipy), the process can run on at most one core of the CPU. To test this, run the following in a Python terminal:

    import os
    os.system('taskset -p %s' % os.getpid())
    

    If the CPU affinity mask is returned as f or ff, you can access multiple cores (each set bit in the hexadecimal mask is one usable core). In my case it started out like that, but upon importing numpy or scipy.any_module it switched to 1, hence my problem.
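
    The same check can be done without shelling out to taskset. The following is a minimal sketch, assuming Linux and Python 3.3 or later (the original post used Python 2.7, where os.sched_getaffinity is not available). The helper name affinity_after_import is made up for illustration.

```python
import importlib
import os


def affinity_after_import(module_name):
    """Return (before, after): the sets of CPU ids this process may run on,
    measured before and after importing `module_name`."""
    before = os.sched_getaffinity(0)  # 0 means "the calling process"
    importlib.import_module(module_name)
    after = os.sched_getaffinity(0)
    return before, after


if __name__ == '__main__':
    try:
        before, after = affinity_after_import('numpy')
    except ImportError:
        print('numpy is not installed, nothing to check')
    else:
        print('allowed CPUs before import:', sorted(before))
        print('allowed CPUs after import: ', sorted(after))
        if len(after) < len(before):
            print('the import narrowed the CPU affinity of this process')
```

    If the second set is smaller than the first, the imported library changed the affinity of the process, which is the symptom described above.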

    I've found two solutions:

    Change CPU affinity

    You can manually set the CPU affinity of the master process at the top of the main function so that the code looks like this:

    import multiprocessing
    import numpy as np
    import math
    import time
    import os
    
    def compute_something(t):
        a = 0.
        for i in range(10000000):
            a = math.sqrt(t)
        return a
    
    if __name__ == '__main__':
    
        pool_size = multiprocessing.cpu_count()
        # Re-allow every core; CPU ids run from 0 to pool_size - 1.
        os.system('taskset -cp 0-%d %s' % (pool_size - 1, os.getpid()))
    
        print "Pool size:", pool_size
        pool = multiprocessing.Pool(processes=pool_size)
    
        inputs = range(10)
    
        tic = time.time()
        builtin_outputs = map(compute_something, inputs)
        print 'Built-in:', time.time() - tic
    
        tic = time.time()
        pool_outputs = pool.map(compute_something, inputs)
        print 'Pool    :', time.time() - tic
    

    Note that giving taskset a range that extends past the last core doesn't seem to matter - it just uses as many cores as are actually available.
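
    On newer Python versions the affinity can also be reset from within Python, which avoids spawning a shell. This is a minimal sketch rather than part of the original solution: it assumes Linux and Python 3.3 or later, and reset_affinity is a hypothetical helper name.

```python
import multiprocessing
import os


def reset_affinity():
    """Allow the calling process to run on every CPU the OS reports.

    Returns the set of CPU ids in effect afterwards."""
    all_cpus = range(multiprocessing.cpu_count())  # CPU ids 0 .. N-1
    os.sched_setaffinity(0, all_cpus)  # 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == '__main__':
    # Call this after importing numpy/scipy and before creating the Pool,
    # so that the worker processes inherit the widened affinity.
    print('allowed CPUs:', sorted(reset_affinity()))
```

    As with the taskset call, this has to run after the offending import and before multiprocessing.Pool is created, because the workers inherit the affinity of the parent process.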

    Switch BLAS libraries

    Solution documented at the site linked above. Basically: install libatlas and run update-alternatives to point numpy to ATLAS rather than OpenBLAS.
