why isn't numpy.mean multithreaded?

后端未结

关注

 2  1400

野性不改 2020-11-28 10:16

I\'ve been looking for ways to easily multithread some of my simple analysis code since I had noticed numpy it was only using one core, despite the fact that it is supposed

2条回答

一生所求 (楼主)

2020-11-28 10:41

Basically, because the BLAS library has an optimized dot product that they can easily call for dot that is inherently parallel. They admit they could extend numpy to parallelize other operations, but opted not to go that route. However, they give several tips on how to parallelize your numpy code (basically to divide work among N cores (e.g., N=4), split your array into N sub-arrays and send jobs for each sub-array to its own thread and then combine your results).

See http://wiki.scipy.org/ParallelProgramming :

Use parallel primitives

One of the great strengths of numpy is that you can express array operations very cleanly. For example to compute the product of the matrix A and the matrix B, you just do:

>>> C = numpy.dot(A,B)

Not only is this simple and clear to read and write, since numpy knows you want to do a matrix dot product it can use an optimized implementation obtained as part of "BLAS" (the Basic Linear Algebra Subroutines). This will normally be a library carefully tuned to run as fast as possible on your hardware by taking advantage of cache memory and assembler implementation. But many architectures now have a BLAS that also takes advantage of a multicore machine. If your numpy/scipy is compiled using one of these, then dot() will be computed in parallel (if this is faster) without you doing anything. Similarly for other matrix operations, like inversion, singular value decomposition, determinant, and so on. For example, the open source library ATLAS allows compile time selection of the level of parallelism (number of threads). The proprietary MKL library from Intel offers the possibility to chose the level of parallelism at runtime. There is also the GOTO library that allow run-time selection of the level of parallelism. This is a commercial product but the source code is distributed free for academic use.

Finally, scipy/numpy does not parallelize operations like

>>> A = B + C

>>> A = numpy.sin(B)

>>> A = scipy.stats.norm.isf(B)

These operations run sequentially, taking no advantage of multicore machines (but see below). In principle, this could be changed without too much work. OpenMP is an extension to the C language which allows compilers to produce parallelizing code for appropriately-annotated loops (and other things). If someone sat down and annotated a few core loops in numpy (and possibly in scipy), and if one then compiled numpy/scipy with OpenMP turned on, all three of the above would automatically be run in parallel. Of course, in reality one would want to have some runtime control - for example, one might want to turn off automatic parallelization if one were planning to run several jobs on the same multiprocessor machine.

0 讨论(0)

查看其它2个回答
发布评论:

提交评论
- 加载中...

why isn't numpy.mean multithreaded?

Use parallel primitives