What am I missing in python-multiprocessing/multithreading?

Question

I am creating, multiplying and then summing all elements of two big matrices in numpy. I do this a few hundred times with two methods: a plain loop, and the multiprocessing module (see the snippet below).

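import numpy as np
from multiprocessing.pool import ThreadPool

def worker_loop(n):
  # Serial version: process every matrix size in one loop.
  for i in n:
    mul = np.sum(np.random.normal(size=[i,i])*np.random.normal(size=[i,i]))

def worker(i):
  # One pool task: the same computation for a single size i.
  mul = np.sum(np.random.normal(size=[i,i])*np.random.normal(size=[i,i]))

n = range(100,300)

# Threaded version: two threads share the work.
pool = ThreadPool(2)
pool.map(worker, n)
pool.close()
pool.join()

# Serial version.
worker_loop(n)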

Timing both shows that the loop is faster than multiprocessing. I have also tried the threading module, with no success (I then read that this was a bad idea; read more here).

I started experimenting with multithreading because I need to convert images, labels, bounding boxes, ... into tfrecords. For that I am studying a file from tensorflow/inception (if you want to dig in: build_imagenet_data.py, line 453). I believe multithreading works there, which is why they use it.

That said, my questions are:

What am I missing in my code? Is it possible to achieve something with small modifications?
Does the example from inception work because tensorflow is written in C++ and CUDA?
When is it advisable to use multiprocessing or multithreading with numpy, tensorflow and the like?

Answer 1:

There is always some overhead (synchronization, data preparation, data copies and so on).

But given a good setup, your matrix-vector and vector-vector operations in numpy are already multithreaded through BLAS (the de facto standard used everywhere, including numpy, MATLAB and probably tensorflow's CPU backend; there are different implementations, though).
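To check what "a good setup" means on a given machine, a minimal sketch along these lines may help. np.show_config() and np.dot are real numpy calls; the helper name time_dot and the matrix size are illustrative assumptions, not part of the original answer.

```python
import time

import numpy as np

# Print the BLAS/LAPACK libraries this numpy build is linked against.
np.show_config()


def time_dot(n, repeats=3):
    """Return the best wall-clock time (seconds) of an n x n matrix product.

    time_dot is a hypothetical helper for this sketch. np.dot on 2-D
    float arrays is dispatched to BLAS when numpy is linked against one.
    """
    a = np.random.normal(size=[n, n])
    b = np.random.normal(size=[n, n])
    best = float("inf")
    for _ in range(repeats):
        start = time.time()
        np.dot(a, b)
        best = min(best, time.time() - start)
    return best


if __name__ == "__main__":
    # Watch CPU utilization while this runs: if several cores are busy,
    # BLAS is already threading the product on its own.
    print("1000 x 1000 product: %.3f s" % time_dot(1000))
```

Many BLAS builds read an environment variable such as OMP_NUM_THREADS, OPENBLAS_NUM_THREADS or MKL_NUM_THREADS to set the thread count. Which one applies depends on the implementation reported by np.show_config(), so it is worth confirming against that output.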

So if BLAS is already able to occupy all your cores (easier with big dimensions), adding your own threads on top gains nothing, and you are only seeing the overhead.
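The overhead can be measured directly by timing the serial loop and the thread pool on the same workload. The sketch below reuses the computation from the question. The helper names time_serial and time_threaded are assumptions made for illustration, and the worker returns its result, which the original snippet did not do.

```python
import time
from multiprocessing.pool import ThreadPool

import numpy as np


def worker(i):
    # Same workload as in the question, but the sum is returned.
    a = np.random.normal(size=[i, i])
    b = np.random.normal(size=[i, i])
    return np.sum(a * b)


def time_serial(sizes):
    # Hypothetical helper: run all tasks in a plain loop.
    start = time.time()
    results = [worker(i) for i in sizes]
    return time.time() - start, results


def time_threaded(sizes, n_threads=2):
    # Hypothetical helper: run the same tasks through a ThreadPool.
    # Pool creation is excluded from the timing; close/join is included.
    pool = ThreadPool(n_threads)
    start = time.time()
    try:
        results = pool.map(worker, sizes)
    finally:
        pool.close()
        pool.join()
    return time.time() - start, results


if __name__ == "__main__":
    sizes = list(range(100, 300))
    t_serial, _ = time_serial(sizes)
    t_threaded, _ = time_threaded(sizes)
    print("serial:   %.3f s" % t_serial)
    print("threaded: %.3f s" % t_threaded)
```

The absolute numbers depend on the machine, the numpy build and the matrix sizes, so the comparison is only meaningful when both timings come from the same run.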

And yes, tensorflow's core will be implemented in at least one of C/C++/Fortran plus BLAS for its CPU backend, and some CUDA libraries when targeting the GPU. This also means that core algorithms such as gradient calculations and optimization steps should never need external parallelization (in 99.9% of all use cases).

Source: https://stackoverflow.com/questions/46082610/what-am-i-missing-in-python-multiprocessing-multithreading

Tags

python-2.7

numpy

tensorflow

python-multiprocessing

python-multithreading