Multiprocessing pool 'apply_async' only seems to call function once

Submitted by Deadly on 2019-12-03 08:30:48
Joshua Taylor

apply_async isn't meant to launch multiple processes; it's just meant to call the function with the arguments in one of the processes of the pool. You'll need to make 10 calls if you want the function to be called 10 times.
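For example, here is a minimal sketch of making ten calls (f is a placeholder for whatever function you actually want to run, and the pool size of 4 is arbitrary):

import multiprocessing

def f(x):
    return x * x  # stand-in for the real work

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    # ten separate apply_async calls produce ten separate tasks
    results = [pool.apply_async(f, (1,)) for _ in range(10)]
    print([r.get() for r in results])  # get() blocks until each result is ready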

First, note the docs on apply() (emphasis added):

apply(func[, args[, kwds]])

Call func with arguments args and keyword arguments kwds. It blocks until the result is ready. Given this blocks, apply_async() is better suited for performing work in parallel. Additionally, func is only executed in one of the workers of the pool.

Now, in the docs for apply_async():

apply_async(func[, args[, kwds[, callback[, error_callback]]]])

A variant of the apply() method which returns a result object.

The difference between the two is just that apply_async() returns immediately. You can use map() to call a function multiple times, though if you're calling it with the same inputs, it's a little redundant to create a list of the same argument just to have a sequence of the right length.
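For instance, reusing the pool and the placeholder f from the sketch above, the redundant-list version looks like:

results = pool.map(f, [1] * 10)  # ten copies of the same argument, so ten calls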

However, if you're calling different functions with the same input, then you're really just applying a higher-order function, and you could do it with map() or map_async(), like this:

pool.map(lambda f: f(1), functions)

except that lambda functions aren't picklable, so you'd need to use a defined function (see How to let Pool.map take a lambda function). Python 2 also had a builtin apply() (not the multiprocessing one), but it was deprecated there and removed in Python 3, and pool.map() passes each item as a single argument anyway, so it's easier to write the wrapper yourself.

It's easy enough to write your own:

def apply_(f, *args, **kwargs):
    return f(*args, **kwargs)

pool.starmap(apply_, [(f, 1) for f in functions])

Note that pool.starmap() (Python 3.3+) unpacks each tuple into positional arguments, so apply_ receives the function and its argument separately; plain pool.map() would hand it the whole tuple as a single argument.
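For example, with two hypothetical functions square and cube standing in for the list of functions:

def square(x):
    return x * x

def cube(x):
    return x ** 3

print(pool.starmap(apply_, [(f, 2) for f in (square, cube)]))  # prints [4, 8]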

Each time you write pool.apply_async(...) it will delegate that function call to one of the processes that was started in the pool. If you want to call the function in multiple processes, you need to issue multiple pool.apply_async calls.

Note that there are also pool.map() (and pool.map_async()) methods, which take a function and an iterable of inputs:

inputs = range(30)
results = pool.map(f, inputs)

These methods apply the function to each item in the inputs iterable. They submit the work to the pool in "batches" so that the load is balanced fairly evenly among all the processes in the pool.
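If you want to influence that batching yourself, map() also accepts an optional chunksize argument; a sketch (the value 4 here is arbitrary):

results = pool.map(f, inputs, chunksize=4)  # hand work to the workers in batches of 4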

Blckknght

If you want to run a single piece of code in ten processes, each of which then exits, a Pool of ten processes is probably not the right thing to use.

Instead, create ten Processes to run the code:

processes = []

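# start ten separate processes, each running f(1)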
for _ in range(10):
    p = multiprocessing.Process(target=f, args=(1,))
    p.start()
    processes.append(p)

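# wait for all of them to finish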
for p in processes:
    p.join()

The multiprocessing.Pool class is designed to handle situations where the number of processes and the number of jobs are unrelated. Often the number of processes is chosen to match the number of CPU cores you have, while the number of jobs is much larger.
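A minimal sketch of that pattern (f is again a stand-in for the real work, and the job count of 1000 is arbitrary):

import os
import multiprocessing

def f(x):
    return x * x  # stand-in for the real work

if __name__ == '__main__':
    # one worker per CPU core, many more jobs than workers
    pool = multiprocessing.Pool(processes=os.cpu_count())
    results = pool.map(f, range(1000))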

Steve Bond

If you aren't committed to Pool for any particular reason, I've written a function around multiprocessing.Process that will probably do the trick for you. It's posted here, but I'd be happy to upload the most recent version to GitHub if you want it.
