I am fairly new to Python. I am using the multiprocessing module for reading lines of text on stdin, converting them in some way and writing them into a database. Here's a
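The apply_async and map_async functions are designed not to block the main process. To achieve this, the Pool maintains an internal queue whose size, unfortunately, cannot be limited. If you submit tasks faster than the workers complete them, the queue (and your memory use) grows without bound.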
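The non-blocking behaviour is easy to observe. The sketch below uses `multiprocessing.pool.ThreadPool`, which subclasses `Pool` and shares its task queue and `apply_async` machinery, so it runs as a plain script without pickling or start-method concerns. `wait_for` and `count_pending_after_submit` are hypothetical helpers written for this illustration.

```python
import threading
from multiprocessing.pool import ThreadPool


def wait_for(event):
    """Hypothetical slow task: block until `event` is set."""
    event.wait()
    return True


def count_pending_after_submit(n_tasks, processes=1):
    """Submit n_tasks tasks that cannot finish yet and return how many are
    still pending once every apply_async() call has returned."""
    release = threading.Event()
    pool = ThreadPool(processes=processes)
    try:
        results = [pool.apply_async(wait_for, (release,))
                   for _ in range(n_tasks)]
        # Every call above returned immediately even though no task can
        # complete yet: the surplus tasks simply pile up inside the pool.
        pending = sum(not r.ready() for r in results)
    finally:
        release.set()  # let the tasks finish so the pool can shut down
        pool.close()
        pool.join()
    return pending


print(count_pending_after_submit(100))  # 100
```

Nothing in `apply_async` pushes back on the caller, so a loop over stdin will happily enqueue the whole input.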
The problem can be solved with a Semaphore initialized with the size you want the queue to be. You acquire the semaphore before feeding the pool and release it after a worker has completed the task.
Here's an example that works with Python 2.6 or later.
```python
from threading import Semaphore
from multiprocessing import Pool


def task_wrapper(f):
    """Python 2's apply_async has no error callback: if the task raises,
    `callback` is never called and the semaphore is never released.
    This wrapper makes sure the code run in the worker never raises.
    """
    try:
        return f()
    except Exception:
        return None


class TaskManager(object):
    def __init__(self, processes, queue_size):
        self.pool = Pool(processes=processes)
        # One slot per running task plus `queue_size` waiting ones.
        self.workers = Semaphore(processes + queue_size)

    def new_task(self, f):
        """Start a new task; blocks if the queue is full.

        `f` is sent to a worker process, so it must be picklable
        (e.g. a module-level function).
        """
        self.workers.acquire()
        self.pool.apply_async(task_wrapper, args=(f, ), callback=self.task_done)

    def task_done(self, result):
        """Called once a task is done (the pool passes its return value);
        releases the semaphore so a blocked new_task() can proceed."""
        self.workers.release()
```
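On Python 3.2 and later, `apply_async` also accepts an `error_callback`, which is called instead of `callback` when the task raises, so the wrapper is no longer needed. Below is a minimal sketch of that variant. `BoundedTaskManager`, its `pool_factory` parameter, the `results`/`errors` lists and `join()` are illustrative additions, not part of the answer above; `pool_factory` only exists so the class can also be exercised with `multiprocessing.pool.ThreadPool`, which has the same interface.

```python
import multiprocessing.pool
import threading


class BoundedTaskManager(object):
    """Hypothetical Python 3.2+ variant of TaskManager.

    At most ``processes + queue_size`` submitted tasks are unfinished at
    any time; new_task() blocks once that limit is reached.
    """

    def __init__(self, processes, queue_size,
                 pool_factory=multiprocessing.pool.Pool):
        self.pool = pool_factory(processes=processes)
        self.slots = threading.Semaphore(processes + queue_size)
        self.results = []
        self.errors = []

    def new_task(self, func, *args):
        """Submit func(*args); blocks while the limit is reached."""
        self.slots.acquire()
        try:
            self.pool.apply_async(func, args,
                                  callback=self._task_done,
                                  error_callback=self._task_failed)
        except Exception:
            # Submission itself failed (e.g. pool closed): give the slot back.
            self.slots.release()
            raise

    def _task_done(self, result):
        # Runs in the pool's result-handler thread, in the parent process.
        self.results.append(result)
        self.slots.release()

    def _task_failed(self, exc):
        # Called instead of _task_done when func raised, so the slot is
        # released for failing tasks too; no wrapper function is needed.
        self.errors.append(exc)
        self.slots.release()

    def join(self):
        """Stop accepting tasks and wait until all submitted ones are done."""
        self.pool.close()
        self.pool.join()
```

Typical use, with `convert` standing for a hypothetical module-level (hence picklable) function: create `manager = BoundedTaskManager(4, 10)`, call `manager.new_task(convert, line)` for each `line` in `sys.stdin`, then `manager.join()`. With the spawn or forkserver start method, keep that driver code under an `if __name__ == '__main__':` guard.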
Another example using the concurrent.futures pool implementation.