Subprocess completes but still doesn't terminate, causing deadlock


Question


Ok, since there are currently no answers I don't feel too bad doing this. While I'm still interested in what is actually happening behind the scenes to cause this problem, my most urgent questions are the ones specified in update 2, namely:

What are the differences between a JoinableQueue and a Manager().Queue() (and when should you use one over the other)? And, importantly, is it safe to substitute one for the other in this example?


In the following code, I have a simple process pool. Each process is passed the process queue (pq) to pull data to be processed from, and a return-value queue (rq) to pass the returned values of the processing back to the main thread. If I don't append to the return-value queue it works, but as soon as I do, for some reason the processes are blocked from stopping. In both cases the processes' run methods return, so it's not the put on the return queue that is blocking, but in the second case the processes themselves do not terminate, so the program deadlocks when I join on the processes. Why would this be?

Updates:

  1. It seems to have something to do with the number of items in the queue.

    On my machine at least, I can have up to 6570 items in the queue and it actually works, but any more than this and it deadlocks.

  2. It seems to work with Manager().Queue().

    Whether it's a limitation of JoinableQueue or just me misunderstanding the differences between the two objects, I've found that if I replace the return queue with a Manager().Queue(), it works as expected. What are the differences between them, and when should you use one over the other?

  3. The error does not occur if I'm consuming from rq

    Oops. There was an answer here for a moment, and as I was commenting on it, it disappeared. Anyway, one of the things it asked was whether the error still occurs if I add a consumer. I have tried this, and the answer is: no, it doesn't.

    The other thing it mentioned was this quote from the multiprocessing docs as a possible key to the problem. Referring to JoinableQueue, it says:

    ... the semaphore used to count the number of unfinished tasks may eventually overflow raising an exception.


import multiprocessing

class _ProcSTOP:
    pass

class Proc(multiprocessing.Process):

    def __init__(self, pq, rq):
        self._pq = pq
        self._rq = rq
        super().__init__()
        print('++', self.name)

    def run(self):
        dat = self._pq.get()

        while dat is not _ProcSTOP:
#            self._rq.put(dat)        # uncomment me for deadlock
            self._pq.task_done()
            dat = self._pq.get()

        self._pq.task_done() 
        print('==', self.name)

    def __del__(self):
        print('--', self.name)

if __name__ == '__main__':

    pq = multiprocessing.JoinableQueue()
    rq = multiprocessing.JoinableQueue()
    pool = []

    for i in range(4):
        p = Proc(pq, rq) 
        p.start()
        pool.append(p)

    for i in range(10000):
        pq.put(i)

    pq.join()

    for i in range(4):
       pq.put(_ProcSTOP)

    pq.join()

    while len(pool) > 0:
        print('??', pool)
        pool.pop().join()    # hangs here (if using rq)

    print('** complete')

Sample output, not using the return queue:

++ Proc-1
++ Proc-2
++ Proc-3
++ Proc-4
== Proc-4
== Proc-3
== Proc-1
?? [<Proc(Proc-1, started)>, <Proc(Proc-2, started)>, <Proc(Proc-3, started)>, <Proc(Proc-4, started)>]
== Proc-2
?? [<Proc(Proc-1, stopped)>, <Proc(Proc-2, started)>, <Proc(Proc-3, stopped)>]
-- Proc-3
?? [<Proc(Proc-1, stopped)>, <Proc(Proc-2, started)>]
-- Proc-2
?? [<Proc(Proc-1, stopped)>]
-- Proc-1
** complete
-- Proc-4

Sample output, using the return queue:

++ Proc-1
++ Proc-2
++ Proc-3
++ Proc-4
== Proc-2
== Proc-4
== Proc-1
?? [<Proc(Proc-1, started)>, <Proc(Proc-2, started)>, <Proc(Proc-3, started)>, <Proc(Proc-4, started)>]
== Proc-3
# here it hangs

Answer 1:


From the documentation:

Warning

As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread()), then that process will not terminate until all buffered items have been flushed to the pipe.

This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.

Note that a queue created using a manager does not have this issue. See Programming guidelines.

So a JoinableQueue (like a plain multiprocessing.Queue) writes its data to a pipe through a background feeder thread, and a child process that has put items will not terminate until all of that buffered data has been flushed to the pipe. If nothing reads the other end of the pipe, it fills up and the child hangs on exit, which is exactly the deadlock in the question.
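
One way to keep the JoinableQueue, then, is to drain rq in the parent before joining the workers; the alternative, calling rq.cancel_join_thread() in each child, simply discards whatever is still buffered. A minimal sketch of the draining approach, assuming the same pq, rq and pool names and the 10000-item workload from the question:

    # After pq.join() has returned, every worker has already put its results
    # on rq, but most of that data may still sit in each child's feeder thread.
    # Reading the expected number of items empties the pipe so the children
    # can flush their buffers, exit, and be joined.
    results = [rq.get() for _ in range(10000)]

    while len(pool) > 0:
        pool.pop().join()    # no longer hangs: rq has been fully consumed

    print('** complete,', len(results), 'results')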

On the other hand, a Manager().Queue() object uses a completely different approach. A manager runs a separate server process that receives all data immediately (and stores it in its own memory).

Managers provide a way to create data which can be shared between different processes. A manager object controls a server process which manages shared objects. Other processes can access the shared objects by using proxies.

...

Queue([maxsize]) Create a shared Queue.Queue object and return a proxy for it.
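
In other words, with a manager each put() is a call into the manager's server process rather than a write into a local feeder buffer, so the child holds no unflushed data when it exits. The smallest change to the question's code along these lines would be something like the sketch below, where only the creation of rq changes and the Proc class and the rest of the main block stay as they are:

import multiprocessing

if __name__ == '__main__':

    manager = multiprocessing.Manager()

    pq = multiprocessing.JoinableQueue()   # work queue: task_done()/join() as before
    rq = manager.Queue()                   # results are stored in the manager's server process

    # ... start the Proc workers, feed pq, pq.join(), send the _ProcSTOP
    # sentinels and join the workers exactly as in the original code; the
    # joins no longer hang because the children buffer nothing for rq.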



Source: https://stackoverflow.com/questions/8026050/subprocess-completes-but-still-doesnt-terminate-causing-deadlock
