How to terminate a process using Python's multiprocessing

你的背包 2020-12-14 02:56

I have some code that needs to run against several other systems that may hang or have problems that are not under my control. I would like to use Python's multiprocessing to spawn

3 Answers
  • 2020-12-14 03:09

    You might run the child processes as daemons in the background.

    process.daemon = True
    

    Errors and hangs (including infinite loops) in a daemon process do not affect the main process, and the daemon is terminated only when the main process exits.
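
    As a minimal sketch of the daemon approach (the helper `start_daemon` and the `hang_forever` stand-in are hypothetical names used only for illustration), note that the flag has to be set before `start()` is called:

```python
import multiprocessing as mp
import time


def hang_forever():
    # Hypothetical stand-in for a call that never returns.
    while True:
        time.sleep(1)


def start_daemon(target, args=()):
    """Start target(*args) in a daemon child process and return the Process."""
    child = mp.Process(target=target, args=args)
    child.daemon = True  # must be set before start()
    child.start()
    return child


if __name__ == "__main__":
    child = start_daemon(hang_forever)
    time.sleep(0.5)
    print("child alive:", child.is_alive())
    # When the main process exits here, the daemon child is terminated with it.
```

    Keep in mind that a daemonic process is not allowed to create child processes of its own.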

    This works for simple problems, until you end up with many daemon child processes that keep consuming memory without any explicit control from the parent process.

    The best approach is to set up a Queue through which all the child processes communicate with the parent process, so that we can join them and clean up properly. Here is some simple code that checks whether a child process is hanging (simulated with time.sleep(1000)) and sends a message to the queue for the main process to act on:

    import multiprocessing as mp
    import time
    import queue
    
    running_flag = mp.Value("i", 1)
    
    def worker(running_flag, q):
        count = 1
        while True:
            if running_flag.value:
                print "working {0} ...".format(count)
                count += 1
                q.put(count)
                time.sleep(1)
                if count > 3:
                    # Simulate hanging with sleep
                    print "hanging..."
                    time.sleep(1000)
    
    def watchdog(q):
        """
        Check the queue for updates and put a kill request on it
        when the child process has not sent anything for too long
        """
        while True:
            try:
                msg = q.get(timeout=10.0)
            except queue.Empty as e:
                print "[WATCHDOG]: Maybe WORKER is slacking"
                q.put("KILL WORKER")
    
    def main():
        """The main process"""
        q = mp.Queue()
    
        workr = mp.Process(target=worker, args=(running_flag, q))
        wdog = mp.Process(target=watchdog, args=(q,))
    
        # run the watchdog as daemon so it terminates with the main process
        wdog.daemon = True
    
        workr.start()
        print "[MAIN]: starting process P1"
        wdog.start()
    
        # Poll the queue
        while True:
            msg = q.get()
            if msg == "KILL WATCHDOG":
                print "[MAIN]: Terminating slacking WORKER"
                workr.terminate()
                time.sleep(0.1)
                if not workr.is_alive():
                    print "[MAIN]: WORKER is a goner"
                    workr.join(timeout=1.0)
                    print "[MAIN]: Joined WORKER successfully!"
                    q.close()
                    break # watchdog process daemon gets terminated
    
    if __name__ == '__main__':
        main()
    

    Without terminating the worker first, an attempt to join() it from the main process would block forever, because the worker never finishes.
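
    If the only goal is to stop a child that takes too long, a simpler pattern that needs no queue is to join() with a timeout and terminate() the child if it is still alive afterwards. A minimal sketch, assuming a hypothetical helper named `run_with_timeout` and using time.sleep as the stand-in for a hanging call:

```python
import multiprocessing as mp
import time


def run_with_timeout(target, args=(), timeout=5.0):
    """Run target(*args) in a child process and stop it if it exceeds timeout.

    Returns the exit code of the child process.
    """
    child = mp.Process(target=target, args=args)
    child.start()
    child.join(timeout)      # wait at most `timeout` seconds
    if child.is_alive():     # still running, so treat it as hung
        child.terminate()    # SIGTERM on POSIX, TerminateProcess() on Windows
        child.join()         # reap the terminated child
    return child.exitcode


if __name__ == "__main__":
    # Finishes on its own well within the timeout.
    print(run_with_timeout(time.sleep, (0.1,), timeout=5.0))
    # Simulated hang: terminated after roughly one second.
    print(run_with_timeout(time.sleep, (1000,), timeout=1.0))
```

    The multiprocessing documentation warns that terminating a process while it is using a pipe, queue or lock can leave that object corrupted or unusable for other processes, so this pattern fits best when the child does not share such objects with the parent.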

  • 2020-12-14 03:22

    (I do not have enough reputation points to comment, so here is a full answer.)

    @PieOhPah: thank you for this very nice example.
    Unfortunately there is just one little flaw that doesn't let the watchdog kill the worker:

    if msg == "KILL WATCHDOG":
    

    it should be:

    if msg == "KILL WORKER":
    

    So the code becomes (with print updated for Python 3):

    import multiprocessing as mp
    import queue
    import time

    running_flag = mp.Value("i", 1)


    def worker(running_flag, q):
        """Report progress on the queue, then simulate a hang."""
        count = 1
        while True:
            if running_flag.value:
                print("working {0} ...".format(count))
                count += 1
                q.put(count)
                time.sleep(1)
                if count > 3:
                    # Simulate hanging with sleep
                    print("hanging...")
                    time.sleep(1000)


    def watchdog(q):
        """
        Check the queue for updates and put a kill request on it
        when the child process has not sent anything for too long.
        """
        while True:
            try:
                # Any message within 10 seconds counts as a sign of life.
                q.get(timeout=10.0)
            except queue.Empty:
                print("[WATCHDOG]: Maybe WORKER is slacking")
                q.put("KILL WORKER")


    def main():
        """The main process"""
        q = mp.Queue()

        workr = mp.Process(target=worker, args=(running_flag, q))
        wdog = mp.Process(target=watchdog, args=(q,))

        # Run the watchdog as a daemon so it terminates with the main process
        wdog.daemon = True

        workr.start()
        print("[MAIN]: starting process P1")
        wdog.start()

        # Poll the queue. The watchdog reads from the same queue, so it may
        # consume its own "KILL WORKER" message and resend it after a timeout.
        while True:
            msg = q.get()
            # was: if msg == "KILL WATCHDOG":
            if msg == "KILL WORKER":
                print("[MAIN]: Terminating slacking WORKER")
                workr.terminate()
                time.sleep(0.1)
                if not workr.is_alive():
                    print("[MAIN]: WORKER is a goner")
                    workr.join(timeout=1.0)
                    print("[MAIN]: Joined WORKER successfully!")
                    q.close()
                    break  # watchdog process daemon gets terminated


    if __name__ == '__main__':
        main()
    
  • 2020-12-14 03:23

    The way Python multiprocessing handles processes is a bit confusing.

    From the multiprocessing guidelines:

    Joining zombie processes

    On Unix when a process finishes but has not been joined it becomes a zombie. There should never be very many because each time a new process starts (or active_children() is called) all completed processes which have not yet been joined will be joined. Also calling a finished process’s Process.is_alive will join the process. Even so it is probably good practice to explicitly join all the processes that you start.

    To keep a process from becoming a zombie, you need to call its join() method once you have killed it.
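
    A minimal sketch of that kill-then-join sequence, assuming a hypothetical helper named `stop_and_reap` and a process that has already been started. It escalates to kill() (available since Python 3.7) if the child ignores terminate():

```python
import multiprocessing as mp
import time


def stop_and_reap(proc, grace=2.0):
    """Stop a started process and join it so that it does not stay a zombie.

    Returns the exit code of the process.
    """
    if proc.is_alive():
        proc.terminate()       # polite request first
        proc.join(grace)       # give it `grace` seconds to exit
        if proc.is_alive():
            proc.kill()        # forceful stop if it is still running
    proc.join()                # reap the process
    return proc.exitcode


if __name__ == "__main__":
    p = mp.Process(target=time.sleep, args=(1000,))
    p.start()
    print("exit code:", stop_and_reap(p))
    print("still tracked:", p in mp.active_children())
```

    After the final join() the process no longer appears in mp.active_children(), and its exitcode attribute is set.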

    If you want a simpler way to deal with hanging calls in your system, you can take a look at the pebble library.
