Python join a process without blocking parent

孤独总比滥情好 2020-12-08 16:23

I'm writing a program that will watch a particular directory for new files containing download URLs. Once a new file is detected, it will create a new process to do the actual download while the parent keeps watching. How do I join the child processes without blocking the parent?

5 Answers
  •  余生分开走
    2020-12-08 16:35

    Instead of trying to shoehorn multiprocessing.Process() into working for you, perhaps you should use a different tool, like apply_async() with a multiprocessing.Pool():

    import multiprocessing
    import os
    import time

    def main(argv):
        # parse command line args
        ...
        # set up variables (including wDir and dDir)
        ...

        # set up multiprocessing Pool
        pool = multiprocessing.Pool()

        try:
            watch_dir(wDir, dDir, pool)

        # catch whatever kind of exception you expect to end your infinite loop
        # you can omit this try/except if you really think your script will
        # run "forever" and you're okay with zombies should it crash
        except KeyboardInterrupt:
            pool.close()
            pool.join()
    
    def watch_dir(wDir, dDir, pool):
        # Grab the current watch directory listing
        before = dict.fromkeys(os.listdir(wDir))

        # Loop FOREVER
        while True:
            # sleep for 10 secs
            time.sleep(10)

            # Grab the current dir listing
            after = dict.fromkeys(os.listdir(wDir))

            # Get the list of new files
            added = [f for f in after if f not in before]
            # Get the list of deleted files
            removed = [f for f in before if f not in after]

            if added:
                # We have new files, do your stuff
                print("Added:", ", ".join(added))

                # launch one job per new file - apply_async() is NON-BLOCKING
                for f in added:
                    pool.apply_async(child, (f, wDir, dDir))

            if removed:
                # tell the user the files were deleted
                print("Removed:", ", ".join(removed))

            # Set before to the current listing
            before = after
    
    def child(filename, wDir, dDir):
        # Open filename and extract the url
        ...
        # Download the file to the dDir directory
        ...
        # Delete filename from the watch directory
        ...
        # simply return to "exit cleanly"
        return
    

    The multiprocessing.Pool() is a pool of worker subprocesses that you can submit "jobs" to. The pool.apply_async() call asks one of those subprocesses to run your function with the arguments provided, asynchronously; nothing needs to be joined until your script is done with all of its work and closes the whole pool. The library manages the details for you.
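    Here is a minimal, self-contained sketch of that behavior, separate from the watcher code above (the slow_task() function and its one-second sleep are stand-ins for a real download):

    import multiprocessing
    import time

    def slow_task(n):
        # stand-in for a real download
        time.sleep(1)
        return n * n

    if __name__ == "__main__":
        pool = multiprocessing.Pool()

        # apply_async() returns immediately with an AsyncResult;
        # the parent keeps running while the worker sleeps
        result = pool.apply_async(slow_task, (7,))
        print("parent is not blocked")

        # get() is the only blocking call, and only if you want the value
        print("worker returned:", result.get())

        pool.close()
        pool.join()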

    I think this will serve you better than the current accepted answer for the following reasons:
    1. It removes the unnecessary complexity of launching extra threads and queues just to manage subprocesses.
    2. It uses library routines that are made specifically for this purpose, so you get the benefit of future library improvements.
    3. IMHO, it is much more maintainable.
    4. It is more flexible. If you one day decide that you want to actually see a return value from your subprocesses, you can store the return value from the apply_async() call (an AsyncResult object) and check it whenever you want. You could store a bunch of them in a list and process them as a batch when the list grows past a certain size (see the sketch below). You can move the creation of the pool into the watch_dir() function and do away with the try/except if you don't really care what happens if the "infinite" loop is interrupted. And if you put some kind of break condition in the (presently) infinite loop, you can simply add pool.close() and pool.join() after the loop and everything is cleaned up.
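    A minimal sketch of that batching idea, assuming a hypothetical download() worker and an arbitrary batch size of 2:

    import multiprocessing

    def download(url):
        # hypothetical worker: fetch url, return a status string
        return "done: " + url

    if __name__ == "__main__":
        pool = multiprocessing.Pool()
        pending = []

        for url in ["http://a.example", "http://b.example", "http://c.example"]:
            # each call returns an AsyncResult without blocking
            pending.append(pool.apply_async(download, (url,)))

            # process results as a batch once the list gets big enough
            if len(pending) >= 2:
                for res in pending:
                    print(res.get())   # blocks per result, but only at batch time
                pending = []

        # drain whatever is left, then shut the pool down cleanly
        for res in pending:
            print(res.get())
        pool.close()
        pool.join()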
