Multiprocessing in Python while limiting the number of running processes

Asked 2021-02-02 12:46 by 半阙折子戏

I'd like to run multiple instances of program.py simultaneously, while limiting the number of instances running at the same time (e.g. to the number of CPU cores available on my machine).

4 Answers
  •  轮回少年
    2021-02-02 13:25

    I know you mentioned that the Pool.map approach doesn't make much sense to you. map is just an easy way to give the pool a source of work and a callable to apply to each item. The func passed to map can be any entry point that does the actual work on the given arg.

    If that doesn't seem right for you, I have a pretty detailed answer over here about using a Producer-Consumer pattern: https://stackoverflow.com/a/11196615/496445

    Essentially, you create a Queue and start N worker processes. Then you either feed the queue from the main thread or create a producer process that feeds it. The workers keep pulling items off the queue, so there is never more concurrent work in flight than the number of processes you started.

    You also have the option of putting a size limit on the queue, so that it blocks the producer when there is already too much outstanding work; this is useful if you also need to constrain how fast the producer runs and how many resources it holds. (The second snippet below does exactly this by creating the queue with manager.Queue(num_workers).)

    The work function that gets called can do anything you want. It can be a wrapper around some system command, or it can import your Python lib and run its main routine. There are dedicated process-management systems that let you configure arbitrary executables to run under limited resources, but this is just a basic Python approach to doing it.
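
    As a rough sketch of the "wrapper around some system command" idea applied to your question (this is not from the linked answer, and it assumes program.py is a script in the current directory that takes a single command-line argument), a Pool caps the number of copies running at once at the machine's CPU count:

    from multiprocessing import Pool, cpu_count
    import subprocess
    import sys

    def run_instance(arg):
        # launch one copy of program.py and wait for it to exit
        return subprocess.call([sys.executable, "program.py", str(arg)])

    if __name__ == "__main__":
        # at most cpu_count() copies of program.py run at the same time
        with Pool(processes=cpu_count()) as pool:
            exit_codes = pool.map(run_instance, range(10))
        print(exit_codes)

    Each run_instance call blocks until its subprocess exits, so a freed worker slot is immediately reused for the next item.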

    Snippets from that other answer of mine:

    Basic Pool:

    from multiprocessing import Pool

    def do_work(val):
        # could instantiate some other library class,
        # call out to the file system,
        # or do something simple right here.
        return "FOO: %s" % val

    if __name__ == "__main__":
        # get_work_args() stands in for however you build your list of work items
        work = get_work_args()
        with Pool(4) as pool:
            results = pool.map(do_work, work)
    

    Using a process manager and producer:

    from multiprocessing import Process, Manager
    import time
    import itertools
    
    def do_work(in_queue, out_list):
        while True:
            item = in_queue.get()
    
            # exit signal 
            if item is None:
                return
    
            # fake work
            time.sleep(.5)
            result = item
    
            out_list.append(result)
    
    
    if __name__ == "__main__":
        num_workers = 4
    
        manager = Manager()
        results = manager.list()
        # bounded queue: put() blocks the producer once num_workers items are waiting
        work = manager.Queue(num_workers)
    
        # start the workers
        pool = []
        for i in range(num_workers):
            p = Process(target=do_work, args=(work, results))
            p.start()
            pool.append(p)
    
        # produce data
        # this could also be started in a producer process
        # instead of blocking
        # get_work_args() is a placeholder for your real work items;
        # the trailing None values are one exit signal per worker
        iters = itertools.chain(get_work_args(), (None,)*num_workers)
        for item in iters:
            work.put(item)
    
        for p in pool:
            p.join()
    
        print(list(results))
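
    Not covered in the snippets above, but for comparison: on current Python versions the standard library's concurrent.futures.ProcessPoolExecutor expresses the same "at most N worker processes" limit with less plumbing. A minimal sketch, reusing the do_work placeholder from the Basic Pool example:

    from concurrent.futures import ProcessPoolExecutor
    import os

    def do_work(val):
        return "FOO: %s" % val

    if __name__ == "__main__":
        # max_workers caps how many worker processes run at once;
        # os.cpu_count() matches that limit to the number of CPU cores
        with ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
            results = list(executor.map(do_work, range(10)))
        print(results)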
    
