Multiprocessing in Python while limiting the number of running processes

半阙折子戏 2021-02-02 12:46

I'd like to run multiple instances of program.py simultaneously, while limiting the number of instances running at the same time (e.g. to the number of CPU cores available on my machine).

4 Answers
  •  無奈伤痛
    2021-02-02 13:12

    While there are many answers about using multiprocessing.Pool, there are not many code snippets showing how to use multiprocessing.Process, which is indeed more useful when memory usage matters. Starting 1000 processes at once will overload the CPU and exhaust memory. If each process and its data pipelines are memory intensive, the OS or Python itself will limit the number of parallel processes. I developed the code below to limit the number of jobs submitted to the CPU at any one time, in batches. The batch size can be scaled in proportion to the number of CPU cores; on my Windows PC, the number of jobs per batch could be up to 4 times the number of available CPU cores and still run efficiently.

    import multiprocessing

    def func_to_be_multiprocessed(q, data):
        # Each job pushes its result onto the shared queue.
        q.put('s')

    if __name__ == '__main__':  # guard is required on Windows (spawn start method)
        number_of_jobs = 100    # total number of jobs (placeholder value)
        data = None             # per-job payload (placeholder value)

        q = multiprocessing.Queue()
        workers = []
        for p in range(number_of_jobs):
            workers.append(multiprocessing.Process(
                target=func_to_be_multiprocessed, args=(q, data)))

        num_cores = multiprocessing.cpu_count()
        scaling_factor_batch_jobs = 3
        num_jobs_per_batch = num_cores * scaling_factor_batch_jobs
        num_of_batches = number_of_jobs // num_jobs_per_batch

        # Start and join the workers one batch at a time, so that at most
        # num_jobs_per_batch processes are ever alive simultaneously.
        for i_batch in range(num_of_batches):
            floor_job = i_batch * num_jobs_per_batch
            ceil_job = floor_job + num_jobs_per_batch
            for p in workers[floor_job:ceil_job]:
                p.start()
            for p in workers[floor_job:ceil_job]:
                p.join()

        # Run whatever is left over after the last full batch.
        for p in workers[num_of_batches * num_jobs_per_batch:]:
            p.start()
        for p in workers[num_of_batches * num_jobs_per_batch:]:
            p.join()

        for p in multiprocessing.active_children():
            p.terminate()

        result = []
        for p in workers:
            result.append(q.get())
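
    For comparison with the multiprocessing.Pool approach the answer alludes to, here is a minimal Pool sketch (the job body and job count are placeholders); the processes argument is what caps the number of simultaneous workers:

    from multiprocessing import Pool, cpu_count

    def job(data):
        return 's'  # placeholder job body; results are returned, not queued

    if __name__ == '__main__':
        number_of_jobs = 100
        # Pool keeps at most `processes` workers alive and hands each one a
        # new job as soon as it finishes, so no manual batching is needed.
        with Pool(processes=cpu_count()) as pool:
            result = pool.map(job, [None] * number_of_jobs)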
    

    The only problem with this batch scheme is that if any job in a batch hangs and never completes, the remaining batches are never started, so the function being multiprocessed must have proper error-handling routines.
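
    One way to soften that failure mode is to join each batch with a timeout and terminate any straggler, so a single hung job cannot block the batches after it. A minimal sketch (the 60-second timeout is an arbitrary placeholder) using the standard Process.join(timeout=...), Process.is_alive(), and Process.terminate() calls:

    # Possible replacement for the per-batch join loop above.
    for p in workers[floor_job:ceil_job]:
        p.join(timeout=60)    # wait at most 60 s for this process
        if p.is_alive():      # still running after the timeout
            p.terminate()     # kill it so the next batch can start
            p.join()          # reap the terminated process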
