Asynchronous multiprocessing with a worker pool in Python: how to keep going after timeout?

悲哀的现实 2020-12-10 15:23

I would like to run a number of jobs using a pool of processes and apply a given timeout after which a job should be killed and replaced by another working on the next task.

3 Answers
  •  北海茫月
    2020-12-10 15:59

    Python currently provides no native way to limit the execution time of an individual task in a pool from outside the worker itself.
    The easy way is to implement the tasks as subprocesses and use wait_procs from the psutil module.
    If nonstandard libraries are undesirable, you have to implement your own pool on top of the subprocess module: run a work loop in the main process that calls poll() on each worker and takes the required action.
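
    A minimal sketch of such a subprocess-based pool, using only the standard library, might look like the following. The function name run_with_timeouts and its parameters (workers, timeout, poll_interval) are hypothetical names chosen for illustration, and each task is assumed to be an external command given as an argument list.

```python
import subprocess
import sys
import time


def run_with_timeouts(commands, workers=4, timeout=1.0, poll_interval=0.05):
    """Run each command as a subprocess, at most `workers` at a time.

    Returns a dict mapping each command's index to its exit code, or to
    None if the command ran longer than `timeout` seconds and was killed.
    """
    pending = list(enumerate(commands))  # tasks not started yet
    running = {}                         # index -> (Popen, start time)
    results = {}

    while pending or running:
        # Fill free worker slots with the next pending tasks.
        while pending and len(running) < workers:
            index, command = pending.pop(0)
            running[index] = (subprocess.Popen(command), time.monotonic())

        # Poll every running worker once.
        for index, (proc, started) in list(running.items()):
            code = proc.poll()
            if code is not None:
                results[index] = code      # finished on its own
            elif time.monotonic() - started > timeout:
                proc.kill()                # over its time budget: kill it
                proc.wait()                # reap the killed process
                results[index] = None
            else:
                continue                   # still running, keep waiting
            del running[index]             # slot is free for the next task

        time.sleep(poll_interval)

    return results


if __name__ == "__main__":
    quick = [sys.executable, "-c", "pass"]
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeouts([quick, slow, quick], workers=2, timeout=2.0))
```

    Because every task has its own start time, a task that overruns is killed and its slot goes to the next pending task, while the other workers keep running.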

    As for the updated problem, the pool becomes corrupted if you directly terminate one of its workers (this is a bug in the interpreter implementation, because such behavior should not be allowed): the worker is recreated, but the task is lost and the pool can no longer be joined. You have to terminate the whole pool and then recreate it for further tasks:

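    import multiprocessing
    from multiprocessing import Pool

    def Check(n):
        # Placeholder for the task function from the question.
        return n

    if __name__ == "__main__":
        while True:
            pool = Pool(processes=4)
            jobs = pool.map_async(Check, range(10))
            print("Waiting for result")
            try:
                result = jobs.get(timeout=1)
                break  # all clear
            except multiprocessing.TimeoutError:
                pass  # timed out: retry all tasks with a fresh pool
            finally:
                # kill all processes, whether the batch finished or timed out
                pool.terminate()
                pool.join()
        print(result)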
    

    UPDATE

    Pebble is an excellent and handy library that solves this issue. Pebble is designed for the asynchronous execution of Python functions, whereas PyExPool is designed for the asynchronous execution of modules and external executables, though both can be used interchangeably.

    If third-party dependencies are undesirable, PyExPool can be a good choice: it is a lightweight single-file implementation of a multi-process execution pool with per-job and global timeouts, the ability to group jobs into tasks, and other features.
    PyExPool can be embedded into your sources and customized. It has a permissive Apache 2.0 license and is of production quality, being used at the core of a high-load scientific benchmarking framework.
