Where is the memory leak? How to timeout threads during multiprocessing in python?

*爱你&永不变心* submitted on 2019-12-06 19:46:19

Question


It is unclear how to properly time out workers of joblib's Parallel in Python. Several other questions cover similar ground.

In my example I am using a pool of 50 joblib workers with the threading backend.

Parallel Call (threading):

from joblib import Parallel, delayed

output = Parallel(n_jobs=50, backend='threading')(
    delayed(get_output)(INPUT) for INPUT in list)

Here, Parallel hangs without raising an error as soon as len(list) <= n_jobs, but only when n_jobs >= -1.

To work around this issue, people suggest adding a timeout decorator to the function passed to Parallel (get_output(INPUT) in the example above), built with multiprocessing:

Main function (decorated):

@with_timeout(10)    # multiprocessing
def get_output(INPUT):     # threading
    output = do_stuff(INPUT)
    return output

Multiprocessing Decorator:

import functools
import multiprocessing.pool

def with_timeout(timeout):
    def decorator(decorated):
        @functools.wraps(decorated)
        def inner(*args, **kwargs):
            pool = multiprocessing.pool.ThreadPool(1)
            async_result = pool.apply_async(decorated, args, kwargs)
            try:
                return async_result.get(timeout)
            except multiprocessing.TimeoutError:
                return
        return inner
    return decorator

Adding the decorator to the otherwise working code results in a memory leak after roughly twice the timeout period, followed by a crash of Eclipse.

Where is this leak in the decorator?

How to timeout threads during multiprocessing in python?


Answer 1:


It is not possible to kill a Thread in Python without a hack.

The memory leak you are experiencing is due to the accumulation of threads that you believe have been killed. To verify this, inspect the number of threads your application is running (for example with threading.active_count()); you will see it slowly grow.

Under the hood, the thread of the ThreadPool is not terminated but keeps running your function until the end.
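
A minimal sketch of that check, reusing the decorator from the question. The names slow_task and count_threads_after_timeouts are hypothetical and exist only for this illustration; the short timeout and sleep values are arbitrary.

```python
import functools
import multiprocessing.pool
import threading
import time


def with_timeout(timeout):
    """The decorator from the question, with unchanged behaviour."""
    def decorator(decorated):
        @functools.wraps(decorated)
        def inner(*args, **kwargs):
            # A fresh pool per call; it is never closed or joined.
            pool = multiprocessing.pool.ThreadPool(1)
            async_result = pool.apply_async(decorated, args, kwargs)
            try:
                return async_result.get(timeout)
            except multiprocessing.TimeoutError:
                # Only the wait is abandoned; the worker keeps running.
                return None
        return inner
    return decorator


@with_timeout(0.05)
def slow_task(seconds):
    # Hypothetical stand-in for a get_output call that takes too long.
    time.sleep(seconds)
    return seconds


def count_threads_after_timeouts(calls=5, seconds=2.0):
    """Return (threads before, threads after, results) for timed-out calls."""
    before = threading.active_count()
    results = [slow_task(seconds) for _ in range(calls)]
    after = threading.active_count()
    return before, after, results
```

Every call returns None after the timeout, yet the thread count goes up by at least one per call, because each worker thread is still inside time.sleep. With a function that never returns, those threads would never go away.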

A thread cannot be killed because it shares memory with the rest of the process. It is therefore very hard to kill a thread while guaranteeing the memory integrity of your application.

Java developers figured this out long ago: Thread.stop() was deprecated there as inherently unsafe.

If you can run your function in a separate process, then you could easily rely on a timeout logic where the process itself is killed once the timeout is reached.
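
As a rough sketch of that idea using only the standard library: run_with_timeout and _run_and_send are hypothetical helper names, not part of joblib or Pebble. The sketch assumes a POSIX system where the "fork" start method is available; with "spawn" the function must be picklable and the call must sit under an `if __name__ == "__main__":` guard. Forking from a heavily threaded program has pitfalls of its own, so treat this as an illustration of the principle rather than a drop-in fix.

```python
import multiprocessing
import time


def _run_and_send(conn, func, args, kwargs):
    """Child-process entry point: run func and send the outcome back."""
    try:
        conn.send(("ok", func(*args, **kwargs)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_with_timeout(func, timeout, *args, **kwargs):
    """Run func in a child process; kill it and return None on timeout."""
    # Assumption: the "fork" start method exists (POSIX only).
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_run_and_send,
                       args=(child_conn, func, args, kwargs))
    proc.start()
    child_conn.close()  # the parent only reads from the pipe
    try:
        if parent_conn.poll(timeout):
            status, value = parent_conn.recv()
            if status == "error":
                raise RuntimeError(value)
            return value
        # Timeout reached: unlike a thread, a process can be killed.
        proc.terminate()
        return None
    finally:
        proc.join()
        parent_conn.close()
```

A call such as run_with_timeout(get_output, 10, INPUT) would then return None once ten seconds have passed, and the operating system reclaims the memory of the terminated child. The return value has to be picklable, since it travels back through a pipe.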

The Pebble library already offers decorators with timeout.



Source: https://stackoverflow.com/questions/48540668/where-is-the-memory-leak-how-to-timeout-threads-during-multiprocessing-in-pytho
