Python multiprocessing apply_async “assert left > 0” AssertionError

Submitted by 自作多情 on 2019-11-30 15:48:33

There is a bug in CPython's core C code that prevents data responses bigger than 2 GB from being returned correctly to the main thread. You need to either split the data into smaller chunks, as suggested in the previous answer, or not use multiprocessing for this function.

I reported this bug on the Python bug tracker (https://bugs.python.org/issue34563) and created a PR (https://github.com/python/cpython/pull/9027) to fix it, but it will probably take a while to get released (UPDATE: the fix is present in Python 3.7.3+).

If you are interested, you can find more details on what causes the bug in the bug report linked above.

I think I've found a workaround: retrieving the data in small chunks. In my case it was a list of lists.

I had:

for i in range(0, NUMBER_OF_THREADS):
    print('MAIN: Getting data from process ' + str(i) + ' proxy...')
    # _getvalue() copies the whole proxy list in one call, so the entire
    # result is serialized as a single (possibly > 2 GB) message.
    X_train.extend(ListasX[i]._getvalue())
    Y_train.extend(ListasY[i]._getvalue())
    ListasX[i] = None
    ListasY[i] = None
    gc.collect()

Changed to:

CHUNK_SIZE = 1024
for i in range(0, NUMBER_OF_THREADS):
    print('MAIN: Getting data from process ' + str(i) + ' proxy...')
    # Slice the proxy list so each access transfers at most CHUNK_SIZE items.
    for k in range(0, len(ListasX[i]), CHUNK_SIZE):
        X_train.extend(ListasX[i][k:k+CHUNK_SIZE])
        Y_train.extend(ListasY[i][k:k+CHUNK_SIZE])
    ListasX[i] = None
    ListasY[i] = None
    gc.collect()

And now it seems to work, probably because less data is serialized at a time. So if you can segment your data into smaller portions, you may be able to work around the issue. Good luck!
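The chunked-copy loop above can be factored into a small helper. This is a hypothetical generalization, not code from the original answer; a plain list stands in for the `multiprocessing.Manager` list proxy here:

```python
def extend_in_chunks(dest, src, chunk_size=1024):
    """Append items from src to dest, reading chunk_size slices at a time.

    When src is a manager list proxy, each slice is a separate small
    request to the manager process instead of one huge transfer.
    """
    for k in range(0, len(src), chunk_size):
        dest.extend(src[k:k + chunk_size])

# Usage (plain list standing in for a proxy):
out = []
extend_in_chunks(out, list(range(3000)), chunk_size=500)
print(len(out))  # 3000
```

Keeping the helper separate makes it easy to reuse for both `X_train` and `Y_train`, and to tune `chunk_size` in one place.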
