multiprocessing.RawArray operation

半世苍凉 submitted on 2019-12-11 08:48:20

Question


I read that a RawArray can be shared between processes without being copied, and wanted to understand how that is possible in Python.

I saw in sharedctypes.py that a RawArray is constructed from a BufferWrapper from heap.py, then zero-filled with ctypes.memset.
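As a minimal sketch of what that construction yields (the variable names are illustrative only): the object returned by RawArray is an ordinary ctypes array whose memory starts out zeroed and which can be indexed or exposed through the buffer protocol.

```python
import ctypes
from multiprocessing.sharedctypes import RawArray

# 'i' is the typecode for a C int; the second argument is the element count.
arr = RawArray('i', 5)

# The backing memory is zero-filled when the array is created.
initial = list(arr)

# The result is a plain ctypes array, so indexing and memoryview both work.
arr[2] = 42
is_ctypes_array = isinstance(arr, ctypes.Array)
nbytes = memoryview(arr).nbytes
```

Nothing here involves a second process yet; the sketch only shows the object that multiprocessing later has to hand over.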

BufferWrapper is backed by an Arena object, which is itself built on an mmap (or 100 mmaps on Windows; see line 40 in heap.py).

I read that the mmap system call is what actually allocates the memory on Linux/BSD, and that the Python module uses MapViewOfFile on Windows.
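A small single-process sketch of an anonymous mapping used as raw memory, assuming only the standard mmap and struct modules (the names count, buf and values are illustrative):

```python
from mmap import mmap
from struct import calcsize, pack_into, unpack_from

count = 4
item_size = calcsize('i')  # size of a C int on this platform

# A fileno of -1 requests an anonymous mapping that is not tied to a file.
buf = mmap(-1, count * item_size)

# Anonymous mappings start out zero-filled.
initial = buf[:]

# Write four ints at byte offset 0, then read them back.
pack_into(f'{count}i', buf, 0, 10, 20, 30, 40)
values = unpack_from(f'{count}i', buf, 0)

buf.close()
```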

mmap seems handy, then. It seems to work directly with multiprocessing.Pool:

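from mmap import mmap
from multiprocessing import Pool
from struct import pack_into

def pack_into_mmap(idx_nums_tup):
    # Each worker packs its half of the integers at its own byte offset.
    # shared_mmap and total are globals inherited from the parent, so this
    # relies on the fork start method.
    idx, ints_to_pack = idx_nums_tup
    pack_into(str(len(ints_to_pack)) + 'i', shared_mmap, idx*4*total//2, *ints_to_pack)


if __name__ == '__main__':

    total = 5 * 10**7
    shared_mmap = mmap(-1, total * 4)  # anonymous mapping, 4 bytes per int
    ints_to_pack = range(total)

    pool = Pool()
    pool.map(pack_into_mmap, enumerate((ints_to_pack[:total//2], ints_to_pack[total//2:])))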

My question is:

How does the multiprocessing module know not to copy the mmap-based RawArray object between processes, as it does with "regular" Python objects?


Answer 1:


[Python 3.Docs]: multiprocessing - Process-based parallelism serializes / deserializes the data exchanged between processes using Python's own serialization protocol: [Python 3.Docs]: pickle - Python object serialization (hence the terms pickle / unpickle).

According to [Python 3.Docs]: pickle - object.__getstate__():

Classes can further influence how their instances are pickled; if the class defines the method __getstate__(), it is called and the returned object is pickled as the contents for the instance, instead of the contents of the instance’s dictionary. If the __getstate__() method is absent, the instance’s __dict__ is pickled as usual.
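The effect of that hook can be sketched with a toy class. Handle is a hypothetical stand-in, not part of multiprocessing: it owns a large buffer but pickles only its name and size.

```python
import pickle


class Handle:
    """Hypothetical object that owns a large buffer (illustration only)."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.buffer = bytearray(size)

    def __getstate__(self):
        # Only the metadata is pickled; the buffer is left out.
        return (self.name, self.size)

    def __setstate__(self, state):
        self.name, self.size = state
        # Assumption for this sketch: a fresh buffer is allocated here.
        # A shared-memory object would reattach to existing memory instead.
        self.buffer = bytearray(self.size)


original = Handle('demo', 1_000_000)
payload = pickle.dumps(original)
clone = pickle.loads(payload)
```

The pickled payload stays tiny even though the buffer holds a million bytes, because the buffer never enters the pickle stream.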

As seen in the Windows variant of Arena.__getstate__ (class chain: sharedctypes.RawArray -> heap.BufferWrapper -> heap.Heap -> heap.Arena), only the metadata (name and size) is pickled for the Arena instance, not the buffer itself.

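Conversely, in __setstate__, the buffer is reconstructed from that metadata by opening the mmap with the same tag name and size, so the receiving process maps the same memory instead of getting a copy.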
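The observable result can be checked with a short sketch in which child processes write into a RawArray and the parent reads the values back. The helper names fill and run_demo, and the method parameter, are assumptions made for this example. Note that under the fork start method the children simply inherit the mapping, while under spawn the pickling path described above is used.

```python
import multiprocessing as mp
from multiprocessing.sharedctypes import RawArray


def fill(shared, start, stop):
    # Runs in a child process and writes straight into the shared buffer.
    for i in range(start, stop):
        shared[i] = i * i


def run_demo(n=8, method=None):
    # method=None selects the platform's default start method.
    ctx = mp.get_context(method)
    shared = RawArray('i', n)
    half = n // 2
    workers = [
        ctx.Process(target=fill, args=(shared, 0, half)),
        ctx.Process(target=fill, args=(shared, half, n)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    # The parent sees the children's writes: the memory was shared, not copied.
    return list(shared)


if __name__ == '__main__':
    print(run_demo())
```

If the array had been copied into each child, the parent would still see only zeros after the workers finished.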



Source: https://stackoverflow.com/questions/56495471/multiprocessing-rawarray-operation
