Shared-memory objects in multiprocessing

Backend · unresolved · 4 answers · 1,654 views

Asked by 再見小時候 on 2020-11-22 17:04

Suppose I have a large in-memory numpy array, and a function func that takes this giant array as input (together with some other parameters). func with different parameters can be run in parallel. How can the worker processes share the array without each one receiving its own copy?
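
A minimal sketch of the scenario using only the standard library, assuming Python 3.8+ for multiprocessing.shared_memory; the worker body and the parameter values below are placeholders, not taken from the question:

    import numpy as np
    from multiprocessing import Pool
    from multiprocessing.shared_memory import SharedMemory

    def func(args):
        # attach to the existing block by name; this does not copy the array
        shm_name, shape, dtype, param = args
        shm = SharedMemory(name=shm_name)
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        result = float(arr.sum()) * param   # placeholder computation
        del arr                             # drop the view before closing the mapping
        shm.close()
        return result

    if __name__ == '__main__':
        big = np.random.rand(10_000_000)
        shm = SharedMemory(create=True, size=big.nbytes)
        shared = np.ndarray(big.shape, dtype=big.dtype, buffer=shm.buf)
        shared[:] = big                     # copy into shared memory once
        with Pool() as pool:
            results = pool.map(func, [(shm.name, big.shape, big.dtype, p)
                                      for p in (0.5, 1.0, 2.0)])
        print(results)
        del shared
        shm.close()
        shm.unlink()                        # free the block once all readers are done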

4 Answers
  •  陌清茗 (OP)
     2020-11-22 17:33

    As Robert Nishihara mentioned, Apache Arrow makes this easy, specifically with the Plasma in-memory object store, which is what Ray is built on.
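
    A minimal sketch of using the Plasma store directly through pyarrow (not part of the original answer; it assumes a store is already running at /tmp/plasma and an older pyarrow release, since Plasma was deprecated and later removed from pyarrow):

    import numpy as np
    import pyarrow.plasma as plasma

    # connect to a plasma_store process started separately, e.g.:
    #   plasma_store -m 1000000000 -s /tmp/plasma
    client = plasma.connect('/tmp/plasma')

    # put() copies the object into the shared store and returns an ObjectID;
    # any local process connected to the same store can fetch it with get()
    object_id = client.put(np.arange(1_000_000))
    arr = client.get(object_id)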

    I made brain-plasma specifically for this reason: fast loading and reloading of big objects in a Flask app. It's a shared-memory object namespace for Apache Arrow-serializable objects, including pickled bytestrings generated by pickle.dumps(...).

    The key difference from Apache Ray and Plasma is that brain-plasma keeps track of object IDs for you. Any process, thread, or program running on the same machine can share the variables' values by referencing the name through any Brain object.

    # install the library and start a Plasma store (10 MB here)
    $ pip install brain-plasma
    $ plasma_store -m 10000000 -s /tmp/plasma

    from brain_plasma import Brain

    # attach to the store; the path must match the -s argument above
    brain = Brain(path='/tmp/plasma')

    # assignment puts the value in shared memory under the name 'a'
    brain['a'] = [1] * 10000

    # any other local process can now read it back by name
    brain['a']
    # >>> [1, 1, 1, 1, ...]

