NumPy Array Copy-On-Write

Posted by 我怕爱的太早我们不能终老 on 2021-02-07 08:02:16

Question


I have a class that returns large NumPy arrays, which are cached within the class. I would like the returned arrays to be copy-on-write: if the caller only reads from the array, no copy is ever made and no extra memory is used. If the caller writes to it, the array is modifiable, but the write does not change the internal cached array.

My current solution is to make all cached arrays read-only (a.flags.writeable = False). This means the caller may have to make their own copy of the array if they want to modify it. Of course, if the array did not come from the cache and was already writable, that copy would duplicate the data unnecessarily.
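
A minimal sketch of the read-only approach described above. The `ArrayCache` class and its `_compute` loader are hypothetical names invented for illustration, not part of the original code. The sketch hands out a read-only view, so the cached array itself stays writable internally and the caller copies only when a write is actually needed.

```python
import numpy as np


class ArrayCache:
    """Hypothetical cache that hands out read-only views of its arrays."""

    def __init__(self):
        self._cache = {}

    def _compute(self, key):
        # Placeholder for the expensive computation (an assumption).
        return np.arange(key, dtype=float)

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._compute(key)
        view = self._cache[key].view()   # shares memory, no data copied
        view.flags.writeable = False     # only the view is locked
        return view


cache = ArrayCache()
a = cache.get(5)

try:
    a[0] = 99.0                          # writing to the read-only view fails
except ValueError:
    a = a.copy()                         # caller makes an explicit, writable copy
    a[0] = 99.0
```

The cost of this design is the one mentioned above: the caller cannot tell whether a copy is really needed, so a writable array that never came from the cache may still be copied.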

So, ideally I would love something like a.view(flag=copy_on_write). There seems to be a flag for the reverse of this, UPDATEIFCOPY, which causes a copy to update the original once it is deallocated.

Thanks!


Answer 1:


Copy-on-write is a nice concept, but explicit copying seems to be "the NumPy philosophy". So personally I would keep the "readonly" solution if it isn't too clumsy.

But I admit having written my own copy-on-write wrapper class. I don't try to detect write access to the array. Instead the class has a method "get_array(readonly)" returning its (otherwise private) numpy array. The first time you call it with "readonly=False" it makes a copy. This is very explicit, easy to read and quickly understood.
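
A rough sketch of what such a wrapper could look like. The answer only names the `get_array(readonly)` method, so the class name `CowArray`, the `_owned` flag, and the choice to return a locked view for read-only access are all assumptions made for illustration.

```python
import numpy as np


class CowArray:
    """Hypothetical wrapper: copies the wrapped array on first writable access."""

    def __init__(self, array):
        self._array = array
        self._owned = False              # True once we hold a private copy

    def get_array(self, readonly=True):
        if not readonly and not self._owned:
            self._array = self._array.copy()   # the single copy, made lazily
            self._owned = True
        if readonly:
            view = self._array.view()
            view.flags.writeable = False       # guard against accidental writes
            return view
        return self._array


cached = np.array([1.0, 2.0, 3.0, 4.0])
w = CowArray(cached)

r = w.get_array()                        # read access: no copy made
m = w.get_array(readonly=False)          # first write access: copies
m[1] = 100.0                             # does not touch `cached`
```

Because the copy happens only inside an explicit `get_array(readonly=False)` call, a reader of the calling code can see exactly where memory may be duplicated.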

If your copy-on-write numpy array looks like a classical numpy array, the reader of your code (possibly you in 2 years) may have a hard time.




Answer 2:


To implement copy-on-write, we need to modify the base, data, and strides of the ndarray object. I don't think this can be done in pure Python, so I use some Cython code to modify these attributes.

Here is the code in IPython notebook:

%load_ext Cython

Use Cython to define copy_view():

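%%cython
cimport numpy as np

np.import_array()
np.import_ufunc()

def copy_view(np.ndarray a):
    """Give `a` its own data buffer if it is still a view of another array."""
    cdef np.ndarray b
    cdef object base
    cdef int i
    base = np.get_array_base(a)
    # Nothing to do if `a` owns its data, or if its base has the same class
    # (for example the private copy installed by an earlier call).
    if base is None or isinstance(base, a.__class__):
        return a
    else:
        print("copy")
        b = a.copy()
        # Make the copy the new base of `a`, then point `a` at its buffer.
        np.set_array_base(a, b)
        a.data = b.data
        for i in range(b.ndim):
            a.strides[i] = b.strides[i]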

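Define a subclass of ndarray:

import numpy as np

class cowarray(np.ndarray):
    def __setitem__(self, key, value):
        # Detach from the shared buffer before any item assignment.
        copy_view(self)
        np.ndarray.__setitem__(self, key, value)

    def __array_prepare__(self, array, context=None):
        # On NumPy versions that support this hook, it runs before a ufunc
        # writes its output; copy first if the output array is self.
        if self is array:
            copy_view(self)
        return array

    def __array__(self):
        copy_view(self)
        return self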

Some tests:

a = np.array([1.0, 2, 3, 4])
b = a.view(cowarray)
b[1] = 100  # copy
print(a, b)
b[2] = 200  # no copy
print(a, b)

c = a[::2].view(cowarray)
c[0] = 1000  # copy
print(a, c)

d = a.view(cowarray)
np.sin(d, d)  # copy
print(a, d)

The output:

copy
[ 1.  2.  3.  4.] [   1.  100.    3.    4.]
[ 1.  2.  3.  4.] [   1.  100.  200.    4.]
copy
[ 1.  2.  3.  4.] [ 1000.     3.]
copy
[ 1.  2.  3.  4.] [ 0.84147098  0.90929743  0.14112001 -0.7568025 ]


Source: https://stackoverflow.com/questions/21896030/numpy-array-copy-on-write
