Transfer ownership of numpy data

Question

In my previous question, I learned to resize a subclassed ndarray in place. Neat. Unfortunately, that no longer works when the array that I am trying to resize is the result of a computation:

import numpy as np

class Foo(np.ndarray):
    def __new__(cls,shape,dtype=np.float32,buffer=None,offset=0,
                strides=None,order=None):
        return np.ndarray.__new__(cls,shape,dtype,buffer,offset,strides,order)

    def __array_prepare__(self,output,context):
        print output.flags['OWNDATA'],"PREPARE",type(output)
        return np.ndarray.__array_prepare__(self,output,context)

    def __array_wrap__(self,output,context=None):
        print output.flags['OWNDATA'],"WRAP",type(output)

        return np.ndarray.__array_wrap__(self,output,context)

a = Foo((32,))
#resizing a is no problem
a.resize((24,),refcheck=False)

b = Foo((32,))
c = Foo((32,))

d = b+c
#Cannot resize `d`
d.resize((24,),refcheck=False)

The exact output (including traceback) is:

True PREPARE <type 'numpy.ndarray'>
False WRAP <class '__main__.Foo'>
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    d.resize((24,),refcheck=False)
ValueError: cannot resize this array: it does not own its data

I think this happens because numpy creates a new ndarray and passes it to __array_prepare__. Somewhere along the way, the "output" array gets view-cast to my Foo type, although the docs are not entirely clear or accurate on this point. In any event, after the view cast the output no longer owns its data, which makes it impossible to resize in place (as far as I can tell).

Is there any way, via some sort of numpy voodoo (__array_prepare__, __array__, etc.), to transfer ownership of the data to the instance of my subclass?

Answer 1:

It is hardly a satisfactory answer, but it doesn't fit into a comment either... You can work around the owning of the data by using the ufunc's out parameter. A silly example:

>>> a = Foo((5,))
>>> b = Foo((5,))
>>> c = a + b # BAD
True PREPARE <type 'numpy.ndarray'>
False WRAP <class '__main__.Foo'>
>>> c.flags.owndata
False

>>> c = Foo((5,))
>>> c[:] = a + b # BETTER
True PREPARE <type 'numpy.ndarray'>
False WRAP <class '__main__.Foo'>
>>> c.flags.owndata
True

>>> np.add(a, b, out=c) # BEST
True PREPARE <class '__main__.Foo'>
True WRAP <class '__main__.Foo'>
Foo([  1.37754085e-38,   1.68450356e-20,   6.91042737e-37,
         1.74735556e-04,   1.48018885e+29], dtype=float32)
>>> c.flags.owndata
True

I think the output above is consistent with c[:] = a + b letting c own its data only at the cost of copying the result into c from a temporary array. That copy should not happen when you use the out parameter.

Since you were already worried about intermediate storage in your mathematical expressions, it may not be such a bad thing to micro-manage how it is handled. That is, replacing

g = a + b + np.sqrt(d*d + e*e + f*f)

with

g = foo_like(d) # you'll need to write this function!
np.multiply(d, d, out=g)
g += e * e
g += f * f
np.sqrt(g, out=g)
g += b
g += a

may save you some intermediate memory, and it lets you own your data. It does throw the "readability counts" mantra out the window, but...
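foo_like is left unwritten above. One possible version is sketched below, assuming Python 3 and that Foo can be built from a shape and dtype as in the question. The Foo here is a stripped-down stand-in without the printing hooks, and foo_like is a hypothetical helper, not a numpy function.

```python
import numpy as np

class Foo(np.ndarray):
    # Stand-in for the question's subclass, without the printing hooks.
    def __new__(cls, shape, dtype=np.float32):
        return np.ndarray.__new__(cls, shape, dtype)

def foo_like(arr):
    # Hypothetical helper: allocate a fresh, uninitialized Foo matching arr.
    # A freshly allocated array owns its data.
    return Foo(arr.shape, dtype=arr.dtype)

d = Foo((8,))
d[:] = 3.0
e = Foo((8,))
e[:] = 4.0

g = foo_like(d)
np.multiply(d, d, out=g)   # g = d*d, written straight into g's buffer
g += e * e                 # e*e is still a temporary, but g keeps its buffer
np.sqrt(g, out=g)          # g = sqrt(d*d + e*e)

print(type(g).__name__, g.flags.owndata)
g.resize((4,), refcheck=False)   # allowed because g owns its data
print(g.shape)
```

Because every step writes into g rather than rebinding it to a new result, g is still the array allocated by foo_like when resize is called.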

Answer 2:

At some point along the way though, it seems that the "output" array gets view-casted to my Foo type

Yes, ndarray.__array_prepare__ calls output.view, which returns an array which does not own its data.
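The effect of the view cast can be reproduced without any ufunc machinery. The sketch below assumes Python 3 and uses a bare stand-in for Foo (no __new__ or hooks, a simplification of the class in the question). It only shows that a view-cast array reports OWNDATA as False and refuses to resize.

```python
import numpy as np

class Foo(np.ndarray):
    # Bare stand-in for the question's subclass.
    pass

raw = np.zeros(32, dtype=np.float32)   # freshly allocated, owns its buffer
viewed = raw.view(Foo)                 # same buffer, seen as a Foo

print(raw.flags.owndata)      # True
print(viewed.flags.owndata)   # False
print(viewed.base is raw)     # True: the buffer still belongs to raw

try:
    viewed.resize((24,), refcheck=False)
    resize_failed = False
except ValueError as exc:
    resize_failed = True
    print("resize failed:", exc)
```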

I experimented a bit and couldn't find an easy way around that.

While I agree this behavior is not ideal, I would claim that, at least in your use case, it is acceptable for d not to own its data. Numpy uses views extensively, and if you insist on never creating a view when working with numpy arrays, you are making your life very hard.

I would also claim that, based on my experience, resize should generally be avoided. You should not have any problem working with the view as long as you avoid resizing. resize has a hacky feel to it and is hard to work with, as you may have begun to notice, having hit one of the two classic errors it raises: "it does not own its data". The other is "cannot resize an array that has been referenced". (Another problem is described in this question.)

Since your decision to use resize comes from an answer to your other question, I'll post the rest of my answer there.

Answer 3:

How about:

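def resize(arr, shape):
    # np.require returns an array that owns its data (a copy if arr is a view);
    # it does not modify arr in place, so keep and return the result
    arr = np.require(arr, requirements=['OWNDATA'])
    arr.resize(shape, refcheck=False)
    return arr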

It seems to succeed at resizing (and reducing memory consumption):

import array
import numpy as np
import time

class Foo(np.ndarray):
    def __new__(cls, shape, dtype=np.float32, buffer=None, offset=0,
                strides=None, order=None):
        return np.ndarray.__new__(cls, shape, dtype, buffer, offset, strides, order)

    def __array_prepare__(self, output, context):
        print(output.flags['OWNDATA'], "PREPARE", type(output))
        return np.ndarray.__array_prepare__(self, output, context)

    def __array_wrap__(self, output, context=None):
        print(output.flags['OWNDATA'], "WRAP", type(output))
        output = np.ndarray.__array_wrap__(self, output, context)
        return output

def free_memory():
    """
    Return free memory available, including buffer and cached memory
    """
    total = 0
    with open('/proc/meminfo', 'r') as f:
        for line in f:
            line = line.strip()
            if any(line.startswith(field) for field in ('MemFree', 'Buffers', 'Cached')):
                field, amount, unit = line.split()
                amount = int(amount)
                if unit != 'kB':
                    raise ValueError(
                        'Unknown unit {u!r} in /proc/meminfo'.format(u=unit))
                total += amount
    return total


def gen_change_in_memory():
    """
    http://stackoverflow.com/a/14446011/190597 (unutbu)
    """
    f = free_memory()
    diff = 0
    while True:
        yield diff
        f2 = free_memory()
        diff = f - f2
        f = f2
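_memory_deltas = gen_change_in_memory()

def change_in_memory():
    # next() works on Python 2.6+ and Python 3; generator.next is Python 2 only
    return next(_memory_deltas)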

def resize(arr, shape):
    print(change_in_memory())
    # 0
    # np.require returns an owning array (a copy if needed); keep the result
    arr = np.require(arr, requirements=['OWNDATA'])

    time.sleep(1)
    print(change_in_memory())
    # 200

    arr.resize(shape, refcheck=False)
    return arr

N = 10000000
b = Foo((N,), buffer = array.array('f',range(N)))
c = Foo((N,), buffer = array.array('f',range(N)))

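Continuing from that setup, the memory changes reported by change_in_memory() (in kB, as read from /proc/meminfo) are shown in the comments: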

print(change_in_memory())
# 0

d = b+c
d = np.require(d, requirements=['OWNDATA'])

print(change_in_memory())
# 39136

resize(d, (24,))   # Increases memory by 200 KiB
time.sleep(1)
print(change_in_memory())
# -39116
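np.require returns its result rather than modifying the array it is given, which is why the run above rebinds d before calling resize. A minimal sketch of that behavior, assuming Python 3: the bare Foo is a stand-in for the question's class, and resized_owning is a hypothetical name used only here.

```python
import numpy as np

class Foo(np.ndarray):
    # Bare stand-in for the question's subclass.
    pass

def resized_owning(arr, shape):
    # Hypothetical helper: get an array that owns its data, then resize it.
    # np.require hands back arr itself if it already owns its data,
    # otherwise a copy that does.
    owned = np.require(arr, requirements=['OWNDATA'])
    owned.resize(shape, refcheck=False)
    return owned

# A Foo that does not own its data, like the result of b + c in the question.
view = np.arange(32, dtype=np.float32).view(Foo)

small = resized_owning(view, (24,))
print(type(small).__name__, small.flags.owndata, small.shape)
print(view.flags.owndata, view.shape)   # the original view is left untouched
```

The caller has to use the returned array: when a copy is made, the array that was passed in keeps its old shape and still does not own its data.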

Source: https://stackoverflow.com/questions/15424211/transfer-ownership-of-numpy-data

Tags

python

numpy