Question
I have a numpy structured array of the following form:
import numpy as np
x = np.array([(1,2,3)]*2, [('t', np.int16), ('x', np.int8), ('y', np.int8)])
I now want to generate views into this array that team up 't' with either 'x' or 'y'. The usual syntax creates a copy:
v_copy = x[['t', 'y']]
v_copy
#array([(1, 3), (1, 3)],
# dtype=[('t', '<i2'), ('y', '|i1')])
v_copy.base is None
#True
This is not unexpected, since picking two fields is "fancy indexing", at which point numpy gives up and makes a copy. Since my actual records are large, I want to avoid the copy at all costs.
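Whether multi-field indexing copies depends on the NumPy version: the behaviour shown above is from the 1.6 era, and to the best of my knowledge releases from 1.16 onward return a view here. A minimal sketch of a version-independent check, assuming `np.shares_memory` is available (it is not in 1.6):

```python
import numpy as np

x = np.array([(1, 2, 3)] * 2,
             dtype=[('t', np.int16), ('x', np.int8), ('y', np.int8)])

sub = x[['t', 'y']]

# True means `sub` aliases x's buffer (a view); False means it was copied.
is_view = np.shares_memory(sub, x)
print(np.__version__, is_view)

# A packed copy has 3-byte records; a view keeps the parent's 4-byte records.
print(sub.dtype.itemsize)
```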
However, the required elements can in fact be addressed within numpy's strided memory model. Looking at the individual bytes in memory:
x.view(np.int8)
#array([1, 0, 2, 3, 1, 0, 2, 3], dtype=int8)
one can figure out the necessary strides:
v = np.recarray((2,2), [('b', np.int8)], buf=x, strides=(4,3))
v
#rec.array([[(1,), (3,)],
# [(1,), (3,)]],
# dtype=[('b', '|i1')])
v.base is x
#True
Clearly, v points to the correct locations in memory without having created a copy. Unfortunately, numpy won't allow me to reinterpret these memory locations as the original data types:
v_view = v.view([('t', np.int16), ('y', np.int8)])
#ValueError: new type not compatible with array.
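The strides used above follow from the record layout, which does not have to be worked out by hand from the raw bytes. As a small sketch, the byte offset of every field can be read from `x.dtype.fields`, and these offsets are what any zero-copy dtype for a subset of fields would have to reuse:

```python
import numpy as np

x = np.array([(1, 2, 3)] * 2,
             dtype=[('t', np.int16), ('x', np.int8), ('y', np.int8)])

# dtype.fields maps each field name to (field dtype, byte offset in the record).
for name in x.dtype.names:
    field_dtype, offset = x.dtype.fields[name][:2]
    print(name, field_dtype, offset)
# t int16 0
# x int8 2
# y int8 3

print(x.dtype.itemsize)  # 4 bytes per record
print(x.strides)         # (4,) bytes between consecutive records
```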
Is there a way to trick numpy into doing this cast, so that an array v_view equivalent to v_copy is created, but without having made a copy? Perhaps working directly on v.__array_interface__, as is done in np.lib.stride_tricks.as_strided()?
Answer 1:
You can construct a suitable dtype like so
dt2 = np.dtype(dict(names=('t', 'x'), formats=(np.int16, np.int8), offsets=(0, 2)))
and then do
y = np.recarray(x.shape, buf=x, strides=x.strides, dtype=dt2)
In later NumPy versions (> 1.6), you can also do
dt2 = np.dtype(dict(names=('t', 'x'), formats=(np.int16, np.int8), offsets=(0, 2), itemsize=4))
y = x.view(dt2)
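The same idea applied to the 't'/'y' pair from the question, as a sketch rather than a definitive recipe. It assumes a NumPy version that accepts the `itemsize` key, and the name `dt_ty` is only illustrative. Writing through the view and reading back from x confirms that no copy was made:

```python
import numpy as np

x = np.array([(1, 2, 3)] * 2,
             dtype=[('t', np.int16), ('x', np.int8), ('y', np.int8)])

# 'y' sits at byte offset 3; itemsize matches the parent record (4 bytes),
# so the byte holding 'x' is simply skipped.
dt_ty = np.dtype({'names': ['t', 'y'],
                  'formats': [np.int16, np.int8],
                  'offsets': [0, 3],
                  'itemsize': x.dtype.itemsize})

v = x.view(dt_ty)

v['y'][0] = 7                   # write through the view ...
print(x['y'])                   # ... and the parent array sees it: [7 3]
print(np.shares_memory(v, x))   # True
```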
Answer 2:
This works with numpy 1.6.x and avoids creating a recarray:
dt2 = {'t': (np.int16, 0), 'y': (np.int8, 3)}
v_view = np.ndarray(x.shape, dtype=dt2, buffer=x, strides=x.strides)
v_view
#array([(1, 3), (1, 3)],
# dtype=[('t', '<i2'), ('', '|V1'), ('y', '|i1')])
v_view.base is x
#True
One can wrap this in a class subclassing np.ndarray:
class arrayview(np.ndarray):
    def __new__(subtype, x, fields):
        # Reuse each selected field's (dtype, offset) from the parent dtype.
        dtype = {f: x.dtype.fields[f] for f in fields}
        return np.ndarray.__new__(subtype, x.shape, dtype,
                                  buffer=x, strides=x.strides)
v_view = arrayview(x, ('t', 'y'))
v_view
#arrayview([(1, 3), (1, 3)],
# dtype=[('t', '<i2'), ('', '|V1'), ('y', '|i1')])
v_view.base is x
#True
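If a subclass is more than is needed, the same construction can be sketched as a plain function. The helper name `fields_view` is hypothetical, not part of NumPy, and the sketch assumes the parent array is contiguous so that it can serve as a buffer:

```python
import numpy as np

def fields_view(arr, fields):
    """Hypothetical helper: zero-copy view of `arr` restricted to `fields`."""
    # Reuse each field's dtype and byte offset from the parent record layout.
    dtype = np.dtype({
        'names': list(fields),
        'formats': [arr.dtype.fields[f][0] for f in fields],
        'offsets': [arr.dtype.fields[f][1] for f in fields],
        'itemsize': arr.dtype.itemsize,  # keep the parent's record size
    })
    return np.ndarray(arr.shape, dtype=dtype, buffer=arr, strides=arr.strides)

x = np.array([(1, 2, 3)] * 2,
             dtype=[('t', np.int16), ('x', np.int8), ('y', np.int8)])

v = fields_view(x, ('t', 'y'))
v['t'] += 10      # modify through the view
print(x['t'])     # the parent array reflects the change: [11 11]
```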
Source: https://stackoverflow.com/questions/11774168/python-numpy-recarray-can-one-obtain-a-view-into-different-fields-using-pointer