Question
To add a field to a structured numpy array, it is quite simple to create a new array with a new dtype, copy over the old fields, and add the new field. However, I need to do this for an array that takes a lot of memory, and I would rather not duplicate all of it. Both my own implementation and the (slow) implementation in numpy.lib.recfunctions.append_fields duplicate memory.
Is there a way to add a field to a structured ndarray without duplicating memory? That means either a way that avoids creating a new ndarray, or a way to create a new ndarray that points to the same data as the old one.
Solutions that do duplicate RAM:
Adding a field to a structured numpy array
Adding a field to a structured numpy array (2)
Adding a field to a structured numpy array (3)
There is a similar question where the challenge is to remove, not add, fields. The solution uses a view, which should work for a subset of the original data, but I'm not sure whether it can be adapted to adding fields instead.
Answer 1:
A structured array is stored, like a regular one, as a contiguous buffer of bytes, one record following the previous. The records are, thus, a bit like the last dimension of a multidimensional array. You can't add a column to a 2d array without making a new array via concatenation.
Adding a field, say of dtype i4, to a dtype that is, say, 20 bytes long means changing the record (element) length to 24, i.e. inserting 4 bytes into the buffer after every 20 bytes. numpy can't do that without making a new data buffer and copying the values over from the old one (and filling in the new field).
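The record-length argument can be checked with a small sketch. The field names and the array length below are made up for illustration; only the sizes (a 20-byte record growing to 24 bytes) follow the example above.

```python
import numpy as np

# Hypothetical 20-byte record: five 4-byte fields (names are illustrative).
old_fields = [("a", "i4"), ("b", "i4"), ("c", "f4"), ("d", "f4"), ("e", "i4")]
old_dtype = np.dtype(old_fields)
old = np.zeros(1000, dtype=old_dtype)
old["a"] = np.arange(1000)

# Adding one more i4 field makes every record 4 bytes longer.
new_dtype = np.dtype(old_fields + [("f", "i4")])

# The wider records need their own buffer; the old values are copied in.
new = np.empty(old.shape, dtype=new_dtype)
for name in old_dtype.names:
    new[name] = old[name]
new["f"] = 0

print(old_dtype.itemsize, new_dtype.itemsize)  # record length before and after
print(np.shares_memory(old, new))              # whether the two buffers overlap
```

While both arrays are alive, memory use is roughly the sum of the two buffers, which is the duplication the question is trying to avoid.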
Actually, even if we were talking about adding a new record to the array, i.e. concatenating another array onto it, that would still require creating a new data buffer. Arrays have a fixed size.
Fields in a structured array are not like objects in a list or a dictionary. You can't add a field by just adding a pointer to an object elsewhere in memory.
Maybe you should be using a dictionary, with each item being an array. Then you can freely add a key/item without copying the existing ones. But then access by 'rows' will be slow.
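A minimal sketch of the dictionary approach, assuming one 1-D array per field; the column names and sizes are invented for the example. Adding a key allocates only the new column, while reading a "row" has to gather one element from every column.

```python
import numpy as np

n = 1000

# One independent array per "field" (names are illustrative).
columns = {
    "a": np.arange(n, dtype="i4"),
    "b": np.linspace(0.0, 1.0, n),
}
a_before = columns["a"]

# Adding a field allocates only the new column; existing arrays are untouched.
columns["c"] = np.zeros(n, dtype="i4")
print(columns["a"] is a_before)

# Row access gathers one element from each column, hence the slowdown.
row = {name: col[10] for name, col in columns.items()}
print(row)
```

The trade-off is the one described above: column operations stay cheap, but the data is no longer a single contiguous buffer of records, so anything that expects a structured array would still need a copy.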
Source: https://stackoverflow.com/questions/39965994/memory-friendly-way-to-add-a-field-to-a-structured-ndarray-without-duplicating