Memory-friendly way to add a field to a structured ndarray — without duplicating data?

Question

To add a field to a structured numpy array, it is quite simple to create a new array with a new dtype, copy over the old fields, and add the new field. However, I need to do this for an array that takes a lot of memory, and I would rather not duplicate all of it. Both my own implementation and the (slow) implementation in numpy.lib.recfunctions.append_fields duplicate memory.
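For reference, the copy-based approach described above might look like the following. This is only a minimal sketch: the helper name add_field_by_copy and the field names are made up for illustration, and it assumes a packed (unaligned) dtype so that dtype.descr contains no padding entries.

```python
import numpy as np

def add_field_by_copy(arr, name, dtype):
    """Return a new structured array with one extra field (hypothetical helper).

    Assumes arr.dtype is packed, so arr.dtype.descr has no padding entries.
    All existing data is copied into a freshly allocated buffer.
    """
    new_dtype = np.dtype(arr.dtype.descr + [(name, dtype)])
    out = np.zeros(arr.shape, dtype=new_dtype)
    for field in arr.dtype.names:
        out[field] = arr[field]  # copy each old field into the new buffer
    return out

old = np.zeros(3, dtype=[("x", "f8"), ("y", "i4")])
old["x"] = [1.0, 2.0, 3.0]

new = add_field_by_copy(old, "z", "i4")

# The new array owns a separate buffer, so both copies exist at once.
print(new.dtype.names)
print(np.shares_memory(old["x"], new["x"]))
```

While both old and new are alive, memory use is roughly the sum of the two arrays, which is exactly the duplication the question wants to avoid.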

Is there a way to add a field to a structured ndarray without duplicating memory? That is, either a way that avoids creating a new ndarray, or a way to create a new ndarray that points to the same data as the old one?

Solutions that do duplicate RAM:

Adding a field to a structured numpy array
Adding a field to a structured numpy array (2)
Adding a field to a structured numpy array (3)

There is a similar question where the challenge is to remove, not add, fields. The solution uses a view, which should work for a subset of the original data, but I'm not sure whether it can be adapted when I want to add fields instead.

Answer 1:

A structured array is stored, like a regular one, as a contiguous buffer of bytes, one record following the previous. The records are, thus, a bit like the last dimension of a multidimensional array. You can't add a column to a 2d array without making a new array via concatenation.

Adding a field, say of dtype i4, to a dtype that is, say, 20 bytes long means changing the record (element) length to 24, i.e. inserting 4 bytes into the buffer after every 20 bytes. numpy can't do that without making a new data buffer and copying the values from the old array (and the values for the new field) into it.
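The arithmetic can be checked directly with itemsize. In this sketch the field names and the particular 20-byte layout (two f8 fields plus one i4) are assumptions chosen only to match the numbers in the answer.

```python
import numpy as np

# A packed 20-byte record: 8 + 8 + 4 bytes (illustrative layout).
old_dtype = np.dtype([("a", "f8"), ("b", "f8"), ("c", "i4")])
# The same record with an extra 4-byte field appended.
new_dtype = np.dtype(old_dtype.descr + [("d", "i4")])

n = 5
old = np.zeros(n, dtype=old_dtype)

print(old_dtype.itemsize, new_dtype.itemsize)  # record length before and after
print(old.strides)                             # distance in bytes between records
print(old.nbytes, n * new_dtype.itemsize)      # total buffer size before and after

# Byte offset of each field inside one record of the new dtype.
offsets = {name: new_dtype.fields[name][1] for name in new_dtype.names}
print(offsets)
```

Because every record grows, the start of every record after the first moves, so the existing buffer cannot simply be extended at its end.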

Actually, even if we were talking about adding a new record to the array, i.e. concatenating another array onto it, it would still require creating a new data buffer. Arrays have a fixed size.
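A quick way to see this is to append one record with np.concatenate and check whether the result shares memory with the original. The dtype and values here are placeholders for illustration.

```python
import numpy as np

a = np.zeros(4, dtype=[("x", "f8"), ("y", "i4")])

# One extra record with the same dtype.
extra = np.zeros(1, dtype=a.dtype)
extra["x"] = 9.0

# concatenate allocates a new buffer and copies both inputs into it.
b = np.concatenate([a, extra])

shares = np.shares_memory(a["x"], b["x"])
print(b.shape, shares)
```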

Fields in a structured array are not like objects in a list or a dictionary. You can't add a field by just adding a pointer to an object elsewhere in memory.

Maybe you should be using a dictionary, with each item being an array. Then you can freely add a key/item without copying the existing ones. But then access by 'rows' will be slow.
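A rough sketch of the dictionary idea follows. It goes slightly beyond the answer by building the dictionary from field views of an existing structured array, which is one possible way to avoid copying; the names columns and get_row are hypothetical.

```python
import numpy as np

arr = np.zeros(3, dtype=[("x", "f8"), ("y", "i4")])
arr["x"] = [1.0, 2.0, 3.0]
arr["y"] = [10, 20, 30]

# Each arr[name] is a strided view into arr's buffer, so nothing is copied here.
columns = {name: arr[name] for name in arr.dtype.names}

# Adding a "field" allocates memory only for the new column.
columns["z"] = np.array([7, 8, 9], dtype="i4")

def get_row(cols, i):
    """Assemble one 'row' by indexing every column (hypothetical helper)."""
    return {name: col[i] for name, col in cols.items()}

row = get_row(columns, 1)
print(np.shares_memory(columns["x"], arr["x"]))
print(row)
```

The trade-off mentioned above shows up in get_row: reading a row means one lookup per column rather than one read of a contiguous record. The views also keep the original buffer alive for as long as the dictionary exists.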

Source: https://stackoverflow.com/questions/39965994/memory-friendly-way-to-add-a-field-to-a-structured-ndarray-without-duplicating

Tags

python

arrays

numpy

memory

structured-array