Matrices with different row lengths in numpy

问题

Is there a way of defining a matrix (say m) in numpy with rows of different lengths, but such that m stays 2-dimensional (i.e. m.ndim = 2)?

For example, if you define m = numpy.array([[1,2,3], [4,5]]), then m.ndim = 1. I understand why this happens, but I'm interested if there is any way to trick numpy into viewing m as 2D. One idea would be padding with a dummy value so that rows become equally sized, but I have lots of such matrices and it would take up too much space. The reason why I really need m to be 2D is that I am working with Theano, and the tensor which will be given the value of m expects a 2D value.

回答1:

No, this is not possible. NumPy arrays need to be rectangular in every pair of dimensions. This is due to the way they map onto memory buffers, as a pointer, itemsize, stride triple.

As for this taking up space: np.array([[1,2,3], [4,5]]) actually takes up more space than a 2×3 array, because it's an array of two pointers to Python lists (and even if the elements were converted to arrays, the memory layout would still be inefficient).

回答2:

I'll give here very new information about Theano. We have a new TypedList() type, that allow to have python list with all elements with the same type: like 1d ndarray. All is done, except the documentation.

There is limited functionality you can do with them. But we did it to allow looping over the typed list with scan. It is not yet integrated with scan, but you can use it now like this:

import theano
import theano.typed_list

a = theano.typed_list.TypedListType(theano.tensor.fvector)()
s, _ = theano.scan(fn=lambda i, tl: tl[i].sum(),
                   non_sequences=[a],
                   sequences=[theano.tensor.arange(2, dtype='int64')])

f = theano.function([a], s)
f([[1, 2, 3], [4, 5]])

One limitation is that the output of scan must be an ndarray, not a typed list.

来源：https://stackoverflow.com/questions/24203451/matrices-with-different-row-lengths-in-numpy

标签

numpy

matrix

theano