How to efficiently extract values from nested numpy arrays generated by loadmat function?

Posted by 让人想犯罪 __ on 2021-02-07 19:12:11

Question


Is there a more efficient way in Python to extract data from a nested NumPy object array such as A = array([[array([[12000000]])]], dtype=object)? I have been using A[0][0][0][0], but that does not seem efficient when you have lots of data like A.

I have also tried numpy.squeeze(array([[array([[12000000]])]], dtype=object)), but this gives me

array(array([[12000000]]), dtype=object)

PS: The nested array was generated by the loadmat() function in the scipy module when loading a .mat file that contains nested structures.


Answer 1:


Creating such an array by hand is a bit tedious, but loadmat produces it to represent MATLAB cells and 2-D matrices:

In [5]: A = np.empty((1,1),object)
In [6]: A[0,0] = np.array([[1.23]])
In [7]: A
Out[7]: array([[array([[ 1.23]])]], dtype=object)
In [8]: A.any()
Out[8]: array([[ 1.23]])
In [9]: A.shape
Out[9]: (1, 1)

squeeze compresses the shape, but does not cross the object boundary:

In [10]: np.squeeze(A)
Out[10]: array(array([[ 1.23]]), dtype=object)

But if an array holds exactly one item (regardless of shape), item() can extract it. Indexing also works: A[0,0].

In [11]: np.squeeze(A).item()
Out[11]: array([[ 1.23]])

Call item() again to extract the number from that inner array:

In [12]: np.squeeze(A).item().item()
Out[12]: 1.23

Or we don't even need the squeeze:

In [13]: A.item().item()
Out[13]: 1.23

loadmat also has a squeeze_me parameter, which squeezes unit dimensions out of the loaded arrays.
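To get a feel for what squeeze_me changes, here is a minimal sketch. It assumes a plain 1x1 numeric matrix rather than the nested cell from the question, and it writes to an in-memory buffer instead of a real .mat file; the variable name "x" is made up for the example.

```python
import io

import numpy as np
from scipy.io import loadmat, savemat

# Write a 1x1 matrix to an in-memory .mat file (stand-in for a real file on disk).
buf = io.BytesIO()
savemat(buf, {"x": np.array([[12000000]])})

# Default loading keeps MATLAB's 2-D shape.
buf.seek(0)
raw = loadmat(buf)
print(raw["x"].shape)  # (1, 1)

# With squeeze_me=True the unit dimensions are dropped.
buf.seek(0)
squeezed = loadmat(buf, squeeze_me=True)
print(squeezed["x"])
```

Nested cells and structs may still need item() or indexing afterwards, so it is worth printing the loaded value before relying on its shape.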

Indexing is just as easy:

In [17]: A[0,0]
Out[17]: array([[ 1.23]])
In [18]: A[0,0][0,0]
Out[18]: 1.23

astype can also work (though it can be picky about the number of dimensions).

In [21]: A.astype(float)
Out[21]: array([[ 1.23]])

With single-item arrays like this, efficiency isn't much of an issue. All these methods are quick. Things become more complicated when the array has many items, or when the items are themselves large.
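As a rough sketch of the many-items case, assuming every cell holds a 1x1 numeric array, a comprehension over ravel() can unwrap them all into one plain numeric array. The cells array below is invented for illustration and is not taken from a real .mat file.

```python
import numpy as np

# Hypothetical stand-in for a loaded MATLAB cell array: each cell is a 1x1 matrix.
cells = np.empty((1, 3), dtype=object)
for i, v in enumerate([1.5, 2.5, 3.5]):
    cells[0, i] = np.array([[v]])

# Unwrap each cell with item() and collect the results in a numeric array.
values = np.array([c.item() for c in cells.ravel()])
print(values.dtype, values.shape)
```

This only works when each cell really has a single element; item() raises a ValueError on larger arrays, which is a useful check in itself.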

See also: How to access elements of numpy ndarray?




Answer 2:


You could use A.all() or A.any() to get a scalar. This would only work if A contains one element.




Answer 3:


Try A.flatten()[0]

This will flatten the array into a single dimension and extract the first item from it. In your case, the first item is the only item.
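One caveat, shown as a small sketch on an array built like the one in the question: flatten()[0] only removes the outer object layer, so the result is still the inner 2-D array and the same step has to be applied again to reach the number.

```python
import numpy as np

# Rebuild the nested array from the question by hand.
A = np.empty((1, 1), dtype=object)
A[0, 0] = np.array([[12000000]])

inner = A.flatten()[0]      # still the inner 2-D array
value = inner.flatten()[0]  # now a NumPy integer scalar
print(inner.shape, value)
```

flatten() always copies; ravel() gives the same result here and returns a view when it can, which matters more once the arrays are large.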




Answer 4:


What worked in my case was the following:

import os

import scipy.io

# dir_data and file_name are assumed to be defined elsewhere (location of the .mat file)
xcat = scipy.io.loadmat(os.path.join(dir_data, file_name))
pars = xcat['pars']  # Structured array; pars[0][0] is a numpy.void record

# Note that you are dealing with a numpy structured array object when you enter pars[0][0].
# Thus you can access the field names and so on.
dict_values = [x[0][0] for x in pars[0][0]]  # Extract all elements in one go
dict_keys = list(pars.dtype.names)  # Extract the corresponding names/tags
dict_xcat = dict(zip(dict_keys, dict_values))  # Pack it up again in a dict

The idea is to first extract ALL the values I want and store them in a plain Python dict. This saves me from cumbersome indexing later in the file.

Of course, this is a very specific solution, since in my case the values I needed were all floats/ints.
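The same idea can be tried without a .mat file. The sketch below builds a hypothetical stand-in for pars by hand (the field names gain and offset are invented) and assumes every field holds a 1x1 numeric array, as in my case.

```python
import numpy as np

# Hypothetical stand-in for xcat['pars']: a 1x1 structured array with object fields.
pars = np.empty((1, 1), dtype=[("gain", object), ("offset", object)])
pars["gain"][0, 0] = np.array([[2.5]])
pars["offset"][0, 0] = np.array([[7]])

record = pars[0, 0]  # a numpy.void record, one entry per field

# Look fields up by name and unwrap each 1x1 array with item().
dict_pars = {name: record[name].item() for name in pars.dtype.names}
print(dict_pars)
```

Looking fields up by name avoids relying on the field order, and item() will raise a ValueError if a field turns out to hold more than one element.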



Source: https://stackoverflow.com/questions/48233313/how-to-efficiently-extract-values-from-nested-numpy-arrays-generated-by-loadmat
