numpy structured array sorting by multiple columns

Submitted by 时间秒杀一切 on 2021-01-29 15:20:14

Question


A minimal numpy structured array generator:

import numpy as np

index = np.arange(4)
A = np.stack((np.sin(index), np.cos(index)),axis=1)
B = np.eye(4).astype(int)
C = np.array([1, 0, 1, 0], dtype=bool)
goodies = [(a, b, c, d) for a, b, c, d in zip(index, A, B, C)]
dt = [('index', 'int'), ('two_floats', 'float', 2), 
      ('four_ints', 'int', 4), ('and_a_bool', 'bool')]
s = np.array(goodies, dtype=dt)

generates the minimal numpy structured array:

array([(0, [ 0.        ,  1.        ], [1, 0, 0, 0],  True),
       (1, [ 0.84147098,  0.54030231], [0, 1, 0, 0], False),
       (2, [ 0.90929743, -0.41614684], [0, 0, 1, 0],  True),
       (3, [ 0.14112001, -0.9899925 ], [0, 0, 0, 1], False)],
      dtype=[('index', '<i8'), ('two_floats', '<f8', (2,)), ('four_ints', '<i8', (4,)), ('and_a_bool', '?')])

I want to sort first by and_a_bool descending, then by the second column of two_floats ascending, so that the output would be:

array([(2, [ 0.90929743, -0.41614684], [0, 0, 1, 0],  True),
       (0, [ 0.        ,  1.        ], [1, 0, 0, 0],  True),
       (3, [ 0.14112001, -0.9899925 ], [0, 0, 0, 1], False),
       (1, [ 0.84147098,  0.54030231], [0, 1, 0, 0], False)],
      dtype=[('index', '<i8'), ('two_floats', '<f8', (2,)), ('four_ints', '<i8', (4,)), ('and_a_bool', '?')])

np.lexsort was mentioned in this answer but I don't see how to apply that here.

I'm looking for something using existing numpy methods rather than specialized code. My arrays will not be very large, so I don't have a strong preference between in-place sorting and generating a new array.


Answer 1:


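Make a temp sorting array. The bool field is inverted with ~ so that the True rows sort first, which gives the descending order on and_a_bool: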

In [133]: temp=np.zeros(s.shape, dtype='bool,float')                                     
In [134]: temp['f0']=~s['and_a_bool']                                                    
In [135]: temp['f1']=s['two_floats'][:,1]                                                
In [136]: temp                                                                           
Out[136]: 
array([(False,  1.        ), ( True,  0.54030231), (False, -0.41614684),
       ( True, -0.9899925 )], dtype=[('f0', '?'), ('f1', '<f8')])

Now argsort (no need to specify order, since I chose the temp fields in the desired priority order):

In [137]: np.argsort(temp)                                                               
Out[137]: array([2, 0, 3, 1])
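
As a side check on the parenthetical above, here is a small sketch of how the field order decides the sort priority. It rebuilds temp from the values printed in Out[136] rather than from s, and the variable names implicit, explicit and swapped are only illustrative. The order argument of np.argsort is assumed to behave as documented for structured arrays.

```python
import numpy as np

# Same values as the temp array printed above; 'bool,float' gives the
# default field names f0 and f1.
temp = np.array([(False, 1.0), (True, 0.54030231),
                 (False, -0.41614684), (True, -0.9899925)],
                dtype='bool,float')

implicit = np.argsort(temp)                      # compares f0 first, then f1
explicit = np.argsort(temp, order=['f0', 'f1'])  # same priority, spelled out
swapped = np.argsort(temp, order=['f1', 'f0'])   # f1 becomes the primary key

print(implicit, explicit, swapped)
```

The first two calls should agree, while the third sorts purely by the float column here because no two f1 values are equal.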

and apply that sort to s:

In [138]: s[_137]                                                                        
Out[138]: 
array([(2, [ 0.90929743, -0.41614684], [0, 0, 1, 0],  True),
       (0, [ 0.        ,  1.        ], [1, 0, 0, 0],  True),
       (3, [ 0.14112001, -0.9899925 ], [0, 0, 0, 1], False),
       (1, [ 0.84147098,  0.54030231], [0, 1, 0, 0], False)],
      dtype=[('index', '<i8'), ('two_floats', '<f8', (2,)), ('four_ints', '<i8', (4,)), ('and_a_bool', '?')])
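
Since the question asks about np.lexsort, here is a hedged sketch of how it could be applied to the same array. This is not part of the answer above, only an alternative under the assumption that s is built as in the question. np.lexsort takes its keys from least to most significant, so the primary key goes last. The names order and sorted_s are illustrative.

```python
import numpy as np

# Rebuild the structured array from the question.
index = np.arange(4)
A = np.stack((np.sin(index), np.cos(index)), axis=1)
B = np.eye(4).astype(int)
C = np.array([1, 0, 1, 0], dtype=bool)
dt = [('index', 'int'), ('two_floats', 'float', 2),
      ('four_ints', 'int', 4), ('and_a_bool', 'bool')]
s = np.array(list(zip(index, A, B, C)), dtype=dt)

# The LAST key passed to lexsort is the primary one.
# ~and_a_bool puts the True rows first (descending on the bool);
# the second column of two_floats breaks ties in ascending order.
order = np.lexsort((s['two_floats'][:, 1], ~s['and_a_bool']))
sorted_s = s[order]

print(order)
print(sorted_s)
```

This avoids the temporary structured array, at the cost of having to remember that the keys are listed in reverse priority.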


Source: https://stackoverflow.com/questions/61906820/numpy-structured-array-sorting-by-multiple-columns
