argsort for a multidimensional ndarray

后端 未结 4 452
深忆病人
深忆病人 2020-12-03 06:21

I\'m trying to get the indices to sort a multidimensional array by the last axis, e.g.

>>> a = np.array([[3,1,2],[8,9,2]])

And I\'

相关标签:
4条回答
  • 2020-12-03 06:25

    The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:

    >>> a = np.array([[3,1,2],[8,9,2]])
    >>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
    array([[1 2 3]
           [2 8 9]])
    
    0 讨论(0)
  • 2020-12-03 06:35

    You can also use linear indexing, which might be better with performance, like so -

    M,N = a.shape
    out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
    

    So, a.argsort(1)+(np.arange(M)[:,None]*N) basically are the linear indices that are used to map b to get the desired sorted output for b. The same linear indices could also be used on a for getting the sorted output for a.

    Sample run -

    In [23]: a = np.array([[3,1,2],[8,9,2]])
    
    In [24]: b = np.array([[0,5,4],[3,9,1]])
    
    In [25]: M,N = a.shape
    
    In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
    Out[26]: 
    array([[5, 4, 0],
           [1, 3, 9]])
    

    Rumtime tests -

    In [27]: a = np.random.rand(1000,1000)
    
    In [28]: b = np.random.rand(1000,1000)
    
    In [29]: M,N = a.shape
    
    In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
    10 loops, best of 3: 133 ms per loop
    
    In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
    10 loops, best of 3: 96.7 ms per loop
    
    0 讨论(0)
  • 2020-12-03 06:47

    I found the answer here, with someone having the same problem. They key is just cheating the indexing to work properly...

    >>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
    array([[1, 2, 3],
           [2, 8, 9]])
    
    0 讨论(0)
  • 2020-12-03 06:48

    Solution:

    >>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
    array([[1, 2, 3],
           [2, 8, 9]])
    

    You got it right, though I wouldn't describe it as cheating the indexing.

    Maybe this will help make it clearer:

    In [544]: i=np.argsort(a,axis=1)
    
    In [545]: i
    Out[545]: 
    array([[1, 2, 0],
           [2, 0, 1]])
    

    i is the order that we want, for each row. That is:

    In [546]: a[0, i[0,:]]
    Out[546]: array([1, 2, 3])
    
    In [547]: a[1, i[1,:]]
    Out[547]: array([2, 8, 9])
    

    To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.

    In [548]: a[[[0],[1]],i]
    Out[548]: 
    array([[1, 2, 3],
           [2, 8, 9]])
    

    Another array that could be paired with i is:

    In [560]: j=np.array([[0,0,0],[1,1,1]])
    
    In [561]: j
    Out[561]: 
    array([[0, 0, 0],
           [1, 1, 1]])
    
    In [562]: a[j,i]
    Out[562]: 
    array([[1, 2, 3],
           [2, 8, 9]])
    

    If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcasted against i.

    I think of

    np.array([[0],
              [1]])
    

    as 'short hand' for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.

    The full mapping from a to the new array is:

    [a[0,1]  a[0,2]  a[0,0]
     a[1,2]  a[1,0]  a[1,1]]
    

    def foo(a):
        i = np.argsort(a, axis=1)
        return (np.arange(a.shape[0])[:,None], i)
    
    In [61]: foo(a)
    Out[61]: 
    (array([[0],
            [1]]), array([[1, 2, 0],
            [2, 0, 1]], dtype=int32))
    In [62]: a[foo(a)]
    Out[62]: 
    array([[1, 2, 3],
           [2, 8, 9]])
    
    0 讨论(0)
提交回复
热议问题