Find unique rows in numpy.array

后端 未结 20 3119
独厮守ぢ
独厮守ぢ 2020-11-21 10:57

I need to find unique rows in a numpy.array.

For example:

>>> a # I have
array([[1, 1, 1, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
         


        
20条回答
  •  耶瑟儿~
    2020-11-21 11:55

    Another option to the use of structured arrays is using a view of a void type that joins the whole row into a single item:

    a = np.array([[1, 1, 1, 0, 0, 0],
                  [0, 1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0, 0],
                  [1, 1, 1, 0, 0, 0],
                  [1, 1, 1, 1, 1, 0]])
    
    b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
    _, idx = np.unique(b, return_index=True)
    
    unique_a = a[idx]
    
    >>> unique_a
    array([[0, 1, 1, 1, 0, 0],
           [1, 1, 1, 0, 0, 0],
           [1, 1, 1, 1, 1, 0]])
    

    EDIT Added np.ascontiguousarray following @seberg's recommendation. This will slow the method down if the array is not already contiguous.

    EDIT The above can be slightly sped up, perhaps at the cost of clarity, by doing:

    unique_a = np.unique(b).view(a.dtype).reshape(-1, a.shape[1])
    

    Also, at least on my system, performance wise it is on par, or even better, than the lexsort method:

    a = np.random.randint(2, size=(10000, 6))
    
    %timeit np.unique(a.view(np.dtype((np.void, a.dtype.itemsize*a.shape[1])))).view(a.dtype).reshape(-1, a.shape[1])
    100 loops, best of 3: 3.17 ms per loop
    
    %timeit ind = np.lexsort(a.T); a[np.concatenate(([True],np.any(a[ind[1:]]!=a[ind[:-1]],axis=1)))]
    100 loops, best of 3: 5.93 ms per loop
    
    a = np.random.randint(2, size=(10000, 100))
    
    %timeit np.unique(a.view(np.dtype((np.void, a.dtype.itemsize*a.shape[1])))).view(a.dtype).reshape(-1, a.shape[1])
    10 loops, best of 3: 29.9 ms per loop
    
    %timeit ind = np.lexsort(a.T); a[np.concatenate(([True],np.any(a[ind[1:]]!=a[ind[:-1]],axis=1)))]
    10 loops, best of 3: 116 ms per loop
    

提交回复
热议问题