Python: remove duplicates from a multi-dimensional array

日久生厌 2020-12-18 11:46

In Python, numpy.unique can remove all duplicates from a 1D array very efficiently.

1) How can I remove duplicate rows or columns in a 2D array?

3 Answers
  • 2020-12-18 12:41

    If possible I would use pandas.

    In [1]: from pandas import DataFrame
    
    In [2]: import numpy as np
    
    In [3]: a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
    
    In [4]: DataFrame(a).drop_duplicates().values
    Out[4]: 
    array([[1, 1],
           [2, 3],
           [5, 4]], dtype=int64)
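
The question also asks about duplicate columns. One way to get there with the same pandas call is to transpose first, drop duplicate rows, and transpose back. This is only a sketch under the assumption that the array fits comfortably in a DataFrame; the sample array and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative array: rows 0 and 2 are equal, columns 0 and 2 are equal
a = np.array([[1, 1, 1],
              [2, 3, 2],
              [1, 1, 1]])

# Rows: drop_duplicates keeps the first occurrence of each row
rows_unique = pd.DataFrame(a).drop_duplicates().to_numpy()

# Columns: transpose so columns become rows, dedupe, transpose back
cols_unique = pd.DataFrame(a.T).drop_duplicates().to_numpy().T

print(rows_unique)
print(cols_unique)
```

Unlike a sort-based approach, drop_duplicates keeps the rows in their original order.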
    
  • 2020-12-18 12:46

    The numpy_indexed package solves this problem for the n-dimensional case (disclaimer: I am its author). In fact, solving this problem was the motivation for starting the package, but it has since grown to include a lot of related functionality.

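    import numpy as np
    import numpy_indexed as npi

    # Random 3x3x3 array of 0s and 1s, so repeated sub-arrays are likely
    a = np.random.randint(0, 2, (3, 3, 3))
    print(npi.unique(a))          # unique sub-arrays along the default axis
    print(npi.unique(a, axis=1))  # unique sub-arrays along axis 1
    print(npi.unique(a, axis=2))  # unique sub-arrays along axis 2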
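
For comparison, NumPy 1.13 and later accept an axis argument in numpy.unique itself, which may be enough for the 2D rows-or-columns case without an extra package. A minimal sketch, reusing the sample array from the pandas answer; the second array b and the variable names are made up for illustration. Note that the result comes back sorted rather than in first-occurrence order.

```python
import numpy as np

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])

# axis=0 treats each row as one item and drops repeated rows
unique_rows = np.unique(a, axis=0)

# axis=1 does the same for columns; b has two identical columns
b = np.array([[1, 2, 1],
              [3, 4, 3]])
unique_cols = np.unique(b, axis=1)

print(unique_rows)
print(unique_cols)
```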
    
  • 2020-12-18 12:48

    The following is another approach, which performs much better than a for loop: about 2 s for 10k+100 duplicates.

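    def tuples(A):
        # Recursively convert nested arrays/lists into nested tuples (hashable)
        try:
            return tuple(tuples(a) for a in A)
        except TypeError:
            # A is a scalar and cannot be iterated: return it unchanged
            return A

    # a is the array to deduplicate; the set keeps one copy of each row
    b = set(tuples(a))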
    

    The idea is inspired by the first part of Waleed Khan's answer. It needs no additional package and may have further applications. It is also quite Pythonic, I guess.
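
A set does not preserve the original row order and is not an array. If both matter, one possible variation is to deduplicate with dict.fromkeys, which keeps first occurrences in order on Python 3.7+, and then rebuild the array. This is a sketch rather than part of the original answer; the sample array and the name unique_rows are assumptions for illustration.

```python
import numpy as np

def tuples(A):
    # Same helper as in the answer: nested tuples are hashable
    try:
        return tuple(tuples(a) for a in A)
    except TypeError:
        return A

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])

# dict keys are unique and keep insertion order (Python 3.7+)
unique_rows = np.array(list(dict.fromkeys(tuples(a))))

print(unique_rows)
```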
