Python: remove duplicates from a multi-dimensional array

In Python, numpy.unique removes all duplicates from a 1D array very efficiently.
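For the 1D case, `np.unique` returns the sorted unique values. Passing `return_index=True` also returns the index of each value's first occurrence, which can be used to keep the original order instead. A short sketch (the array `x` is just example data):

```python
import numpy as np

x = np.array([3, 1, 2, 3, 1, 5])

# Sorted unique values.
u = np.unique(x)

# Indices of the first occurrence of each unique value; sorting them
# recovers the unique values in their original order of appearance.
_, first_idx = np.unique(x, return_index=True)
in_order = x[np.sort(first_idx)]

print(u)         # [1 2 3 5]
print(in_order)  # [3 1 2 5]
```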

1) How do I remove duplicate rows or columns from a 2D array?

2) What about nD arrays?

If possible, I would use pandas.

In [1]: from pandas import *

In [2]: import numpy as np

In [3]: a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])

In [4]: DataFrame(a).drop_duplicates().values
Out[4]: 
array([[1, 1],
       [2, 3],
       [5, 4]], dtype=int64)
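As an alternative that needs no pandas: since NumPy 1.13, `np.unique` accepts an `axis` argument, so duplicate rows (`axis=0`) and duplicate columns (`axis=1`) of a 2D array can be removed directly. A minimal sketch, assuming NumPy 1.13 or later. Note that `np.unique` returns the rows sorted, whereas `drop_duplicates` keeps them in order of first occurrence. Because `drop_duplicates` only compares rows, the sketch also transposes the frame to drop duplicate columns with pandas.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])

# Unique rows, sorted lexicographically.
rows = np.unique(a, axis=0)

# Unique columns: whole columns are compared instead of rows.
b = np.array([[1, 2, 1],
              [3, 4, 3]])
cols = np.unique(b, axis=1)

# pandas compares rows only, so transpose, drop duplicates
# (keeping the first occurrence), and transpose back.
cols_pd = pd.DataFrame(b).T.drop_duplicates().T.values

print(rows)
print(cols)
print(cols_pd)
```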

The following is another approach, which performs much better than a for loop: about 2 s for 10k+100 duplicates.

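def tuples(A):
    # Recursively turn a (nested) array into nested tuples, which are hashable;
    # iterating over a scalar raises TypeError, which ends the recursion.
    try:
        return tuple(tuples(a) for a in A)
    except TypeError:
        return A

b = set(tuples(a))  # unique rows of a; a set does not preserve their order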
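Because a `set` is unordered, the result above loses the original row order and is no longer an array. One way to keep the first occurrence of each row in order and convert back to NumPy is `dict.fromkeys`, since dicts preserve insertion order in Python 3.7+. This is a minimal sketch, not the answerer's code: the `tuples` helper is repeated so the snippet runs on its own, and `unique_rows_in_order` is a hypothetical name.

```python
import numpy as np

def tuples(A):
    # Same helper as above: nested arrays -> nested (hashable) tuples.
    try:
        return tuple(tuples(a) for a in A)
    except TypeError:
        return A

def unique_rows_in_order(arr):
    # Hypothetical helper: dict keys are unique and keep insertion order,
    # so each row is kept at the position of its first occurrence.
    unique = dict.fromkeys(tuples(arr))
    return np.array(list(unique))

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
print(unique_rows_in_order(a))
```

The same helper works for nD input, because `tuples` recurses to any depth and `np.array` rebuilds the nested tuples into an array.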

The idea was inspired by the first part of Waleed Khan's answer. It needs no additional package, so it may have further applications. It is also quite Pythonic, I think.

The numpy_indexed package solves this problem for the n-dimensional case (disclaimer: I am its author). In fact, solving this problem was the motivation for starting the package, but it has grown to include a lot of related functionality.

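import numpy as np
import numpy_indexed as npi

a = np.random.randint(0, 2, (3, 3, 3))
print(npi.unique(a))          # unique sub-arrays along the default axis
print(npi.unique(a, axis=1))  # unique slices along axis 1
print(npi.unique(a, axis=2))  # unique slices along axis 2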
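If installing an extra package is not an option, plain NumPy (1.13 or later) can also handle the n-dimensional case: with `axis=k`, `np.unique` treats each slice taken along axis `k` as one element and removes duplicate slices. This sketches NumPy's own `axis` argument, not how `numpy_indexed` is implemented. The array is fixed rather than random so the result is reproducible.

```python
import numpy as np

# A 3D array (shape (2, 3, 2)) whose slices along axis 1 contain a
# duplicate: a[:, 0, :] and a[:, 2, :] are identical.
a = np.array([[[0, 1], [1, 1], [0, 1]],
              [[1, 0], [0, 0], [1, 0]]])

# Remove duplicate slices along each axis in turn.
u0 = np.unique(a, axis=0)  # compares a[i, :, :]
u1 = np.unique(a, axis=1)  # compares a[:, j, :]
u2 = np.unique(a, axis=2)  # compares a[:, :, k]

print(u0.shape, u1.shape, u2.shape)  # (2, 3, 2) (2, 2, 2) (2, 3, 2)
```

Only the axis-1 result shrinks, because that is the only axis along which two slices are equal.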

Source: https://stackoverflow.com/questions/14089453/python-remove-duplicates-from-a-multi-dimensional-array

