Python: remove duplicates from a multi-dimensional array

ⅰ亾dé卋堺 submitted on 2020-01-20 05:20:34

Question


In Python, numpy.unique can remove all duplicates from a 1D array very efficiently.

1) How can I remove duplicate rows or columns from a 2D array?

2) What about nD arrays?


Answer 1:


If possible, I would use pandas.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])

In [4]: pd.DataFrame(a).drop_duplicates().values
Out[4]: 
array([[1, 1],
       [2, 3],
       [5, 4]], dtype=int64)
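
The session above only removes duplicate rows. As a minimal sketch of how the same idea might be extended to duplicate columns, one option is to transpose, deduplicate, and transpose back. The second array `c` is a made-up example, not part of the original answer, and `to_numpy()` assumes a reasonably recent pandas (0.24 or later).

```python
import numpy as np
import pandas as pd

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])

# Duplicate rows: drop_duplicates keeps the first occurrence of each row.
unique_rows = pd.DataFrame(a).drop_duplicates().to_numpy()

# Duplicate columns: transpose so columns become rows, deduplicate,
# then transpose back. `c` is a hypothetical example array.
c = np.array([[1, 2, 1],
              [3, 4, 3]])
unique_cols = pd.DataFrame(c.T).drop_duplicates().to_numpy().T

print(unique_rows)
print(unique_cols)
```

Because drop_duplicates keeps first occurrences, the surviving rows (or columns) stay in their original order.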



Answer 2:


The following is another approach, which performs much better than a for loop: it takes about 2 s for 10k+100 duplicates.

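def tuples(A):
    try:
        return tuple(tuples(a) for a in A)
    except TypeError:
        return A  # A is not iterable (a scalar), so return it unchanged

# a is the array to deduplicate; the set keeps one copy of each row (order is not preserved)
b = set(tuples(a))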

The idea is inspired by the first part of Waleed Khan's answer. It needs no additional package, and it may have further applications. It is also quite Pythonic, I guess.
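
A set does not remember the order in which rows were seen. If order matters, a possible variation on the same idea (a sketch, not the answerer's code) is to use a dict, which preserves insertion order in Python 3.7+. The names `to_tuples` and `unique_rows` are hypothetical, and the sketch assumes the leaves are numbers rather than strings, since a one-character string iterates over itself and would recurse forever.

```python
def to_tuples(obj):
    """Recursively convert nested sequences into nested (hashable) tuples."""
    try:
        return tuple(to_tuples(item) for item in obj)
    except TypeError:
        # obj is not iterable, so treat it as a scalar leaf.
        return obj


def unique_rows(rows):
    """Return the distinct rows as tuples, in first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(to_tuples(row) for row in rows))


data = [[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]]
result = unique_rows(data)
print(result)
```

The same function works on deeper nesting, because each top-level item is converted to a nested tuple before it is used as a key.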




Answer 3:


The numpy_indexed package solves this problem for the n-dimensional case (disclaimer: I am its author). In fact, solving this problem was the motivation for starting the package, but it has since grown to include a lot of related functionality.

import numpy as np
import numpy_indexed as npi

a = np.random.randint(0, 2, (3, 3, 3))
print(npi.unique(a))            # default axis
print(npi.unique(a, axis=1))
print(npi.unique(a, axis=2))
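
For comparison, and not part of the original answer: NumPy 1.13 and later accept an `axis` argument in numpy.unique itself, which covers the row and column cases without an extra package. The following is a small sketch with made-up arrays. Note that the result comes back sorted, so `return_index` is used here to recover first-seen order when that matters.

```python
import numpy as np

a = np.array([[5, 4], [1, 1], [5, 4], [2, 3]])

# axis=0 treats each row as one item; the unique rows come back sorted.
rows = np.unique(a, axis=0)

# axis=1 does the same for columns.
c = np.array([[1, 2, 1],
              [3, 4, 3]])
cols = np.unique(c, axis=1)

# return_index gives the index of each unique row's first occurrence;
# sorting those indices restores the original order of appearance.
_, first = np.unique(a, axis=0, return_index=True)
rows_in_order = a[np.sort(first)]

print(rows)
print(cols)
print(rows_in_order)
```

The same call works on nD arrays: with `axis=k`, the sub-arrays indexed along axis `k` are the items being compared.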


Source: https://stackoverflow.com/questions/14089453/python-remove-duplicates-from-a-multi-dimensional-array
