Finding unique points in numpy array

后端未结

What is a faster way of finding unique x,y points (removing duplicates) in a numpy array like:

points = numpy.random.randint(0, 5, (10,2))

2 answers

既然无缘

2020-12-16 03:55

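I think you have a very good idea here. Think about the underlying block of memory used to represent the data in points. We tell numpy to regard that block as representing an array of shape (10,2) with dtype int32 (32-bit integers), but it is almost costless to tell numpy to regard that same block of memory as representing an array of shape (10,1) with dtype c8 (64-bit complex, i.e. a pair of 32-bit floats), one complex number per row. This reinterprets the bits rather than converting the values, so it only works if points really is a C-contiguous int32 array: np.random.randint returns the platform's default integer type (int64 on most 64-bit Linux and macOS builds), so cast with .astype(np.int32) first. It also assumes non-negative coordinates well below 2**31, because negative or very large int32 bit patterns can read as NaN, or as floats that sort out of order.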

So the only real cost is calling np.unique, followed by another virtually costless call to view and reshape:

import numpy as np
np.random.seed(1)
points = np.random.randint(0, 5, (10,2)).astype(np.int32)  # view('c8') below needs 4-byte ints
print(points)
print(len(points))

yields

[[3 4]
 [0 1]
 [3 0]
 [0 1]
 [4 4]
 [1 2]
 [4 2]
 [4 3]
 [4 2]
 [4 2]]
10

while

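cpoints = points.view('c8')  # zero-copy: each (x, y) row of two int32 becomes one complex64
cpoints = np.unique(cpoints)  # flattens, sorts, and drops duplicate values
points = cpoints.view('i4').reshape((-1,2))  # back to (x, y) pairs of int32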
print(points)
print(len(points))

yields

[[0 1]
 [1 2]
 [3 0]
 [3 4]
 [4 2]
 [4 3]
 [4 4]]
7

If you don't need the result to be sorted, wim's set-based method is faster (you might want to consider accepting his answer...).

import numpy as np
np.random.seed(1)
N=10000
points = np.random.randint(0, 5, (N,2)).astype(np.int32)  # view('c8') needs 4-byte ints

def using_unique():
    cpoints = points.view('c8')
    cpoints = np.unique(cpoints)
    return cpoints.view('i4').reshape((-1,2))

def using_set():
    return np.vstack([np.array(u) for u in set([tuple(p) for p in points])])

With this code saved as test.py, timeit gives these benchmarks:

% python -mtimeit -s'import test' 'test.using_set()'
100 loops, best of 3: 18.3 msec per loop
% python -mtimeit -s'import test' 'test.using_unique()'
10 loops, best of 3: 40.6 msec per loop
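On newer NumPy (1.13 and later), np.unique accepts an axis argument, so duplicate rows can be removed directly, for any dtype and any number of columns, without the int32 and value-range caveats of the complex view. A minimal sketch; it was not part of the benchmark above, so no speed claim is made for it here.

```python
import numpy as np

np.random.seed(1)
points = np.random.randint(0, 5, (10, 2))

# axis=0 treats each row as one element; rows come back sorted lexicographically.
unique_rows = np.unique(points, axis=0)

# return_counts=True also reports how many times each unique point occurred.
rows, counts = np.unique(points, axis=0, return_counts=True)

print(unique_rows)
print(len(unique_rows))
print(counts)
```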

鱼传尺愫

2020-12-16 04:03

I would do it like this:

numpy.array(list(set(tuple(p) for p in points)))

For a fast solution in the most general case, this recipe may interest you: http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/
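The set one-liner above loses the original order of the points, and for an empty input it returns an array of shape (0,) instead of (0, 2). A minimal sketch of two order-preserving variants that keep the first occurrence of each point; the function names are hypothetical. The first assumes Python 3.7+, where dict preserves insertion order; the second uses np.unique(..., axis=0, return_index=True) (NumPy 1.13+), which returns the index of each unique row's first occurrence.

```python
import numpy as np


def unique_points_ordered(points):
    """Unique rows of a 2-D array, in order of first appearance (dict-based)."""
    points = np.asarray(points)
    # dict.fromkeys keeps only the first occurrence of each key, in insertion order.
    seen = dict.fromkeys(map(tuple, points.tolist()))
    # reshape keeps a (0, ncols) shape when the input is empty.
    return np.array(list(seen), dtype=points.dtype).reshape(-1, points.shape[1])


def unique_points_ordered_np(points):
    """Same result using only NumPy: first-occurrence indices, restored to input order."""
    points = np.asarray(points)
    _, first = np.unique(points, axis=0, return_index=True)
    return points[np.sort(first)]
```

Calling .tolist() first means the dict hashes plain Python ints instead of NumPy scalars, which is usually quicker for this kind of loop.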
