efficient loop over numpy array

轮回少年 2021-01-06 02:05

Versions of this question have already been asked but I have not found a satisfactory answer.

Problem: given a large numpy vector, find indices of t

8 answers

遥遥无期 (original poster)

2021-01-06 02:22

As an alternative to Ami Tavory's answer, you can use a Counter from the collections module to detect duplicates. On my computer it is even faster (see the timings below). The function below also handles several distinct duplicated values: it returns a dict that maps each duplicated value to the array of indices where it occurs.

import collections
import numpy as np

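def find_duplicates_original(x):
    # Pure-Python pairwise comparison, O(n^2) time.
    # For every pair i < j with equal values, record j (so an index can
    # appear more than once if a value occurs three or more times).
    d = []
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] == x[j]:
                d.append(j)
    return d

def find_duplicates_outer(x):
    # Build the full n x n boolean comparison matrix (O(n^2) memory).
    a = np.equal.outer(x, x)
    # Ignore each element's comparison with itself.
    np.fill_diagonal(a, False)
    # Indices of every element that equals at least one other element.
    return np.flatnonzero(np.any(a, axis=0))

def find_duplicates_counter(x):
    # Count how often each value occurs in a single pass.
    counter = collections.Counter(x)
    # Keep only the values that occur more than once.
    values = (v for v, c in counter.items() if c > 1)
    # Map each duplicated value to all indices where it occurs.
    return {v: np.flatnonzero(x == v) for v in values}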


n = 10000
x = np.arange(float(n))
x[n // 2] = 1
x[n // 4] = 1

>>> find_duplicates_counter(x)
{1.0: array([   1, 2500, 5000], dtype=int64)}

>>> %timeit find_duplicates_original(x)
1 loop, best of 3: 12 s per loop

>>> %timeit find_duplicates_outer(x)
10 loops, best of 3: 84.3 ms per loop

>>> %timeit find_duplicates_counter(x)
1000 loops, best of 3: 1.63 ms per loop
