I have two big arrays to work on. But let's take a look at the following simplified example to get the idea:
I would like to find if an element in data1
Because your data is all integers, you can use a dictionary (hash table); this takes 0.55 seconds on the same data as in Paul's answer. It won't necessarily find all copies of pairings between a and b (i.e. if a and b themselves contain duplicates), but it's easy enough to modify it to do that, or to make a second pass afterward (over just the matched items) to check for other occurrences of those vectors in the data.
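import numpy as np
from time import perf_counter

def intersect1(a, b):
    # Map each element of a to its index (for duplicates, the last index wins).
    a_d = {}
    for i, x in enumerate(a):
        a_d[x] = i
    # Yield (index in a, index in b) for every element of b that also occurs in a.
    for i, y in enumerate(b):
        if y in a_d:
            yield a_d[y], i

# One million random integer pairs per array, converted to hashable tuples.
a = list(tuple(x) for x in list(np.random.randint(0, 100000, (1000000, 2))))
b = list(tuple(x) for x in list(np.random.randint(0, 100000, (1000000, 2))))
t = perf_counter(); print(list(intersect1(a, b))); s = perf_counter()
print(s-t)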
For comparison, Paul's takes 2.46s on my machine.
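If a and b can contain duplicates, one way to make the modification mentioned above is to store a list of indices per element instead of a single index. The following is only a sketch of that idea: the name intersect_all and the tiny sample lists are my own illustration, not part of the timed code, and I have not benchmarked this variant.

```python
from collections import defaultdict

def intersect_all(a, b):
    """Yield every (i, j) with a[i] == b[j], including duplicates.

    Hypothetical variant of intersect1; elements must be hashable.
    """
    # Map each element of a to the list of all indices where it occurs.
    positions = defaultdict(list)
    for i, x in enumerate(a):
        positions[x].append(i)
    for j, y in enumerate(b):
        # .get() avoids inserting empty lists into the dict for misses.
        for i in positions.get(y, ()):
            yield i, j

# Small made-up example with repeated pairs on both sides.
a = [(1, 2), (3, 4), (1, 2)]
b = [(1, 2), (5, 6), (3, 4), (1, 2)]
pairs = list(intersect_all(a, b))
print(pairs)  # [(0, 0), (2, 0), (1, 2), (0, 3), (2, 3)]
```

Note that the output can grow to len(a) * len(b) pairs in the worst case (every element equal), so this is only cheap when duplicates are rare.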