After many attempts at optimizing the code, it seems that one last resort would be to run the code below using multiple cores. I don't know exactly how to conver
sfstewman's excellent answer most likely solved the issue for you.
I'd just like to add how you can achieve the same exclusively in numpy.
I make use of numpy's unique and in1d functions.
B_unique_sorted, B_idx = np.unique(B, return_index=True)
B_in_A_bool = np.in1d(B_unique_sorted, A, assume_unique=True)
B_unique_sorted contains the unique values in B, sorted. B_idx holds, for these values, the indices of their first occurrence in the original B. B_in_A_bool is a boolean array the size of B_unique_sorted that
stores whether a value in B_unique_sorted is in A. Note that A is assumed to be already unique (hence assume_unique=True). Now you can use B_in_A_bool to get the common vals
B_unique_sorted[B_in_A_bool]
and their respective indices in the original B
B_idx[B_in_A_bool]
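To make the steps above concrete, here is a minimal end-to-end sketch. The arrays A and B are made-up sample data, not taken from the question, and A is chosen to be unique as assumed above. I use np.isin instead of np.in1d here, since newer NumPy releases deprecate in1d; for 1-D inputs the two give the same result.

```python
import numpy as np

# Hypothetical sample data: A is already unique, B contains duplicates.
A = np.array([3, 5, 7, 9])
B = np.array([9, 1, 5, 5, 2, 9, 7])

# Sorted unique values of B, plus the index of each value's first occurrence in B.
B_unique_sorted, B_idx = np.unique(B, return_index=True)

# Boolean mask over B_unique_sorted: True where the value also occurs in A.
# np.isin stands in for np.in1d (same result for 1-D arrays).
B_in_A_bool = np.isin(B_unique_sorted, A, assume_unique=True)

common_vals = B_unique_sorted[B_in_A_bool]  # array([5, 7, 9])
common_idx = B_idx[B_in_A_bool]             # array([2, 6, 0])

print(common_vals, common_idx)
```

Indexing the original B with common_idx gives back common_vals, which is a quick sanity check that the indices really refer to B.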
Finally, I assume that this is significantly faster than the pure Python for-loop although I didn't test it.
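If you want to check that assumption yourself, something along these lines should do. Note that common_loop is only my guess at what a pure Python baseline might look like (the original loop isn't shown here), the random data is made up, and np.isin again stands in for np.in1d. Timings will depend on your machine and array sizes, so I'm not quoting any numbers.

```python
import timeit
import numpy as np

# Hypothetical test data; A is made unique, as the numpy version assumes.
rng = np.random.default_rng(0)
A = np.unique(rng.integers(0, 10_000, size=2_000))
B = rng.integers(0, 10_000, size=2_000)

def common_numpy(A, B):
    # Same steps as above, wrapped in a function for timing.
    B_unique_sorted, B_idx = np.unique(B, return_index=True)
    mask = np.isin(B_unique_sorted, A, assume_unique=True)
    return B_unique_sorted[mask], B_idx[mask]

def common_loop(A, B):
    # Hypothetical baseline: for every value of B that also occurs in A,
    # remember the index of its first occurrence in B.
    A_list = A.tolist()
    first_idx = {}
    for i, b in enumerate(B.tolist()):
        if b in A_list and b not in first_idx:
            first_idx[b] = i
    vals = sorted(first_idx)
    return vals, [first_idx[v] for v in vals]

# Make sure both versions agree before comparing their speed.
vals_np, idx_np = common_numpy(A, B)
vals_py, idx_py = common_loop(A, B)
same_result = vals_np.tolist() == vals_py and idx_np.tolist() == idx_py

t_numpy = timeit.timeit(lambda: common_numpy(A, B), number=5)
t_loop = timeit.timeit(lambda: common_loop(A, B), number=5)
print(f"same result: {same_result}; numpy: {t_numpy:.4f}s, loop: {t_loop:.4f}s")
```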