I have an array of N points in d dimensions, with shape (N, d), and I'd like to build a new array of the displacement vectors for each pair of points.
The straightforward approach would be
import itertools
dis_vectors = [l - r for l, r in itertools.combinations(points, 2)]
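but I doubt that it is fast. Actually, %timeit gives the following (pdist being scipy.spatial.distance.pdist, which returns only the pairwise distances, not the vectors):
Points | list comprehension | pdist
3      | 13 µs              | 24 µs
27     | 798 µs             | 35.2 µs
So already at 27 points the list comprehension is more than 20 times slower.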
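Because pdist's condensed output lists the pairs in the same order as itertools.combinations, each of its entries is the Euclidean norm of the corresponding displacement vector. A minimal sketch checking this, assuming SciPy is installed and using random test points (the seed and shape are arbitrary choices for illustration):

```python
import itertools

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
points = rng.random((27, 3))  # 27 random points in 3 dimensions

# List-comprehension displacement vectors, stacked into shape (N*(N-1)//2, d)
dis_vectors = np.array([l - r for l, r in itertools.combinations(points, 2)])

# Condensed distance array of length N*(N-1)//2 (default metric: 'euclidean')
dists = pdist(points)

print(dis_vectors.shape, dists.shape)  # (351, 3) (351,)
# The norm of each displacement vector equals the matching pdist entry
print(np.allclose(np.linalg.norm(dis_vectors, axis=1), dists))  # True
```

If only the lengths of the displacements are needed, pdist alone is enough; the vectors themselves still have to be computed separately.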
Roughly how many points are we talking about here?
Another possibility would be something like this:
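import numpy
from fractions import Fraction
from functools import reduce  # reduce is no longer a builtin in Python 3
from operator import mul

def binomial_coefficient(n, k):
    # credit to http://stackoverflow.com/users/226086/nas-banov
    return int(reduce(mul, (Fraction(n - i, i + 1) for i in range(k)), 1))

def pairwise_displacements(a):
    n = a.shape[0]
    d = a.shape[1]
    c = binomial_coefficient(n, 2)  # number of unordered pairs, n*(n-1)/2
    out = numpy.zeros((c, d))
    l = 0
    r = l + n - 1  # offset 1 yields n - 1 pairs
    for sl in range(1, n):  # start at offset 1: no point1 - point1!
        # all differences a[i] - a[i + sl] at once, across every dimension
        out[l:r] = a[:n - sl] - a[sl:]
        l = r
        r += n - (sl + 1)  # the next offset yields one pair fewer
    return out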
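This simply "slides" the array against itself: at offset sl, a[:n-sl] - a[sl:] computes every difference a[i] - a[i+sl] in a single vectorized subtraction over all d dimensions. Each unordered pair is computed only once (no repetition such as both point1 - point2 and point2 - point1), and pairs of a point with itself (e.g. point1 - point1) are skipped. The rows are grouped by offset, so they come out in a different order than with itertools.combinations.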
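One way to check that description is to compare the sliding version with the itertools.combinations one-liner: the rows should be identical up to order. A self-contained sketch with random test data; the function is restated compactly using n*(n-1)//2, which equals binomial_coefficient(n, 2), and sort_rows is a hypothetical helper introduced only for the comparison:

```python
import itertools

import numpy


def pairwise_displacements(a):
    n, d = a.shape
    out = numpy.zeros((n * (n - 1) // 2, d))  # one row per unordered pair
    l = 0
    for sl in range(1, n):  # offsets 1 .. n-1, so no point minus itself
        r = l + n - sl  # offset sl contributes n - sl pairs
        out[l:r] = a[:n - sl] - a[sl:]  # a[i] - a[i + sl] for every valid i
        l = r
    return out


def sort_rows(m):
    # hypothetical helper: lexicographic row order, first column primary
    return m[numpy.lexsort(m.T[::-1])]


rng = numpy.random.default_rng(1)
pts = rng.random((10, 3))

slid = pairwise_displacements(pts)
ref = numpy.array([l - r for l, r in itertools.combinations(pts, 2)])

print(slid.shape)  # (45, 3)
print(numpy.allclose(slid, ref))  # False: same rows, different order
print(numpy.allclose(sort_rows(slid), sort_rows(ref)))  # True
```

On Python 3.8 and later, math.comb(n, 2) from the standard library gives the same pair count as binomial_coefficient(n, 2).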
With 1000 points this function still performs well at 31.3 ms, whereas pdist remains faster at 20.7 ms and the list comprehension comes in third at 1.23 s.
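A further, fully vectorized variant builds all index pairs at once with numpy.triu_indices and uses fancy indexing. It was not part of the timings above, so how it compares is untested here. It returns the rows in the same order as itertools.combinations, at the cost of allocating two index arrays of length N*(N-1)/2. A minimal sketch; the function name pairwise_displacements_triu is hypothetical:

```python
import itertools

import numpy as np


def pairwise_displacements_triu(a):
    """Return a[i] - a[j] for all i < j, in itertools.combinations order."""
    a = np.asarray(a)
    # Indices of the strict upper triangle in row-major order:
    # (0, 1), (0, 2), ..., (0, n-1), (1, 2), ...
    i, j = np.triu_indices(a.shape[0], k=1)
    return a[i] - a[j]


rng = np.random.default_rng(2)
pts = rng.random((8, 4))
fast = pairwise_displacements_triu(pts)
ref = np.array([l - r for l, r in itertools.combinations(pts, 2)])
print(fast.shape, np.array_equal(fast, ref))  # (28, 4) True
```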