Numpy: find the euclidean distance between two 3-D arrays

问题

Given, two 3-D arrays of dimensions (2,2,2):

A = [[[ 0,  0],
    [92, 92]],

   [[ 0, 92],
    [ 0, 92]]]

B = [[[ 0,  0],
    [92,  0]],

   [[ 0, 92],
    [92, 92]]]

How do you find the Euclidean distance for each vector in A and B efficiently?

I have tried for-loops but these are slow, and I'm working with 3-D arrays in the order of (>>2, >>2, 2).

Ultimately I want a matrix of the form:

C = [[d1, d2],
     [d3, d4]]

Edit:

I've tried the following loop, but the biggest issue with it is that loses the dimensions I want to keep. But the distances are correct.

[numpy.sqrt((A[row, col][0] - B[row, col][0])**2 + (B[row, col][1] -A[row, col][1])**2) for row in range(2) for col in range(2)]

回答1:

Thinking in a NumPy vectorized way that would be performing element-wise differentiation, squaring and summing along the last axis and finally getting square root. So, the straight-forward implementation would be -

np.sqrt(((A - B)**2).sum(-1))

We could perform the squaring and summing along the last axis in one go with np.einsum and thus make it more efficient, like so -

subs = A - B
out = np.sqrt(np.einsum('ijk,ijk->ij',subs,subs))

Another alternative with numexpr module -

import numexpr as ne
np.sqrt(ne.evaluate('sum((A-B)**2,2)'))

Since, we are working with a length of 2 along the last axis, we could just slice those and feed it to evaluate method. Please note that slicing isn't possible inside the evaluate string. So, the modified implementation would be -

a0 = A[...,0]
a1 = A[...,1]
b0 = B[...,0]
b1 = B[...,1]
out = ne.evaluate('sqrt((a0-b0)**2 + (a1-b1)**2)')

Runtime test

Function definitions -

def sqrt_sum_sq_based(A,B):
    return np.sqrt(((A - B)**2).sum(-1))

def einsum_based(A,B):
    subs = A - B
    return np.sqrt(np.einsum('ijk,ijk->ij',subs,subs))

def numexpr_based(A,B):
    return np.sqrt(ne.evaluate('sum((A-B)**2,2)'))

def numexpr_based_with_slicing(A,B):
    a0 = A[...,0]
    a1 = A[...,1]
    b0 = B[...,0]
    b1 = B[...,1]
    return ne.evaluate('sqrt((a0-b0)**2 + (a1-b1)**2)')

Timings -

In [288]: # Setup input arrays
     ...: dim = 2
     ...: N = 1000
     ...: A = np.random.rand(N,N,dim)
     ...: B = np.random.rand(N,N,dim)
     ...: 

In [289]: %timeit sqrt_sum_sq_based(A,B)
10 loops, best of 3: 40.9 ms per loop

In [290]: %timeit einsum_based(A,B)
10 loops, best of 3: 22.9 ms per loop

In [291]: %timeit numexpr_based(A,B)
10 loops, best of 3: 18.7 ms per loop

In [292]: %timeit numexpr_based_with_slicing(A,B)
100 loops, best of 3: 8.23 ms per loop

In [293]: %timeit np.linalg.norm(A-B, axis=-1) #@dnalow's soln
10 loops, best of 3: 45 ms per loop