i have a question on how to calculate distances in numpy as fast as it can,
def getR1(VVm,VVs,HHm,HHs):
t0=time.time()
R=VVs.flatten()[numpy.newaxis
Whenever you have multiplications and sums, try to use one of the dot product functions or np.einsum. Since you are preallocating your arrays, rather than having different arrays for horizontal and vertical coordinates, stack them both together:
precomputed_flat = np.column_stack((svf.flatten(), shf.flatten()))
measured_flat = np.column_stack((VVmeasured.flatten(), HHmeasured.flatten()))
deltas = precomputed_flat - measured_flat[:, None, :]
From here, the simplest would be:
dist = np.einsum('ijk,ijk->ij', deltas, deltas)
You could also try something like:
from numpy.core.umath_tests import inner1d
dist = inner1d(deltas, deltas)
There is of course also SciPy's spatial module cdist:
from scipy.spatial.distance import cdist
dist = cdist(precomputed_flat, measured_flat, 'euclidean')
EDIT I cannot run tests on such a large dataset, but these timings are rather enlightening:
len_a, len_b = 10000, 1000
a = np.random.rand(2, len_a)
b = np.random.rand(2, len_b)
c = np.random.rand(len_a, 2)
d = np.random.rand(len_b, 2)
In [3]: %timeit a[:, None, :] - b[..., None]
10 loops, best of 3: 76.7 ms per loop
In [4]: %timeit c[:, None, :] - d
1 loops, best of 3: 221 ms per loop
For the above smaller dataset, I can get a slight speed up over your method with scipy.spatial.distance.cdist and match it with inner1d, by arranging data differently in memory:
precomputed_flat = np.vstack((svf.flatten(), shf.flatten()))
measured_flat = np.vstack((VVmeasured.flatten(), HHmeasured.flatten()))
deltas = precomputed_flat[:, None, :] - measured_flat
import scipy.spatial.distance as spdist
from numpy.core.umath_tests import inner1d
In [13]: %timeit r0 = a[0, None, :] - b[0, :, None]; r1 = a[1, None, :] - b[1, :, None]; r0 *= r0; r1 *= r1; r0 += r1
10 loops, best of 3: 146 ms per loop
In [14]: %timeit deltas = (a[:, None, :] - b[..., None]).T; inner1d(deltas, deltas)
10 loops, best of 3: 145 ms per loop
In [15]: %timeit spdist.cdist(a.T, b.T)
10 loops, best of 3: 124 ms per loop
In [16]: %timeit deltas = a[:, None, :] - b[..., None]; np.einsum('ijk,ijk->jk', deltas, deltas)
10 loops, best of 3: 163 ms per loop