For a set of observations:
[a1,a2,a3,a4,a5]
their pairwise distances form a symmetric matrix with a zero diagonal:
d=[[0,a12,a13,a14,a15]
[a21,0,a23,a24,a25]
[a31,
To improve efficiency, use numpy.triu_indices to map positions in the pdist result back to (i, j) pairs:
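import numpy

def PdistIndices(n, I):
    '''Return the (i, j) index pairs for positions I of the pdist result.'''
    # Row k of the transposed triu_indices array is the pair behind entry k.
    idx = numpy.array(numpy.triu_indices(n, 1)).T[I]
    return idx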
Here n is the number of observations and I is an array of indices into the pdist result.
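As a quick check of the mapping, here is a minimal sketch using only NumPy. The sample values and the name pdist_indices are made up for illustration, and it assumes the pdist result is ordered like the upper triangle of the square matrix read row by row:

```python
import numpy as np

def pdist_indices(n, I):
    """Return the (i, j) pairs for positions I of the condensed distance vector."""
    return np.array(np.triu_indices(n, 1)).T[I]

# Five 1-D observations (made-up values for illustration).
a = np.array([0.0, 1.0, 3.0, 6.0, 10.0])
n = len(a)

# Full square distance matrix and its condensed (upper-triangle) form.
square = np.abs(a[:, None] - a[None, :])
condensed = square[np.triu_indices(n, 1)]

# Pick three positions in the condensed vector and recover their pairs.
I = np.array([0, 4, 9])
pairs = pdist_indices(n, I)  # rows: (0, 1), (1, 2), (3, 4)

# Looking the pairs up in the square matrix gives back the same distances.
assert np.allclose(square[pairs[:, 0], pairs[:, 1]], condensed[I])
```

Note that this builds all n*(n-1)/2 pairs before selecting from them, so memory grows quadratically with n.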
However, a better solution is to implement an optimized brute-force search, say, in Fortran:
function PdistIndices(n,indices,m) result(IJ)
    ! IJ = (i,j) index pairs for the selected pdist (Python) results.
    ! indices must be sorted in ascending order, because k only moves forward.
    implicit none
    integer:: i,j,m,n,k,w,indices(0:m-1),IJ(0:m-1,2)
    logical:: finished
    ! k = running position in the pdist result, w = next entry of indices to match
    k = 0; w = 0; finished = .false.
    do i=0,n-2
        do j=i+1,n-1
            if (k==indices(w)) then
                IJ(w,:) = [i,j]
                w = w+1
                if (w==m) then
                    finished = .true.
                    exit
                endif
            endif
            k = k+1
        enddo
        if (finished) then
            exit
        endif
    enddo
end function
Then compile it with F2PY and enjoy unbeatable performance. ;)
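To sanity-check the compiled routine, it may help to have a pure-Python mirror of the same scan. The sketch below is not the compiled version and will be far slower; the name pdist_indices_scan is hypothetical, and like the Fortran loop it assumes the indices are sorted in ascending order:

```python
import numpy as np

def pdist_indices_scan(n, indices):
    """Pure-Python mirror of the Fortran loop; `indices` must be sorted ascending."""
    m = len(indices)
    ij = np.zeros((m, 2), dtype=int)
    if m == 0:
        return ij
    k = 0  # running position in the condensed vector
    w = 0  # next entry of `indices` to match
    for i in range(n - 1):
        for j in range(i + 1, n):
            if k == indices[w]:
                ij[w] = (i, j)
                w += 1
                if w == m:
                    # All requested indices found; stop scanning early.
                    return ij
            k += 1
    return ij

# Compare against the numpy.triu_indices approach on a small case.
n = 6
indices = np.array([0, 5, 7, 14])
expected = np.array(np.triu_indices(n, 1)).T[indices]
assert np.array_equal(pdist_indices_scan(n, indices), expected)
```

Unlike the triu_indices version, the scan never stores all pairs at once and stops as soon as the last requested index has been matched.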