Parallel construction of a distance matrix
Question

I work on hierarchical agglomerative clustering of large sets of multidimensional vectors, and I noticed that the biggest bottleneck is the construction of the distance matrix. A naive implementation of this task is the following (here in Python):

    import numpy as np
    from scipy.spatial.distance import cosine

    def create_dist_matrix(v):
        '''v = an array (N,d), where rows are the observations and columns the dimensions'''
        N = v.shape[0]
        D = np.zeros((N, N))
        # Fill only the lower triangle (including the diagonal); the matrix is symmetric.
        for i in range(N):
            for j in range(i + 1):
                D[i, j] = cosine(v[i, :], v[j, :])  # scipy.spatial.distance.cosine
        return D
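For comparison, here is a minimal sketch of one way the same matrix could be built in independent row blocks, which is the kind of decomposition a parallel version needs. It relies on the identity cosine distance = 1 - (u·v)/(|u||v|), so after normalising each row once, a whole block of distances is a single matrix product. The function name `cosine_dist_matrix_blocked` and the parameters `block_size` and `max_workers` are hypothetical, chosen only for illustration. The sketch assumes no row of `v` is all zeros, and it returns the full symmetric matrix rather than only the lower triangle.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def cosine_dist_matrix_blocked(v, block_size=256, max_workers=4):
    """Full (N, N) cosine distance matrix, computed in row blocks.

    Assumption: every row of v has a non-zero norm.
    """
    v = np.asarray(v, dtype=float)
    N = v.shape[0]
    # Normalise rows once, so cosine similarity becomes a plain dot product.
    vn = v / np.linalg.norm(v, axis=1, keepdims=True)
    D = np.empty((N, N))

    def fill_block(start):
        stop = min(start + block_size, N)
        # Each task writes a disjoint slice of rows, so no locking is needed.
        D[start:stop, :] = 1.0 - vn[start:stop] @ vn.T

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces evaluation and re-raises any exception from a worker.
        list(pool.map(fill_block, range(0, N, block_size)))

    return D
```

If only the lower triangle is wanted, as in the naive loop, `np.tril(D)` recovers it. Whether the threads give a real speed-up depends on the NumPy build: the underlying BLAS library may already be multithreaded, in which case the blocking mostly helps by bounding the size of each intermediate product.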