Question
I am looking at the joblib examples, but I can't figure out how to do a parallel for loop over a matrix. I am computing a pairwise distance metric between the rows of a matrix, so I was doing:
N, _ = data.shape
upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]
dist_mat = np.zeros((N, N))
for (i, j) in upper_triangle:
    dist_mat[i, j] = dist_fun(data[i], data[j])
    dist_mat[j, i] = dist_mat[i, j]
where dist_fun takes two vectors and computes a distance. How can I make this loop parallel, given that the calls to dist_fun are independent of each other?
EDIT: The distance function I am using is fastdtw, which is not so fast, so I think I really do want to parallelize this. Using:
dist_mat = pdist(data, lambda x, y: fastdtw(x, y, dist=euclidean)[0])
I get an execution time of 58.1084 secs, and using:
dist_mat = np.zeros((N, N))
for (i, j), _ in np.ndenumerate(dist_mat):
    dist_mat[i, j], _ = fastdtw(data[i, :], data[j, :], dist=euclidean)
I get 116.36 seconds and using:
upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]
dist_mat = np.zeros((N, N))
for (i, j) in upper_triangle:
    dist_mat[i, j], _ = fastdtw(data[i, :], data[j, :], dist=euclidean)
    dist_mat[j, i] = dist_mat[i, j]
I get 55.62 secs. Here N=33. Does scipy automatically make use of all available cores?
EDIT: I think I have found a workaround using the multiprocessing package, but I will leave the question unanswered for the joblib folks to respond before I post what I think works.
Answer 1:
This can be done as follows using the multiprocessing module:
import numpy as np
from fastdtw import fastdtw
import multiprocessing as mp
from scipy.spatial.distance import squareform, euclidean
from functools import partial

# Create a simulated data matrix
data = np.random.random((33, 300))
N, _ = data.shape
upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]

with mp.Pool(processes=4) as pool:
    # fastdtw returns a (distance, path) tuple for each pair
    result = pool.starmap(partial(fastdtw, dist=euclidean),
                          [(data[i], data[j]) for (i, j) in upper_triangle])

dist_mat = squareform([item[0] for item in result])
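Here squareform converts the condensed list of upper-triangle distances back into the full symmetric N×N matrix, and the [0] indexing drops the warping path that fastdtw returns alongside each distance.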
Timing result using timeit on an Ivy Bridge Core i5: 24.052 secs, which is roughly half the time of the version without explicit parallelization.
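Since the question originally asked about joblib: the same upper-triangle pattern can also be expressed with joblib.Parallel. A minimal sketch, assuming joblib's default backend and the same data and upper_triangle as above:

from joblib import Parallel, delayed

# Each delayed call computes one pairwise distance independently
result = Parallel(n_jobs=4)(
    delayed(fastdtw)(data[i], data[j], dist=euclidean)
    for (i, j) in upper_triangle
)
dist_mat = squareform([item[0] for item in result])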
ALSO: As a future reference for anyone using the fastdtw package: importing the distance functions from scipy.spatial.distance and calling fastdtw as shown in the example on the linked page is much slower than just using fastdtw(x, y, dist=2). The results seem similar to me, and the execution time using pdist (without resorting to parallelization) is under a second.
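To illustrate that tip, a minimal sketch of the pdist one-liner with the integer argument (the fastdtw package interprets dist=2 as the p-norm with p=2, i.e. Euclidean, without calling back into a Python distance function for every point comparison):

from scipy.spatial.distance import pdist, squareform

# dist=2 requests the 2-norm inside fastdtw directly, instead of
# invoking scipy's euclidean() once per pair of points
condensed = pdist(data, lambda x, y: fastdtw(x, y, dist=2)[0])
dist_mat = squareform(condensed)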
Source: https://stackoverflow.com/questions/51350640/parallel-for-loop-over-numpy-matrix