Question
I am looking at the joblib examples, but I can't figure out how to do a parallel for loop over a matrix. I am computing a pairwise distance metric between the rows of a matrix, so I was doing:
N, _ = data.shape
upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]
dist_mat = np.zeros((N, N))
for (i, j) in upper_triangle:
    dist_mat[i, j] = dist_fun(data[i], data[j])
    dist_mat[j, i] = dist_mat[i, j]
where dist_fun takes two vectors and computes a distance. How can I make this loop parallel, given that the calls to dist_fun are independent of each other?
EDIT: The distance function I am using is fastdtw, which is not so fast, so I think I really do want to parallelize this. Using:
dist_mat = pdist(data, lambda x, y: fastdtw(x, y, dist=euclidean)[0])
I get an execution time of 58.1084 secs, and using:
dist_mat = np.zeros((N, N))
for (i, j), _ in np.ndenumerate(dist_mat):
    dist_mat[i, j], _ = fastdtw(data[i, :], data[j, :], dist=euclidean)
I get 116.36 seconds and using:
upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]
dist_mat = np.zeros((N, N))
for (i, j) in upper_triangle:
    dist_mat[i, j], _ = fastdtw(data[i, :], data[j, :], dist=euclidean)
    dist_mat[j, i] = dist_mat[i, j]
I get 55.62 secs. Here N=33. Does scipy automatically make use of all available cores?
EDIT: I think I have found a workaround using the multiprocessing package, but I will leave the question unanswered for the joblib folks to respond before I post what I think works.
Answer 1:
This can be done as follows using the multiprocessing module:
import numpy as np
from fastdtw import fastdtw
import multiprocessing as mp
from scipy.spatial.distance import squareform, euclidean
from functools import partial

# Create a simulated data matrix
data = np.random.random((33, 300))
N, _ = data.shape
upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]

with mp.Pool(processes=4) as pool:
    # fastdtw returns a (distance, path) tuple for each pair
    result = pool.starmap(partial(fastdtw, dist=euclidean),
                          [(data[i], data[j]) for (i, j) in upper_triangle])

dist_mat = squareform([item[0] for item in result])
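Here squareform converts the condensed list of upper-triangle distances back into the full symmetric N×N matrix, and the [0] indexing drops the warping path that fastdtw returns alongside each distance.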
Timing result using timeit on an Ivy Bridge Core i5: 24.052 secs, which is roughly half the time of the version without explicit parallelization.
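Since the question originally asked about joblib: the same upper-triangle pattern can also be expressed with joblib.Parallel. A minimal sketch, assuming joblib's default backend and the same data and upper_triangle as above:

from joblib import Parallel, delayed

# Each delayed call computes one pairwise distance independently
result = Parallel(n_jobs=4)(
    delayed(fastdtw)(data[i], data[j], dist=euclidean)
    for (i, j) in upper_triangle
)
dist_mat = squareform([item[0] for item in result])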
ALSO: As a future reference for anyone using the fastdtw package: importing the distance functions from scipy.spatial.distance and calling fastdtw as shown in the example on the linked page is much slower than just using fastdtw(x, y, dist=2). The results seem similar to me, and the execution time using pdist (without resorting to parallelization) is under a second.
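To illustrate that tip, a minimal sketch of the pdist one-liner with the integer argument (the fastdtw package interprets dist=2 as the p-norm with p=2, i.e. Euclidean, without calling back into a Python distance function for every point comparison):

from scipy.spatial.distance import pdist, squareform

# dist=2 requests the 2-norm inside fastdtw directly, instead of
# invoking scipy's euclidean() once per pair of points
condensed = pdist(data, lambda x, y: fastdtw(x, y, dist=2)[0])
dist_mat = squareform(condensed)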
Source: https://stackoverflow.com/questions/51350640/parallel-for-loop-over-numpy-matrix