Parallelize or vectorize all-against-all operation on a large number of matrices?

问题

I have approximately 5,000 matrices with the same number of rows and varying numbers of columns (20 x ~200). Each of these matrices must be compared against every other in a dynamic programming algorithm.

In this question, I asked how to perform the comparison quickly and was given an excellent answer involving a 2D convolution. Serially, iteratively applying that method, like so

list = who('data_matrix_prefix*')
H = cell(numel(list),numel(list));  
for i=1:numel(list)
    for j=1:numel(list)
        if i ~= j
            eval([ 'H{i,j} = compare(' char(list(i)) ',' char(list(j)) ');']);
        end
    end
end

is fast for small subsets of the data (e.g. for 9 matrices, 9*9 - 9 = 72 calls are made in ~1 s, 870 calls in ~2.5 s).
However, operating on all the data requires almost 25 million calls.
I have also tried using deal() to make a cell array composed entirely of the next element in data, so I could use cellfun() in a single loop:

# who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
nextData = cell(k,1);
for i=1:k
    [nextData{:}] = deal(data{i});
    H{:,i} = cellfun(@compare,data,nextData,'UniformOutput',false);
end

Unfortunately, this is not really any faster, because all the time is in compare(). Both of these code examples seem ill-suited for parallelization. I'm having trouble figuring out how to make my variables sliced.
compare() is totally vectorized; it uses matrix multiplication and conv2() exclusively (I am under the impression that all of these operations, including the cellfun(), should be multithreaded in MATLAB?).

Does anyone see a (explicitly) parallelized solution or better vectorization of the problem?

Note
I realize both my examples are inefficient - the first would be twice as fast if it calculated a triangular cell array, and the second is still calculating the self comparisons, as well. But the time savings for a good parallelization are more like a factor of 16 (or 72 if I install MATLAB on everyone's machines).

Aside
There is also a memory issue. I used a couple of evals to append each column of H into a file, with names like H1, H2, etc. and then clear H_i. Unfortunately, the saves are very slow...

回答1:

The second example can be easily sliced for use with the Parallel Processing Toolbox. This toolbox distributes iterations of your code among up to 8 different local processors. If you want to run the code on a cluster, you also need the Distributed Computing Toolbox.

%# who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.

parfor i=1:k-1 %# this will run the loop in parallel with the parallel processing toolbox
    %# only make the necessary comparisons
    H{i+1:k,i} = cellfun(@compare,data(i+1:k),repmat(data(i),k-i,1),'UniformOutput',false);

    %# if the above doesn't work, try this
    hSlice = cell(k,1);
    hSlice{i+1:k} = cellfun(@compare,data(i+1:k),repmat(data(i),k-i,1),'UniformOutput',false);
    H{:,i} = hSlice;
end

回答2:

Does

compare(a,b) == compare(b,a)

and

compare(a,a) == 1

If so, change your loop

for i=1:numel(list)
    for j=1:numel(list)
    ...
    end
end

for i=1:numel(list)
    for j= i+1 : numel(list)
    ...
    end
end

and deal with the symmetry and identity case. This will cut your calculation time by half.

回答3:

If I understand correctly you have to perform 5000^2 matrix comparisons ? Rather than try to parallelise the compare function, perhaps you should think of your problem being composed of 5000^2 tasks ? The Matlab Parallel Compute Toolbox supports task-based parallelism. Unfortunately my experience with PCT is with parallelisation of large linear algebra type problems so I can't really tell you much more than that. The documentation will undoubtedly help you more.

来源：https://stackoverflow.com/questions/2872249/parallelize-or-vectorize-all-against-all-operation-on-a-large-number-of-matrices

标签

matlab

dynamic-programming

vectorization

parallel-processing