I have a set of integers, say S = {1,...,10}, and two matrices N and M, whose rows are some (but not necessarily all possible) permutations of elements from S of orders, say
I benchmarked all the approaches against different pairs of matrices N, M, and wherever possible, I also compared parfor vs. for and picked the faster one. Here are my results:
%//Test 1: size(N) = 2263x3, size(M) = 5000x6
%//My approach (parfor): 0.650626 sec
%//Divakar's Approach 1: 1.870144 sec
%//Divakar's Approach 2: 1.164088 sec
%//Divakar's Approach 3: 0.380915 sec (with parfor)
%//Divakar's Approach 4: 2.643659 sec (gpu)
%//Luis Mendo's Approach: 1.681007 sec
%//Test 2: size(N) = 2263x3, size(M) = 25000x6
%//My approach (parfor): 6.137823 sec
%//Divakar's Approach 1: 8.342699 sec
%//Divakar's Approach 2: 5.784426 sec
%//Divakar's Approach 3: 2.251888 sec (with parfor)
%//Divakar's Approach 4: 6.272578 sec (gpu)
%//Luis Mendo's Approach: 11.514548 sec
%//Test 3: size(N) = 2100x3, size(M) = 20000x5
%//My approach (parfor): 3.417432 sec
%//Divakar's Approach 1: 5.732680 sec
%//Divakar's Approach 2: 4.107909 sec
%//Divakar's Approach 3: 1.393052 sec (with parfor)
%//Divakar's Approach 4: 3.145183 sec (gpu)
%//Luis Mendo's Approach: 5.668326 sec
%//Test 4: size(N) = 2100x3, size(M) = 324632x5
%//Divakar's Approach 3: 54.231878 sec (with parfor)
%//Divakar's Approach 4: 15.111936 sec (gpu)
%//Test 5: size(N) = 2263x3, size(M) = 1000000x6
%//Divakar's Approach 3: 210.853515 sec (with parfor)
%//Divakar's Approach 4: 49.529794 sec (gpu)
%//Divakar's Approach 5: 49.874444 sec (gpu)
%//Test 6: size(N) = 2263x3, size(M) = 5000000x6
%//Divakar's Approach 3: 1137.606244 sec (with parfor)
%//Divakar's Approach 4: stopped it after 15 min and heavy interrupts/DPCs activity
%//Divakar's Approach 5: 267.169307 sec
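Among the non-GPU approaches, Divakar's Approach 3 was the fastest in every test, beating the next-best non-GPU approach by a factor of roughly 1.7 to 2.6 in Tests 1-3. Its GPU counterpart only started showing an advantage at large row counts: it was slower than Approach 3 in Tests 1-3 and faster from Test 4 (324632 rows in M) onward.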
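
For anyone who wants to run a similar comparison, here is a minimal sketch of a timing loop. It is not the harness used for the numbers above. The two candidate functions are hypothetical placeholders (they are not any of the benchmarked approaches), and the matrix sizes and the repetition count nReps are assumptions chosen to keep the run short.

```matlab
% Sketch only: the two candidates are hypothetical placeholders,
% not any of the approaches timed above.
S = 1:10;
nRowsN = 200;  nRowsM = 500;      % small assumed sizes; scale up as needed

% Each row is an ordered selection of distinct elements of S
N = zeros(nRowsN, 3);
for i = 1:nRowsN
    N(i, :) = S(randperm(numel(S), 3));
end
M = zeros(nRowsM, 6);
for i = 1:nRowsM
    M(i, :) = S(randperm(numel(S), 6));
end

% Two placeholder functions that compute the same result in different ways
names = {'Placeholder A', 'Placeholder B'};
funcs = {@(N, M) ismember(N(:, 1), M(:, 1)), ...
         @(N, M) any(bsxfun(@eq, N(:, 1), M(:, 1).'), 2)};

nReps = 5;                         % repeat to smooth out timing noise
medTime = zeros(1, numel(funcs));
for k = 1:numel(funcs)
    f = funcs{k};
    f(N, M);                       % warm-up call, not timed
    t = zeros(1, nReps);
    for r = 1:nReps
        tStart = tic;
        f(N, M);
        t(r) = toc(tStart);
    end
    medTime(k) = median(t);        % median of the timed runs
    fprintf('%%//%s: %f sec\n', names{k}, medTime(k));
end
```

Every candidate is timed on the same N and M, with one untimed warm-up call before the measured runs. For single-call measurements MATLAB also provides timeit, and gputimeit for GPU code, which handle warm-up and repetition internally.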