I am finding that a lot of time spent in my matlab function is in this code:
intersect(freq_bins, our_bins);
Both can be rather large vectors, and are comprised of only integers. I just need to know which integers are in both. This is truly the primitive purpose of intersect(), so I suspect that the answer is: it doesn't get any better. But maybe someone has some suggestions.
intersect
calls ismember
. In your case, you don't need all the complicated checks that intersect
does, so you can save some overhead and call ismember
directly (note: I made sure to call both functions before timing them):
a = randi(1000,100,1);
b = randi(1000,100,1);
>> tic,intersect(a,b),toc
ans =
76
338
490
548
550
801
914
930
Elapsed time is 0.027104 seconds.
>> tic,a(ismember(a,b)),toc
ans =
914
801
490
548
930
550
76
338
Elapsed time is 0.000613 seconds.
You can make this even faster by calling ismembc
, the function that does the actual testing, directly. Note that ismembc
requires sorted arrays (so you can drop the sort if your input is sorted already!)
tic,a=sort(a);b=sort(b);a(ismembc(a,b)),toc
ans =
76
338
490
548
550
801
914
930
Elapsed time is 0.000473 seconds.
If you can assume that your inputs contain sorted lists of unique integers, then you can do this in linear time with a very simple algorithm:
function c = intersect_sorted(a,b)
ia = 1;
na = length(a);
ib = 1;
nb = length(b);
ic = 0;
cn = min(na,nb);
c = zeros(1,cn);
while (ia <= na && ib <= nb)
if (a(ia) > b(ib))
ib = ib + 1;
elseif a(ia) < b(ib)
ia = ia + 1;
else % a(ia) == b(ib)
ic = ic + 1;
c(ic) = a(ia);
ib = ib + 1;
ia = ia + 1;
end
end
c = c(1:ic);
end
The max runtime for lists of length n and m will be O(n+m).
>>a = unique(randi(1000,100,1));
>>b = unique(randi(1000,100,1));
>>tic;for i = 1:10000, intersect(a,b); end,toc
Elapsed time is 1.224514 seconds.
>> tic;for i = 1:10000, intersect_sorted(a,b); end,toc
Elapsed time is 0.289075 seconds.
来源:https://stackoverflow.com/questions/8159449/a-faster-way-to-achieve-what-intersect-is-giving-me