Question
I have code that generates all of the possible combinations of 4 integers between 0 and 36.
That is 37^4 = 1874161 combinations.
My code is written in MATLAB:
i = 0;
for a = 0:36
    for b = 0:36
        for c = 0:36
            for d = 0:36
                i = i + 1;
                combination(i,:) = [a,b,c,d];
            end
        end
    end
end
I've tested this using the number 3 instead of the number 36, and it worked fine.
If there are 1874161 combinations, and I take an overly cautious guess of 100 clock cycles to do the additions and write the values, then on a 2.3GHz PC this is:
1874161 * (1/2300000000) * 100 = 0.08148526086
A fraction of a second. But it has been running for about half an hour so far.
I did receive a warning that combination changes size every loop iteration and that I should consider predefining its size for speed, but this can't affect it that much, can it?
Answer 1:
As @horchler suggested, you need to preallocate the target array.
This is because, without preallocation, your program is not O(N^4). Each time you add a new row, the array has to be resized: a new, bigger array is created (MATLAB does not know how big the array will eventually be, so it probably grows it by only one row), the old array is copied into it, and then the old array is deleted. So when the array has 10 rows and you add an 11th, a copy of 10 rows is added to that iteration. If I am not mistaken, summed over all N^4 iterations the copying alone costs about
1+2+3+...+N^4 = (N^4)*(N^4+1)/2 ≈ (N^8)/2
row copies, so the whole program is something like O(N^8) rather than O(N^4). For N = 37 that is roughly 1.76e12 row copies, which is massively more work than the 1874161 assignments the loops themselves perform.
The reallocated block also keeps growing, so it crosses each CACHE size barrier in turn, and the copying slows down even further as i increases.
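A minimal sketch of the preallocated version of the loop from the question, assuming the only change needed is a single call to zeros before the loops start. The names n and rows are introduced here for illustration and are not in the original code.

```matlab
n = 36;                          % largest value each of the 4 integers can take
rows = (n + 1)^4;                % 37^4 = 1874161 combinations
combination = zeros(rows, 4);    % allocate the full array once, up front

i = 0;
for a = 0:n
    for b = 0:n
        for c = 0:n
            for d = 0:n
                i = i + 1;
                % writes into existing memory; no resize, no copy
                combination(i, :) = [a, b, c, d];
            end
        end
    end
end
```

Because the array already has its final size, each iteration only writes one row, so the total work stays proportional to the number of combinations.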
The only solution to this without preallocation is to store the result in a linked list. I am not sure MATLAB has this option, and it needs one or two pointers per item (a 32/64-bit value each), which makes your array 2+ times bigger.
If you need even more speed, then there are other ways (probably not for MATLAB):
- use multi-threading, since filling the array is fully parallelisable
- use a memory block copy (rep movsd) or DMA, since the data is periodically repeating
- compute the value from i on the fly instead of storing the whole array; depending on the usage, this can be faster in some cases
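As a rough illustration of computing a combination from i instead of storing the whole array: the sketch below treats i-1 as a 4-digit number in base 37, with d as the fastest-varying digit, matching the loop order in the question. The names base and combo_at are hypothetical and introduced only for this example.

```matlab
base = 37;                       % each of the 4 integers ranges over 0..36

% Hypothetical helper: returns row i (1-based) of the combination table
% by extracting the base-37 digits of i-1, most significant digit first.
combo_at = @(i) mod(floor((i - 1) ./ base.^(3:-1:0)), base);

first  = combo_at(1);            % first row of the table
rolled = combo_at(38);           % first row after d wraps around
last   = combo_at(base^4);       % final row of the table
```

Whether this beats a stored table depends on how the rows are consumed: it trades memory for a few divisions per lookup.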
Source: https://stackoverflow.com/questions/30388545/why-are-my-nested-for-loops-taking-so-long-to-compute