I have the following code which is the bottleneck in one part of my application. All I do is subtract on Array from another. Both of these arrays have more around 100000 element
Running subtraction on more threads sounds good, but 100K integer sunstraction don't take a lot of CPU time, so maybe threadpool... However settings threads have also a lot of overhead, so short arrays will have slower productivity in parallel threads than in only one (main) thread!
Did you switch off in compiler settings, overflow and range checking?
You can try to use asm rutine, it is very simple...
Something like:
procedure SubArray(var ar1, ar2; length: integer);
asm
//length must be > than 0!
push ebx
lea ar1, ar1 -4
lea ar2, ar2 -4
@Loop:
mov ebx, [ar2 + length *4]
sub [ar1 + length *4], ebx
dec length
//Here you can put more folloving parts of rutine to more unrole it to speed up.
jz @exit
mov ebx, [ar2 + length *4]
sub [ar1 + length *4], ebx
dec length
//
jnz @Loop
@exit:
pop ebx
end;
begin
SubArray(Array1[0], Array2[0], length(Array1));
It can be much faster...
EDIT: Added procedure with SIMD instructions.
This procedure request SSE CPU support. It can take 4 integers in XMM register and subtract at once. There is also possibility to use movdqa instead movdqu it is faster, but you must first to ensure 16 byte aligment. You can also unrole the XMM par like in my first asm case. (I'm interesting about speed measurment. :) )
var
array1, array2: array of integer;
procedure SubQIntArray(var ar1, ar2; length: integer);
asm
//prepare length if not rounded to 4
push ecx
shr length, 2
jz @LengthToSmall
@Loop:
movdqu xmm1, [ar1] //or movdqa but ensure 16b aligment first
movdqu xmm2, [ar2] //or movdqa but ensure 16b aligment first
psubd xmm1, xmm2
movdqu [ar1], xmm1 //or movdqa but ensure 16b aligment first
add ar1, 16
add ar2, 16
dec length
jnz @Loop
@LengthToSmall:
pop ecx
push ebx
and ecx, 3
jz @Exit
mov ebx, [ar2]
sub [ar1], ebx
dec ecx
jz @Exit
mov ebx, [ar2 + 4]
sub [ar1 + 4], ebx
dec ecx
jz @Exit
mov ebx, [ar2 + 8]
sub [ar1 + 8], ebx
@Exit:
pop ebx
end;
begin
//Fill arrays first!
SubQIntArray(Array1[0], Array2[0], length(Array1));