Most performant way to subtract one array from another

别跟我提以往 2021-02-08 23:32

I have the following code, which is the bottleneck in one part of my application. All I do is subtract one array from another. Both of these arrays have around 100000 elements.

5 answers
  •  我寻月下人不归
    2021-02-08 23:42

    Running the subtraction on several threads sounds good, but subtracting 100K integers doesn't take much CPU time, so maybe a thread pool... However, setting up threads also carries a lot of overhead, so for short arrays the parallel version will be slower than running everything on a single (main) thread!
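
    To make the trade-off above concrete, here is a minimal sketch that only goes parallel above a size threshold. It assumes Delphi XE7 or later (for TParallel in System.Threading); ParallelThreshold, ChunkCount, SubRange and SubArrays are hypothetical names and values chosen for illustration, not measured recommendations.

    ```pascal
    program SubtractParallel;
    {$APPTYPE CONSOLE}
    uses
      System.Threading;  // TParallel (assumes Delphi XE7 or later)

    const
      ParallelThreshold = 1000000;  // hypothetical cut-off; measure to find a real one
      ChunkCount = 4;               // hypothetical number of chunks

    var
      Array1, Array2: array of Integer;
      i: Integer;

    // Subtracts Array2 from Array1 for indexes First..Last.
    procedure SubRange(First, Last: Integer);
    var
      k: Integer;
    begin
      for k := First to Last do
        Array1[k] := Array1[k] - Array2[k];
    end;

    procedure SubArrays;
    var
      N, ChunkSize: Integer;
    begin
      N := Length(Array1);
      // Small arrays: stay on the calling thread and avoid the threading overhead.
      if N < ParallelThreshold then
      begin
        SubRange(0, N - 1);
        Exit;
      end;
      // Large arrays: one contiguous chunk per parallel iteration.
      ChunkSize := (N + ChunkCount - 1) div ChunkCount;
      TParallel.For(0, ChunkCount - 1,
        procedure(Chunk: Integer)
        var
          First, Last: Integer;
        begin
          First := Chunk * ChunkSize;
          Last := First + ChunkSize - 1;
          if Last > N - 1 then
            Last := N - 1;
          SubRange(First, Last);
        end);
    end;

    begin
      SetLength(Array1, 100000);
      SetLength(Array2, 100000);
      for i := 0 to High(Array1) do
      begin
        Array1[i] := i * 3;
        Array2[i] := i;
      end;
      SubArrays;  // 100000 elements is below the threshold, so this runs single-threaded
      WriteLn(Array1[10]);
    end.
    ```

    Handing each parallel iteration a whole chunk rather than a single element keeps the per-call overhead small relative to the work done.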

    Did you switch off overflow and range checking in the compiler settings?
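
    As a rough illustration of the question above, the two checks can also be turned off per unit with compiler directives instead of the project options. This is a minimal sketch assuming Delphi; the array contents are made up for the example.

    ```pascal
    program SubtractPlain;
    {$APPTYPE CONSOLE}
    {$Q-}  // overflow checking off (same as {$OVERFLOWCHECKS OFF})
    {$R-}  // range checking off (same as {$RANGECHECKS OFF})

    var
      // Array1 and Array2 stand in for the asker's arrays; the data is illustrative.
      Array1, Array2: array of Integer;
      i: Integer;
    begin
      SetLength(Array1, 100000);
      SetLength(Array2, 100000);
      for i := 0 to High(Array1) do
      begin
        Array1[i] := i * 3;
        Array2[i] := i;
      end;

      // Plain Pascal baseline: with both checks off, the compiler emits
      // no per-element range or overflow test inside this loop.
      for i := 0 to High(Array1) do
        Array1[i] := Array1[i] - Array2[i];

      WriteLn(Array1[10]);  // prints 20
    end.
    ```

    Timing this baseline with the checks on and off shows how much of the bottleneck comes from the checks alone.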

    You can also try an asm routine; it is very simple...

    Something like:

    procedure SubArray(var ar1, ar2; length: integer);
    asm
    // length must be greater than 0!
       push ebx
       lea  ar1, ar1 -4
       lea  ar2, ar2 -4
    @Loop:
       mov  ebx, [ar2 + length *4]
       sub  [ar1 + length *4], ebx
       dec  length
    // Here you can repeat the following part of the routine to unroll it further and speed it up.
       jz   @exit
       mov  ebx, [ar2 + length *4]
       sub  [ar1 + length *4], ebx
       dec  length
    //
       jnz  @Loop
    @exit:
       pop  ebx
    end;
    
    
    begin
       // SubArray requires a length greater than 0, so skip empty arrays.
       if length(Array1) > 0 then
          SubArray(Array1[0], Array2[0], length(Array1));
    

    It can be much faster...

    EDIT: Added a procedure that uses SIMD instructions. This procedure requires a CPU with SSE2 support. It loads 4 integers into an XMM register and subtracts them all at once. It is also possible to use movdqa instead of movdqu, which is faster, but you must first ensure 16-byte alignment. You can also unroll the XMM part as in my first asm example. (I'm interested in the speed measurements. :) )

    var
      array1, array2: array of integer;
    
    procedure SubQIntArray(var ar1, ar2; length: integer);
    asm
    // save the original length (passed in ecx), then count the full blocks of 4 integers
      push     ecx
      shr      length, 2
      jz       @LengthToSmall
    @Loop:
      movdqu   xmm1, [ar1]          // or movdqa, but ensure 16-byte alignment first
      movdqu   xmm2, [ar2]          // or movdqa, but ensure 16-byte alignment first
      psubd    xmm1, xmm2
      movdqu   [ar1], xmm1          // or movdqa, but ensure 16-byte alignment first
      add      ar1, 16
      add      ar2, 16
      dec      length
      jnz      @Loop
    @LengthToSmall:
      pop      ecx
      push     ebx
      and      ecx, 3
      jz       @Exit
      mov      ebx, [ar2]
      sub      [ar1], ebx
      dec      ecx
      jz       @Exit
      mov      ebx, [ar2 + 4]
      sub      [ar1 + 4], ebx
      dec      ecx
      jz       @Exit
      mov      ebx, [ar2 + 8]
      sub      [ar1 + 8], ebx
    @Exit:
      pop      ebx
    end;
    
    begin
    //Fill arrays first!
      SubQIntArray(Array1[0], Array2[0], length(Array1));
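
    For the speed measurement mentioned in the EDIT, a small harness along these lines might help. It is only a sketch, assuming Delphi XE2 or later (TStopwatch from System.Diagnostics); TSubProc, SubPascal and TimeMs are hypothetical names introduced here. SubPascal is a plain Pascal reference with the same signature as the asm routines, so SubArray or SubQIntArray can be passed to TimeMs in the same way.

    ```pascal
    program SubBench;
    {$APPTYPE CONSOLE}
    {$Q-}{$R-}  // overflow and range checking off
    uses
      System.SysUtils, System.Diagnostics;

    type
      // Same signature as the asm routines, so they can be timed the same way.
      TSubProc = procedure(var ar1, ar2; length: Integer);

    // Plain Pascal reference implementation.
    procedure SubPascal(var ar1, ar2; length: Integer);
    var
      p1, p2: PInteger;
      i: Integer;
    begin
      p1 := @ar1;
      p2 := @ar2;
      for i := 1 to length do
      begin
        p1^ := p1^ - p2^;
        Inc(p1);
        Inc(p2);
      end;
    end;

    // Runs Proc "Runs" times over 100000 elements, checks the result
    // and returns the elapsed time in milliseconds.
    function TimeMs(Proc: TSubProc; Runs: Integer): Double;
    var
      A, B: array of Integer;
      i, r: Integer;
      SW: TStopwatch;
    begin
      SetLength(A, 100000);
      SetLength(B, 100000);
      for i := 0 to High(A) do
      begin
        A[i] := i;
        B[i] := 1;
      end;

      SW := TStopwatch.StartNew;
      for r := 1 to Runs do
        Proc(A[0], B[0], Length(A));
      SW.Stop;

      // Every element was reduced by 1 per run.
      for i := 0 to High(A) do
        if A[i] <> i - Runs then
          raise Exception.Create('Wrong result at index ' + IntToStr(i));

      Result := SW.Elapsed.TotalMilliseconds;
    end;

    begin
      WriteLn('Pascal loop: ', TimeMs(SubPascal, 1000):0:2, ' ms');
    end.
    ```

    Repeating the call many times keeps the timing well above the stopwatch resolution, and the result check catches a routine that is fast but wrong.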
    
