Array bounds check efficiency in .NET 4 and above

离开以前 2020-12-05 06:19

I'm interested in how efficient low-level algorithms can be in .NET. I would like to enable us to choose to write more of our code in C# rather than C++ in the future, but …

  •  一个人的身影
    2020-12-05 06:54

    64-bit

    The 64-bit jitter does a good job of eliminating bounds checks (at least in straightforward scenarios). I added a "return sum;" statement at the end of your method and then compiled the program in Release mode with Visual Studio 2010. In the disassembly below (which I annotated with a C# translation), notice that:

    • There are no bounds checks for X, even though your code compares i against length instead of X.Length. This is an improvement over the behavior described in the article.
    • Before the main loop, there is a single check to make sure that Y.Length >= X.Length.
    • The main loop (offsets 00000032 through 00000052) does not contain any bounds checks.
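    The question's code is cut off above, so here is a hypothetical C# reconstruction of the kind of method these listings correspond to, inferred only from the C# annotations in the disassembly. The class and method names are invented for illustration. DotCachedLength compares i against a cached length, as your code does. DotCanonical compares i directly against X.Length, which, going by the article's description, is the loop shape older jitters needed before they would drop the check. The do/while shape in the annotated listings is presumably the jitter's rewrite of an ordinary for loop.

```csharp
using System;

// Hypothetical reconstruction, inferred from the disassembly annotations;
// not the question author's actual code.
static class DotProductSketch
{
    // Caches X.Length in a local and compares i against that local,
    // as the answer describes the question's code doing.
    public static double DotCachedLength(double[] X, double[] Y)
    {
        int length = X.Length;
        if (length != Y.Length)
            throw new ArgumentException("X and Y must be same size");

        double sum = 0;
        for (int i = 0; i < length; i++)
            sum += X[i] * Y[i];
        return sum;
    }

    // Same computation, but the loop condition compares i against X.Length.
    public static double DotCanonical(double[] X, double[] Y)
    {
        if (X.Length != Y.Length)
            throw new ArgumentException("X and Y must be same size");

        double sum = 0;
        for (int i = 0; i < X.Length; i++)
            sum += X[i] * Y[i];
        return sum;
    }
}
```

    Both methods return the same value. For example, calling either one with { 1, 2, 3 } and { 4, 5, 6 } returns 32. The only difference is the loop condition the jitter has to reason about.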

    Disassembly

    ; Register assignments:
    ;    rcx  := i
    ;    rax  := i * 8 (byte offset of element i)
    ;    rdx  := X
    ;    r8   := Y
    ;    r9   := X.Length ("length" in your code, "XLength" below)
    ;    r10  := Y.Length ("YLength" below)
    ;    r11  := X.Length - 1 ("XLengthMinus1" below)
    ;    xmm1 := sum
    
    ; (Prologue)
    00000000  push        rbx
    00000001  push        rdi
    00000002  sub         rsp,28h
    
    ; (Store arguments X and Y in rdx and r8)
    00000006  mov         r8,rdx   ; Y
    00000009  mov         rdx,rcx  ; X
    
    ; int XLength = X.Length;
    0000000c  mov         r9,qword ptr [rdx+8]
    
    ; int XLengthMinus1 = XLength - 1;
    00000010  movsxd      rax,r9d
    00000013  lea         r11,[rax-1]
    
    ; int YLength = Y.Length;
    00000017  mov         r10,qword ptr [r8+8]
    
    ; if (XLength != YLength)
    ;     throw new ArgumentException("X and Y must be same size");
    0000001b  cmp         r9d,r10d
    0000001e  jne         0000000000000060
    
    ; double sum = 0;
    00000020  xorpd       xmm1,xmm1
    
    ; if (XLength > 0)
    ; {
    00000024  test        r9d,r9d
    00000027  jle         0000000000000054
    
    ;     int i = 0;
    00000029  xor         ecx,ecx
    0000002b  xor         eax,eax
    
    ;     if (XLengthMinus1 >= YLength)
    ;         throw new IndexOutOfRangeException();
    0000002d  cmp         r11,r10
    00000030  jae         0000000000000096
    
    ;     do
    ;     {
    ;         sum += X[i] * Y[i];
    00000032  movsd       xmm0,mmword ptr [rdx+rax+10h]
    00000038  mulsd       xmm0,mmword ptr [r8+rax+10h]
    0000003f  addsd       xmm0,xmm1
    00000043  movapd      xmm1,xmm0
    
    ;         i++;
    00000047  inc         ecx
    00000049  add         rax,8
    
    ;     }
    ;     while (i < XLength);
    0000004f  cmp         ecx,r9d
    00000052  jl          0000000000000032
    ; }
    
    ; return sum;
    00000054  movapd      xmm0,xmm1
    
    ; (Epilogue)
    00000058  add         rsp,28h
    0000005c  pop         rdi
    0000005d  pop         rbx
    0000005e  ret
    
    00000060  ...
    
    00000096  ...
    

    32-bit

    The 32-bit jitter, unfortunately, is not quite as smart. In the disassembly below, notice that:

    • There are no bounds checks for X, even though your code compares i against length instead of X.Length. Again, this is an improvement over the behavior described in the article.
    • The main loop (offsets 00000018 through 0000002a) contains a bounds check for Y.

    Disassembly

    ; Register assignments:
    ;    eax  := i
    ;    ecx  := X
    ;    edx  := Y
    ;    esi  := X.Length ("length" in your code, "XLength" below)
    
    ; (Prologue)
    00000000  push        ebp
    00000001  mov         ebp,esp
    00000003  push        esi
    
    ; double sum = 0;
    00000004  fldz
    
    ; int XLength = X.Length;
    00000006  mov         esi,dword ptr [ecx+4]
    
    ; if (XLength != Y.Length)
    ;     throw new ArgumentException("X and Y must be same size");
    00000009  cmp         dword ptr [edx+4],esi
    0000000c  je          00000012
    0000000e  fstp        st(0)
    00000010  jmp         0000002F
    
    ; int i = 0;
    00000012  xor         eax,eax
    
    ; if (XLength > 0)
    ; {
    00000014  test        esi,esi
    00000016  jle         0000002C
    
    ;     do
    ;     {
    ;         double temp = X[i];
    00000018  fld         qword ptr [ecx+eax*8+8]
    
    ;         if (i >= Y.Length)
    ;             throw new IndexOutOfRangeException();
    0000001c  cmp         eax,dword ptr [edx+4]
    0000001f  jae         0000005A
    
    ;         sum += temp * Y[i];
    00000021  fmul        qword ptr [edx+eax*8+8]
    00000025  faddp       st(1),st
    
    ;         i++;
    00000027  inc         eax
    
    ;     }
    ;     while (i < XLength);
    00000028  cmp         eax,esi
    0000002a  jl          00000018
    ; }
    
    ; return sum;
    0000002c  pop         esi
    0000002d  pop         ebp
    0000002e  ret
    
    0000002f  ...
    
    0000005a  ...
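    Both listings perform their bounds checks with jae ("jump if above or equal"), which is an unsigned comparison. Annotations such as "if (i >= Y.Length)" are therefore a slight simplification. Comparing as unsigned lets a single instruction reject both a negative index and an index past the end, because a negative int reinterpreted as uint is at least 2^31, which exceeds any array length. Here is a small C# sketch of that equivalence (the names are hypothetical). It holds for the non-negative lengths that arrays always have:

```csharp
// Illustrates why one unsigned compare ("jae") is a complete bounds check.
// Hypothetical helper names; valid for length >= 0, which array lengths always are.
static class BoundsCheckSketch
{
    // Signed form: two tests are needed to reject negative and too-large indices.
    public static bool OutOfRangeSigned(int i, int length)
    {
        return i < 0 || i >= length;
    }

    // Unsigned form: a negative i becomes a uint >= 2^31, larger than any
    // int length, so a single comparison covers both cases.
    public static bool OutOfRangeUnsigned(int i, int length)
    {
        return unchecked((uint)i >= (uint)length);
    }
}
```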
    

    Summing Up

    The jitter has improved since 2009, and the 64-bit jitter can generate more efficient code than the 32-bit jitter.
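    If you want to repeat the 32-bit versus 64-bit comparison on your own machine, a rough timing harness along these lines may help. It is only a sketch, and JitterComparison and its members are hypothetical names. Build it in Release mode, once with the platform target set to x86 and once with x64. Run it outside the debugger, because a debugger can suppress JIT optimizations. Environment.Is64BitProcess (available since .NET 4) reports which kind of process, and therefore which jitter, is running the code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing harness; absolute numbers will vary by machine and run.
static class JitterComparison
{
    // Same loop shape as the method discussed in the answer.
    static double Dot(double[] X, double[] Y)
    {
        int length = X.Length;
        if (length != Y.Length)
            throw new ArgumentException("X and Y must be same size");
        double sum = 0;
        for (int i = 0; i < length; i++)
            sum += X[i] * Y[i];
        return sum;
    }

    // Returns the average time per call in milliseconds.
    public static double TimeDot(double[] X, double[] Y, int iterations, out double result)
    {
        result = Dot(X, Y); // warm-up call, so JIT compilation is not timed
        Stopwatch sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
            result = Dot(X, Y);
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    public static void Report(int size, int iterations)
    {
        double[] x = new double[size];
        double[] y = new double[size];
        for (int i = 0; i < size; i++) { x[i] = 1.0; y[i] = 2.0; }

        double result;
        double ms = TimeDot(x, y, iterations, out result);
        Console.WriteLine("64-bit process: {0}", Environment.Is64BitProcess);
        Console.WriteLine("result = {0}, {1:F4} ms per call", result, ms);
    }
}
```

    For example, JitterComparison.Report(1000000, 100) prints the process bitness, the dot product, and the average time per call.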

    If necessary, though, you can always bypass array bounds checks completely by using unsafe code and pointers (as svick points out). This technique is used by some performance-critical code in the Base Class Library.
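    Here is a minimal sketch of the unsafe approach, applied to the dot-product method discussed here. UnsafeDotSketch is a hypothetical name. The fixed statement pins both arrays so the garbage collector cannot move them. Indexing through the resulting pointers is plain pointer arithmetic with no bounds check, and the remaining validation happens once, up front. The code must be compiled with unsafe code enabled: use the /unsafe compiler switch or "Allow unsafe code" in the project properties.

```csharp
using System;

// Hypothetical pointer-based dot product that bypasses array bounds checks.
// Requires compiling with unsafe code enabled (/unsafe or AllowUnsafeBlocks).
static class UnsafeDotSketch
{
    public static unsafe double Dot(double[] X, double[] Y)
    {
        if (X == null) throw new ArgumentNullException("X");
        if (Y == null) throw new ArgumentNullException("Y");
        int length = X.Length;
        if (length != Y.Length)
            throw new ArgumentException("X and Y must be same size");

        double sum = 0;
        // Pin both arrays; for empty arrays the pointers are null,
        // but the loop body never executes in that case.
        fixed (double* px = X, py = Y)
        {
            for (int i = 0; i < length; i++)
                sum += px[i] * py[i]; // pointer indexing: no bounds check
        }
        return sum;
    }
}
```

    The trade-off is that an indexing mistake inside the fixed block can silently read or corrupt memory instead of throwing IndexOutOfRangeException. The up-front length check therefore carries all the safety.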
