Array bounds check efficiency in .net 4 and above

后端未结

关注

 4  989

离开以前 2020-12-05 06:19

I\'m interested in how efficient low-level algorithms can be in .net. I would like to enable us to choose to write more of our code in C# rather than C++ in the future, but

4条回答

一个人的身影 (楼主)

2020-12-05 06:54

64-bit

The 64-bit jitter does a good job of eliminating bounds checks (at least in straightforward scenarios). I added return sum; at the end of your method and then compiled the program using Visual Studio 2010 in Release mode. In the disassembly below (which I annotated with a C# translation), notice that:

There are no bounds checks for X, even though your code compares i against length instead of X.Length. This is an improvement over the behavior described in the article.
Before the main loop, there is a single check to make sure that Y.Length >= X.Length.
The main loop (offsets 00000032 through 00000052) does not contain any bounds checks.

Disassembly

; Register assignments:
;    rcx  := i
;    rdx  := X
;    r8   := Y
;    r9   := X.Length ("length" in your code, "XLength" below)
;    r10  := Y.Length ("YLength" below)
;    r11  := X.Length - 1 ("XLengthMinus1" below)
;    xmm1 := sum

; (Prologue)
00000000  push        rbx
00000001  push        rdi
00000002  sub         rsp,28h

; (Store arguments X and Y in rdx and r8)
00000006  mov         r8,rdx   ; Y
00000009  mov         rdx,rcx  ; X

; int XLength = X.Length;
0000000c  mov         r9,qword ptr [rdx+8]

; int XLengthMinus1 = XLength - 1;
00000010  movsxd      rax,r9d
00000013  lea         r11,[rax-1]

; int YLength = Y.Length;
00000017  mov         r10,qword ptr [r8+8]

; if (XLength != YLength)
;     throw new ArgumentException("X and Y must be same size");
0000001b  cmp         r9d,r10d
0000001e  jne         0000000000000060

; double sum = 0;
00000020  xorpd       xmm1,xmm1

; if (XLength > 0)
; {
00000024  test        r9d,r9d
00000027  jle         0000000000000054

;     int i = 0;
00000029  xor         ecx,ecx
0000002b  xor         eax,eax

;     if (XLengthMinus1 >= YLength)
;         throw new IndexOutOfRangeException();
0000002d  cmp         r11,r10
00000030  jae         0000000000000096

;     do
;     {
;         sum += X[i] * Y[i];
00000032  movsd       xmm0,mmword ptr [rdx+rax+10h]
00000038  mulsd       xmm0,mmword ptr [r8+rax+10h]
0000003f  addsd       xmm0,xmm1
00000043  movapd      xmm1,xmm0

;         i++;
00000047  inc         ecx
00000049  add         rax,8

;     }
;     while (i < XLength);
0000004f  cmp         ecx,r9d
00000052  jl          0000000000000032
; }

; return sum;
00000054  movapd      xmm0,xmm1

; (Epilogue)
00000058  add         rsp,28h
0000005c  pop         rdi
0000005d  pop         rbx
0000005e  ret

00000060  ...

00000096  ...

32-bit

The 32-bit jitter, unfortunately, is not quite as smart. In the disassembly below, notice that:

There are no bounds checks for X, even though your code compares i against length instead of X.Length. Again, this is an improvement over the behavior described in the article.
The main loop (offsets 00000018 through 0000002a) contains a bounds check for Y.

Disassembly

; Register assignments:
;    eax  := i
;    ecx  := X
;    edx  := Y
;    esi  := X.Length ("length" in your code, "XLength" below)

; (Prologue)
00000000  push        ebp
00000001  mov         ebp,esp
00000003  push        esi

; double sum = 0;
00000004  fldz

; int XLength = X.Length;
00000006  mov         esi,dword ptr [ecx+4]

; if (XLength != Y.Length)
;     throw new ArgumentException("X and Y must be same size");
00000009  cmp         dword ptr [edx+4],esi
0000000c  je          00000012
0000000e  fstp        st(0)
00000010  jmp         0000002F

; int i = 0;
00000012  xor         eax,eax

; if (XLength > 0)
; {
00000014  test        esi,esi
00000016  jle         0000002C

;     do
;     {
;         double temp = X[i];
00000018  fld         qword ptr [ecx+eax*8+8]

;         if (i >= Y.Length)
;             throw new IndexOutOfRangeException();
0000001c  cmp         eax,dword ptr [edx+4]
0000001f  jae         0000005A

;         sum += temp * Y[i];
00000021  fmul        qword ptr [edx+eax*8+8]
00000025  faddp       st(1),st

;         i++;
00000027  inc         eax

;     while (i < XLength);
00000028  cmp         eax,esi
0000002a  jl          00000018
; }

; return sum;
0000002c  pop         esi
0000002d  pop         ebp
0000002e  ret

0000002f  ...

0000005a  ...

Summing Up

The jitter has improved since 2009, and the 64-bit jitter can generate more efficient code than the 32-bit jitter.

If necessary, though, you can always bypass array bounds checks completely by using unsafe code and pointers (as svick points out). This technique is used by some performance-critical code in the Base Class Library.

0 讨论(0)

查看其它4个回答