I\'m interested in how efficient low-level algorithms can be in .net. I would like to enable us to choose to write more of our code in C# rather than C++ in the future, but
The 64-bit jitter does a good job of eliminating bounds checks (at least in straightforward scenarios). I added return sum; at the end of your method and then compiled the program using Visual Studio 2010 in Release mode. In the disassembly below (which I annotated with a C# translation), notice that:
X, even though your code compares i against length instead of X.Length. This is an improvement over the behavior described in the article.Y.Length >= X.Length.Disassembly
; Register assignments:
; rcx := i
; rdx := X
; r8 := Y
; r9 := X.Length ("length" in your code, "XLength" below)
; r10 := Y.Length ("YLength" below)
; r11 := X.Length - 1 ("XLengthMinus1" below)
; xmm1 := sum
; (Prologue)
00000000 push rbx
00000001 push rdi
00000002 sub rsp,28h
; (Store arguments X and Y in rdx and r8)
00000006 mov r8,rdx ; Y
00000009 mov rdx,rcx ; X
; int XLength = X.Length;
0000000c mov r9,qword ptr [rdx+8]
; int XLengthMinus1 = XLength - 1;
00000010 movsxd rax,r9d
00000013 lea r11,[rax-1]
; int YLength = Y.Length;
00000017 mov r10,qword ptr [r8+8]
; if (XLength != YLength)
; throw new ArgumentException("X and Y must be same size");
0000001b cmp r9d,r10d
0000001e jne 0000000000000060
; double sum = 0;
00000020 xorpd xmm1,xmm1
; if (XLength > 0)
; {
00000024 test r9d,r9d
00000027 jle 0000000000000054
; int i = 0;
00000029 xor ecx,ecx
0000002b xor eax,eax
; if (XLengthMinus1 >= YLength)
; throw new IndexOutOfRangeException();
0000002d cmp r11,r10
00000030 jae 0000000000000096
; do
; {
; sum += X[i] * Y[i];
00000032 movsd xmm0,mmword ptr [rdx+rax+10h]
00000038 mulsd xmm0,mmword ptr [r8+rax+10h]
0000003f addsd xmm0,xmm1
00000043 movapd xmm1,xmm0
; i++;
00000047 inc ecx
00000049 add rax,8
; }
; while (i < XLength);
0000004f cmp ecx,r9d
00000052 jl 0000000000000032
; }
; return sum;
00000054 movapd xmm0,xmm1
; (Epilogue)
00000058 add rsp,28h
0000005c pop rdi
0000005d pop rbx
0000005e ret
00000060 ...
00000096 ...
The 32-bit jitter, unfortunately, is not quite as smart. In the disassembly below, notice that:
X, even though your code compares i against length instead of X.Length. Again, this is an improvement over the behavior described in the article.Y.Disassembly
; Register assignments:
; eax := i
; ecx := X
; edx := Y
; esi := X.Length ("length" in your code, "XLength" below)
; (Prologue)
00000000 push ebp
00000001 mov ebp,esp
00000003 push esi
; double sum = 0;
00000004 fldz
; int XLength = X.Length;
00000006 mov esi,dword ptr [ecx+4]
; if (XLength != Y.Length)
; throw new ArgumentException("X and Y must be same size");
00000009 cmp dword ptr [edx+4],esi
0000000c je 00000012
0000000e fstp st(0)
00000010 jmp 0000002F
; int i = 0;
00000012 xor eax,eax
; if (XLength > 0)
; {
00000014 test esi,esi
00000016 jle 0000002C
; do
; {
; double temp = X[i];
00000018 fld qword ptr [ecx+eax*8+8]
; if (i >= Y.Length)
; throw new IndexOutOfRangeException();
0000001c cmp eax,dword ptr [edx+4]
0000001f jae 0000005A
; sum += temp * Y[i];
00000021 fmul qword ptr [edx+eax*8+8]
00000025 faddp st(1),st
; i++;
00000027 inc eax
; while (i < XLength);
00000028 cmp eax,esi
0000002a jl 00000018
; }
; return sum;
0000002c pop esi
0000002d pop ebp
0000002e ret
0000002f ...
0000005a ...
The jitter has improved since 2009, and the 64-bit jitter can generate more efficient code than the 32-bit jitter.
If necessary, though, you can always bypass array bounds checks completely by using unsafe code and pointers (as svick points out). This technique is used by some performance-critical code in the Base Class Library.