Disassembly view of C# 64-bit Release code is 75% longer than 32-bit Debug code?

Submitted by 匆匆过客 on 2019-11-29 14:22:53

I suspect you are using "Go to disassembly" while debugging the release build to get the assembly code.

After going to Tools -> Options, Debugging, General, and disabling "Suppress JIT optimization on module load" I got an x64 assembly listing without error checking.

It seems that by default, even in release mode, the code is not optimized if the debugger is attached. Keep that in mind when trying to benchmark your code.

PS: Benchmarking shows x64 slightly faster than x86, 4.3 vs 4.8 seconds for 1 billion function calls.
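To make the point about the debugger concrete, here is a minimal timing sketch. It is not the code that produced the numbers above: `Step` is a hypothetical stand-in for the function being measured, and `Debugger.IsAttached` is used only as a rough hint that JIT optimizations may have been suppressed.

```csharp
using System;
using System.Diagnostics;

static class JitBenchmarkSketch
{
    // Keeps the final result observable so the loop is not optimized away.
    public static uint Sink;

    // Hypothetical stand-in for the function under test.
    static uint Step(uint crc, uint val)
    {
        return (crc >> 8) ^ (val & 0xff) ^ (crc & 0xff);
    }

    // True when no debugger is attached, i.e. the JIT was
    // probably allowed to optimize (an assumption, not a guarantee).
    public static bool OptimizedRunLikely()
    {
        return !Debugger.IsAttached;
    }

    // Times the given number of calls and returns elapsed milliseconds.
    public static long TimeCalls(int iterations)
    {
        uint crc = 0xffffffff;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            crc = Step(crc, (uint)i);
        }
        sw.Stop();
        Sink = crc;
        return sw.ElapsedMilliseconds;
    }
}
```

Run it from a Release build started without the debugger (Ctrl+F5 in Visual Studio) and compare against a run with the debugger attached; if `OptimizedRunLikely()` returns false, treat the timing with suspicion.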

Edit: Breakpoints still worked for me; otherwise I wouldn't have been able to see the disassembly after unchecking the option. Your example line from above looks like this (VS 2012 RC):

crc = (crc >> 8) ^ crcTable[((val & 0x0000ff00) >> 8) ^ crc & 0xff];
00000030  mov         r11d,eax 
00000033  shr         r11d,8 
00000037  mov         ecx,edx 
00000039  and         ecx,0FF00h 
0000003f  shr         ecx,8 
00000042  movzx       eax,al 
00000045  xor         ecx,eax 
00000047  mov         eax,ecx 
00000049  cmp         rax,r9 
0000004c  jae         00000000000000A4 
0000004e  mov         eax,dword ptr [r8+rax*4+10h] 
00000053  xor         r11d,eax 

Looking at the code, this is related to the bounds checking for accesses to crcTable. It performs the bounds check before it starts indexing into the array.
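As a rough illustration of what that check amounts to, the sketch below spells it out in C#. It mirrors a compare followed by an unsigned "jump if above or equal" such as the `cmp rax,r9` / `jae` pair in the optimized x64 listing, assuming `r9` holds the array length there. The class and method names are made up for this example; the JIT emits the check itself, nobody writes it by hand.

```csharp
using System;

static class BoundsCheckSketch
{
    // What an indexed array read does implicitly: one unsigned comparison
    // of the index against the length, then the actual load.
    public static uint Load(uint[] table, uint index)
    {
        if (index >= (uint)table.Length)
        {
            // Corresponds to the branch target of the jae instruction.
            throw new IndexOutOfRangeException();
        }
        return table[index];
    }
}
```

Because the comparison is unsigned, a single test also rejects indices that would be negative when viewed as signed integers.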

In the 32-bit code you see this:

0000008e  mov         ecx,dword ptr ds:[03387F38h] 
....
0000009e  xor         eax,dword ptr [ecx+edx*4+8] 

In this case it's loading the base address of the array from 03387F38h and then using standard pointer arithmetic to access the correct entry.

In the 64-bit code this seems to be more complicated.

000000b0  mov         rdx,124DEE68h 
000000ba  mov         rdx,qword ptr [rdx]

This loads an address into the rdx register

000000da  mov         qword ptr [rsp+40h],rdx 
...
00000105  mov         rax,qword ptr [rsp+40h] 
0000010a  mov         rcx,qword ptr [rsp+48h] 
0000010f  mov         ecx,dword ptr [rax+rcx*4+10h] 

This moves the address onto the stack, then later on it moves it into the rax register and does the same pointer work to access the array.

Pretty much everything between 000000da and 00000100/00000105 seems to be validation code. The rest of the code maps pretty well between the 64-bit and the 32-bit code, with some less aggressive register utilization in the 64-bit code.

exp ^ crc & 0xff is compiled as exp ^ (crc & 0xff), because & binds more tightly than ^:

00000082  mov         ecx,dword ptr [ebp-40h]  
00000085  mov         ebx,0FFh  
0000008a  and         ecx,ebx  
0000008c  xor         edx,ecx  

Did you mean to write the expression like this?

(exp ^ crc) & 0xff
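A small sketch makes the difference between the two groupings visible. The class and method names are invented for the example, and the input values used in the note below are arbitrary.

```csharp
static class PrecedenceSketch
{
    // No parentheses: & binds tighter than ^, so only crc is masked.
    public static uint Unparenthesized(uint exp, uint crc)
    {
        return exp ^ crc & 0xff;
    }

    // The grouping the compiler actually uses.
    public static uint MaskCrcOnly(uint exp, uint crc)
    {
        return exp ^ (crc & 0xff);
    }

    // The alternative grouping: mask the result of the XOR.
    public static uint MaskWholeXor(uint exp, uint crc)
    {
        return (exp ^ crc) & 0xff;
    }
}
```

With exp = 0x1234 and crc = 0xABCD, the first two methods return 0x12F9 and the third returns 0xF9. Note that when exp already fits in eight bits, as `(val & 0x0000ff00) >> 8` does, both groupings give the same result, so the parentheses only matter for wider left operands.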

The 64-bit version is definitely less optimized than the 32-bit version. The CLR has two separate JIT compiler implementations, one for x86 and one for x64.

Also, if performance is critical, use unsafe code to remove the bounds check.
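One way this could look is sketched below. It assumes the project is compiled with unsafe code enabled (the `/unsafe` compiler switch), and the method name, the `values` parameter, and the up-front length check are all additions for the example rather than part of the original code.

```csharp
using System;

static class UnsafeLookupSketch
{
    // Applies the table lookup from the question to each input value
    // through a raw pointer, so no per-access bounds check is emitted.
    public static unsafe uint Update(uint[] crcTable, uint crc, uint[] values)
    {
        // The index below is always in 0..255, so one check up front
        // replaces the per-iteration check on crcTable.
        if (crcTable.Length < 256)
        {
            throw new ArgumentException("crcTable needs at least 256 entries");
        }

        // Pin the table once, outside the loop, so the GC cannot move it.
        fixed (uint* table = crcTable)
        {
            for (int i = 0; i < values.Length; i++)
            {
                uint val = values[i];
                crc = (crc >> 8) ^ table[((val & 0x0000ff00) >> 8) ^ (crc & 0xff)];
            }
        }
        return crc;
    }
}
```

Reads through `table` are unchecked, so the length test at the top is the only thing standing between a short table and a read of unrelated memory. Measure before and after; the gain from dropping one compare and branch per lookup may be small.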
