I've been doing some performance testing, mainly so I can understand the difference between iterators and simple for loops. As part of this, I created a simple set of tests
Oh, that's easy. I assume you are running on 32-bit x86. What do you need to run these loops in assembler?
You need three variables (for example, the loop counter, the limit and the running result). Variable access is fastest when the variables can be kept in registers; every time they have to be moved in and out of memory, you lose speed. On 32-bit x86, each 64-bit long needs two registers, and there are only four general-purpose data registers (EAX, EBX, ECX and EDX), so chances are high that not all of the variables fit in registers and some must be spilled to intermediate storage such as the stack. This alone slows down access considerably.
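As an illustration, here is a hypothetical summing loop of the kind described above (the original tests are not shown, so `sum_below_64` and `sum_below_32` are made-up names). In the 64-bit version, three 64-bit values are live at once, which on 32-bit x86 means six 32-bit registers' worth of data; the 32-bit version needs only three. Whether the compiler actually spills to the stack depends on the compiler and optimization level, so treat this as a sketch rather than a benchmark.

```c
#include <stdint.h>

/* Hypothetical loop: counter i, limit n and accumulator sum are all 64-bit.
 * On 32-bit x86 each occupies two registers, so some may be spilled. */
int64_t sum_below_64(int64_t n) {
    int64_t sum = 0;
    for (int64_t i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}

/* The same loop with 32-bit values: three registers are enough. */
int32_t sum_below_32(int32_t n) {
    int32_t sum = 0;
    for (int32_t i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}
```

Compiling both with `-m32` and comparing the generated assembly is one way to see the extra register pressure of the 64-bit version.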
Addition of numbers: On 32-bit, a 64-bit addition takes two steps: the low halves are added without the carry bit, then the high halves are added together with the carry bit. A 64-bit CPU can do it in a single instruction.
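The two-step addition can be modeled in C by keeping each 64-bit value as two 32-bit halves, mimicking the two registers it would occupy. This is a minimal sketch; the type `u64_pair` and the function `add64_via_32` are hypothetical names, and the comments map each line to the x86 `ADD`/`ADC` pair that a compiler would typically emit.

```c
#include <stdint.h>

/* A 64-bit value split into two 32-bit halves, like two registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

u64_pair add64_via_32(u64_pair a, u64_pair b) {
    u64_pair r;
    r.lo = a.lo + b.lo;            /* like ADD: low words, wraps modulo 2^32 */
    uint32_t carry = r.lo < a.lo;  /* 1 if the low-word addition overflowed */
    r.hi = a.hi + b.hi + carry;    /* like ADC: high words plus the carry */
    return r;
}
```

For example, adding `{lo = 0xFFFFFFFF, hi = 0}` and `{lo = 1, hi = 0}` gives `{lo = 0, hi = 1}`: the carry out of the low word propagates into the high word.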
Moving/Loading: Where a 64-bit CPU loads or stores a long integer with a single operation, a 32-bit CPU needs two, one for each 32-bit half.
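To make the "two moves per long" point concrete, here is a sketch of how a 64-bit value sits in memory on little-endian 32-bit x86: as two consecutive 32-bit words, low word first. The helper names `store64_as_words` and `load64_from_words` are hypothetical; each line corresponds roughly to one 32-bit `MOV`.

```c
#include <stdint.h>

/* Store a 64-bit value as two 32-bit words (low word first). */
void store64_as_words(uint64_t value, uint32_t words[2]) {
    words[0] = (uint32_t)value;          /* first move: low 32 bits */
    words[1] = (uint32_t)(value >> 32);  /* second move: high 32 bits */
}

/* Load the two 32-bit words back and recombine them. */
uint64_t load64_from_words(const uint32_t words[2]) {
    return (uint64_t)words[0] | ((uint64_t)words[1] << 32);
}
```

On a 64-bit CPU the same value would be moved with one 64-bit load or store.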
Every composite datatype (one that has more bits than the native register or address width) loses considerable speed. Speed gains of an order of magnitude are also the reason GPUs still prefer floats (32-bit) over doubles (64-bit).