On my computer this code takes 17 seconds for 1,000 million iterations:
static void Main(string[] args) {
var sw = new Stopwatch(); sw.Start();
int r;
for
This is really just a comment, but there isn't enough room for it in a comment.
Here is some C# using Math.DivRem():
[Fact]
public void MathTest()
{
    for (var i = 1; i <= 10; i++)
    {
        int remainder;
        var result = Math.DivRem(10, i, out remainder);
        // Use the values so they aren't optimized away
        Assert.True(result >= 0);
        Assert.True(remainder >= 0);
    }
}
Here is the corresponding IL:
.method public hidebysig instance void MathTest() cil managed
{
.custom instance void [xunit]Xunit.FactAttribute::.ctor()
.maxstack 3
.locals init (
[0] int32 i,
[1] int32 remainder,
[2] int32 result)
L_0000: ldc.i4.1
L_0001: stloc.0
L_0002: br.s L_002b
L_0004: ldc.i4.s 10
L_0006: ldloc.0
L_0007: ldloca.s remainder
L_0009: call int32 [mscorlib]System.Math::DivRem(int32, int32, int32&)
L_000e: stloc.2
L_000f: ldloc.2
L_0010: ldc.i4.0
L_0011: clt
L_0013: ldc.i4.0
L_0014: ceq
L_0016: call void [xunit]Xunit.Assert::True(bool)
L_001b: ldloc.1
L_001c: ldc.i4.0
L_001d: clt
L_001f: ldc.i4.0
L_0020: ceq
L_0022: call void [xunit]Xunit.Assert::True(bool)
L_0027: ldloc.0
L_0028: ldc.i4.1
L_0029: add
L_002a: stloc.0
L_002b: ldloc.0
L_002c: ldc.i4.s 10
L_002e: ble.s L_0004
L_0030: ret
}
Here is the (relevant) optimized x86 assembly generated:
for (var i = 1; i <= 10; i++)
00000000 push ebp
00000001 mov ebp,esp
00000003 push esi
00000004 push eax
00000005 xor eax,eax
00000007 mov dword ptr [ebp-8],eax
0000000a mov esi,1
{
int remainder;
var result = Math.DivRem(10, i, out remainder);
0000000f mov eax,0Ah
00000014 cdq
00000015 idiv eax,esi
00000017 mov dword ptr [ebp-8],edx
0000001a mov eax,0Ah
0000001f cdq
00000020 idiv eax,esi
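Note the two idiv instructions (they are instructions, not calls; the call to Math.DivRem has been inlined). The first idiv leaves the remainder in EDX, which is then stored into the remainder local on the stack at [ebp-8]. The second idiv is executed only to get the quotient in EAX. That second idiv is not really needed: a single idiv produces both results, so EAX already held the quotient after the first one. The generated code is what you would expect if DivRem computes the remainder and the quotient as two separate operations.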
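To make the point about the redundant idiv concrete, here is a minimal sketch of a DivRem-style helper that divides only once and recovers the remainder with a multiply and a subtract. The names SingleDivision and DivRemOnce are made up for illustration and are not framework APIs. I have not inspected the assembly the JIT produces for this version, so treat the "one idiv" claim as what the source asks for, not as a measured result.

```csharp
using System;

static class SingleDivision
{
    // Hypothetical helper, not part of the framework.
    // Performs one division and derives the remainder from the quotient.
    public static int DivRemOnce(int dividend, int divisor, out int remainder)
    {
        int quotient = dividend / divisor;          // the only division
        remainder = dividend - quotient * divisor;  // same value as dividend % divisor
        return quotient;
    }

    static void Main()
    {
        // Same loop as the test above: divide 10 by 1..10.
        for (var i = 1; i <= 10; i++)
        {
            int remainder;
            var quotient = DivRemOnce(10, i, out remainder);
            Console.WriteLine("10 / {0} = {1} remainder {2}", i, quotient, remainder);
        }
    }
}
```

Because C# integer division truncates toward zero and % takes the sign of the dividend, dividend - quotient * divisor gives the same value as dividend % divisor, including for negative operands. Like the / operator itself, this still throws for a zero divisor and for int.MinValue divided by -1.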