Why is Math.DivRem so inefficient?

慢半拍i 2020-12-08 18:59

On my computer this code takes 17 seconds (1,000 million iterations):

static void Main(string[] args) {
   var sw = new Stopwatch(); sw.Start();
   int r;
   for          


        
11 Answers
  • 2020-12-08 19:31

    This is really just a comment, but I don't get enough room.

    Here is some C# using Math.DivRem():

        [Fact]
        public void MathTest()
        {
            for (var i = 1; i <= 10; i++)
            {
                int remainder;
                var result = Math.DivRem(10, i, out remainder);
                // Use the values so they aren't optimized away
                Assert.True(result >= 0);
                Assert.True(remainder >= 0);
            }
        }
    

    Here is the corresponding IL:

    .method public hidebysig instance void MathTest() cil managed
    {
        .custom instance void [xunit]Xunit.FactAttribute::.ctor()
        .maxstack 3
        .locals init (
            [0] int32 i,
            [1] int32 remainder,
            [2] int32 result)
        L_0000: ldc.i4.1 
        L_0001: stloc.0 
        L_0002: br.s L_002b
        L_0004: ldc.i4.s 10
        L_0006: ldloc.0 
        L_0007: ldloca.s remainder
        L_0009: call int32 [mscorlib]System.Math::DivRem(int32, int32, int32&)
        L_000e: stloc.2 
        L_000f: ldloc.2 
        L_0010: ldc.i4.0 
        L_0011: clt 
        L_0013: ldc.i4.0 
        L_0014: ceq 
        L_0016: call void [xunit]Xunit.Assert::True(bool)
        L_001b: ldloc.1 
        L_001c: ldc.i4.0 
        L_001d: clt 
        L_001f: ldc.i4.0 
        L_0020: ceq 
        L_0022: call void [xunit]Xunit.Assert::True(bool)
        L_0027: ldloc.0 
        L_0028: ldc.i4.1 
        L_0029: add 
        L_002a: stloc.0 
        L_002b: ldloc.0 
        L_002c: ldc.i4.s 10
        L_002e: ble.s L_0004
        L_0030: ret 
    }
    

    Here is the (relevant) optimized x86 assembly generated:

           for (var i = 1; i <= 10; i++)
    00000000  push        ebp 
    00000001  mov         ebp,esp 
    00000003  push        esi 
    00000004  push        eax 
    00000005  xor         eax,eax 
    00000007  mov         dword ptr [ebp-8],eax 
    0000000a  mov         esi,1 
            {
                int remainder;
                var result = Math.DivRem(10, i, out remainder);
    0000000f  mov         eax,0Ah 
    00000014  cdq 
    00000015  idiv        eax,esi 
    00000017  mov         dword ptr [ebp-8],edx 
    0000001a  mov         eax,0Ah 
    0000001f  cdq 
    00000020  idiv        eax,esi 
    

    Note the two idiv instructions. The first is used for the remainder: EDX is stored into the remainder local on the stack. The second is used for the quotient (EAX). The second idiv is redundant, since EAX already holds the quotient after the first one.
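    The doubled idiv is what one would expect if DivRem is written as a modulo followed by a divide and the JIT does not merge the two. The following is a minimal sketch under that assumption; `DivRemSketch` and `DivRemTwoDivisions` are hypothetical names for illustration, not the actual library source.

```csharp
using System;

public static class DivRemSketch
{
    // Hypothetical helper (assumed shape, not the real Math.DivRem source):
    // the remainder and the quotient are computed by two separate operators,
    // which a JIT that does not combine them turns into two idiv instructions.
    public static int DivRemTwoDivisions(int a, int b, out int remainder)
    {
        remainder = a % b; // first division: only the remainder is kept
        return a / b;      // second division: only the quotient is kept
    }

    // Small driver mirroring the loop in the test above.
    public static void Demo()
    {
        for (var i = 1; i <= 10; i++)
        {
            int remainder;
            int quotient = DivRemTwoDivisions(10, i, out remainder);
            Console.WriteLine("10 / " + i + " = " + quotient + " remainder " + remainder);
        }
    }
}
```

    Whether both divisions survive depends on the JIT version, so the only reliable check is to inspect the generated assembly as done above.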

  • 2020-12-08 19:34

    Grrr. The only reason for this function to exist is to take advantage of the CPU instruction for this, and they didn't even do it!

  • 2020-12-08 19:34

    The answer is probably that nobody has thought this a priority - it's good enough. The fact that this has not been fixed with any new version of the .NET Framework is an indicator of how rarely this is used - most likely, nobody has ever complained.

  • 2020-12-08 19:34

    If I had to take a wild guess, I'd say that whoever implemented Math.DivRem had no idea that x86 processors are capable of doing it in a single instruction, so they wrote it as two operations. That's not necessarily a bad thing if the optimizer works correctly, though it is yet another indicator that low-level knowledge is sadly lacking in most programmers nowadays. I would expect the optimizer to collapse modulus and then divide operations into one instruction, and the people who write optimizers should know these sorts of low-level things...

  • 2020-12-08 19:41

    It's partly in the nature of the beast. To the best of my knowledge, there is no general quick way to calculate the remainder of a division. It is going to take a correspondingly large number of clock cycles, even with several hundred million transistors.

  • 2020-12-08 19:49

    While .NET Framework 4.6.2 still uses the suboptimal modulo and divide, .NET Core (CoreCLR) currently replaces the divide with a subtract:

        public static int DivRem(int a, int b, out int result)
        {
            // TODO https://github.com/dotnet/runtime/issues/5213:
            // Restore to using % and / when the JIT is able to eliminate one of the idivs.
            // In the meantime, a * and - is measurably faster than an extra /.
    
            int div = a / b;
            result = a - (div * b);
            return div;
        }
    

    And there's an open issue to either improve DivRem specifically (via intrinsic), or detect and optimise the general case in RyuJIT.
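    The multiply-and-subtract form relies on the identity a % b == a - (a / b) * b, which holds for C# truncating integer division. A small sketch that checks this over a range of positive and negative operands is shown here; `DivRemIdentityCheck`, `DivRemMultiplySubtract` and `MatchesModulo` are hypothetical names, and the range deliberately avoids b == 0 and the int.MinValue / -1 overflow case.

```csharp
using System;

public static class DivRemIdentityCheck
{
    // Same shape as the CoreCLR snippet above: one division,
    // then a multiply and a subtract to recover the remainder.
    public static int DivRemMultiplySubtract(int a, int b, out int remainder)
    {
        int div = a / b;
        remainder = a - (div * b);
        return div;
    }

    // Returns true if the multiply-subtract remainder equals a % b
    // for every pair (a, b) in [-limit, limit] with b != 0.
    public static bool MatchesModulo(int limit)
    {
        for (int a = -limit; a <= limit; a++)
        {
            for (int b = -limit; b <= limit; b++)
            {
                if (b == 0) continue; // division by zero throws, so skip it

                int remainder;
                DivRemMultiplySubtract(a, b, out remainder);
                if (remainder != a % b) return false;
            }
        }
        return true;
    }
}
```

    This only confirms that the two forms agree on results; it says nothing about speed, which still has to be measured on the runtime in question.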
