Why is Math.DivRem so inefficient?


On my computer this code takes 17 seconds (1,000 million iterations):

static void Main(string[] args) {
   var sw = new Stopwatch(); sw.Start();
   int r;
   for


                      
11 answers

暗喜, 2020-12-08 19:31

This is really just a comment, but I don't get enough room.

Here is some C# using Math.DivRem():

    [Fact]
    public void MathTest()
    {
        for (var i = 1; i <= 10; i++)
        {
            int remainder;
            var result = Math.DivRem(10, i, out remainder);
            // Use the values so they aren't optimized away
            Assert.True(result >= 0);
            Assert.True(remainder >= 0);
        }
    }


Here is the corresponding IL:

.method public hidebysig instance void MathTest() cil managed
{
    .custom instance void [xunit]Xunit.FactAttribute::.ctor()
    .maxstack 3
    .locals init (
        [0] int32 i,
        [1] int32 remainder,
        [2] int32 result)
    L_0000: ldc.i4.1 
    L_0001: stloc.0 
    L_0002: br.s L_002b
    L_0004: ldc.i4.s 10
    L_0006: ldloc.0 
    L_0007: ldloca.s remainder
    L_0009: call int32 [mscorlib]System.Math::DivRem(int32, int32, int32&)
    L_000e: stloc.2 
    L_000f: ldloc.2 
    L_0010: ldc.i4.0 
    L_0011: clt 
    L_0013: ldc.i4.0 
    L_0014: ceq 
    L_0016: call void [xunit]Xunit.Assert::True(bool)
    L_001b: ldloc.1 
    L_001c: ldc.i4.0 
    L_001d: clt 
    L_001f: ldc.i4.0 
    L_0020: ceq 
    L_0022: call void [xunit]Xunit.Assert::True(bool)
    L_0027: ldloc.0 
    L_0028: ldc.i4.1 
    L_0029: add 
    L_002a: stloc.0 
    L_002b: ldloc.0 
    L_002c: ldc.i4.s 10
    L_002e: ble.s L_0004
    L_0030: ret 
}


Here is the (relevant) optimized x86 assembly generated:

       for (var i = 1; i <= 10; i++)
00000000  push        ebp 
00000001  mov         ebp,esp 
00000003  push        esi 
00000004  push        eax 
00000005  xor         eax,eax 
00000007  mov         dword ptr [ebp-8],eax 
0000000a  mov         esi,1 
        {
            int remainder;
            var result = Math.DivRem(10, i, out remainder);
0000000f  mov         eax,0Ah 
00000014  cdq 
00000015  idiv        eax,esi 
00000017  mov         dword ptr [ebp-8],edx 
0000001a  mov         eax,0Ah 
0000001f  cdq 
00000020  idiv        eax,esi 


Note the two idiv instructions. The first is there for the remainder: EDX is stored into the remainder variable on the stack. The second is there for the quotient (EAX). The second idiv is not needed, since EAX already holds the correct quotient after the first one.
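
The redundancy can also be seen from the C# side: the quotient q and remainder r of the same operands are tied together by a == q * b + r, so once one division has produced both, there is nothing left to compute. Here is a small check of that relationship. DivRemCheck and Holds are hypothetical names made up for this sketch, and the sample values are arbitrary.

```csharp
using System;

static class DivRemCheck
{
    // Hypothetical helper: true when Math.DivRem agrees with the / and %
    // operators and the pair satisfies a == q * b + r.
    public static bool Holds(int a, int b)
    {
        int r;
        int q = Math.DivRem(a, b, out r);
        return q == a / b && r == a % b && a == q * b + r;
    }

    static void Main()
    {
        // Arbitrary sample operands, including negative values.
        int[] dividends = { 10, -10, 7, -7, 0, int.MaxValue };
        int[] divisors = { 1, 3, -3, 10 };

        foreach (int a in dividends)
            foreach (int b in divisors)
                Console.WriteLine("{0} and {1}: {2}", a, b, Holds(a, b));
    }
}
```

This says nothing about speed; it only shows that both results come from one and the same division.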
                                                                        
                                                        
            
            
              
                
挽巷, 2020-12-08 19:34

Grrr. The only reason for this function to exist is to take advantage of the CPU instruction for this, and they didn't even do it!
                                                                        
                                                        
            
            
              
                
野趣味, 2020-12-08 19:34

The answer is probably that nobody has thought this a priority - it's good enough. The fact that this has not been fixed with any new version of the .NET Framework is an indicator of how rarely this is used - most likely, nobody has ever complained.
                                                                        
                                                        
            
            
              
                
一生所求, 2020-12-08 19:34

If I had to take a wild guess, I'd say that whoever implemented Math.DivRem had no idea that x86 processors are capable of doing it in a single instruction, so they wrote it as two operations.  That's not necessarily a bad thing if the optimizer works correctly, though it is yet another indicator that low-level knowledge is sadly lacking in most programmers nowadays.  I would expect the optimizer to collapse modulus and then divide operations into one instruction, and the people who write optimizers should know these sorts of low-level things...
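
For illustration, the two-operation form this guess describes would look roughly like the sketch below. NaiveDivRem is a hypothetical name and this is not copied from the framework source. If the JIT does not merge the two operators, each one becomes its own idiv, which matches the disassembly in the first answer.

```csharp
using System;

static class NaiveDivRemSketch
{
    // Hypothetical two-operation version: % and / written separately,
    // relying on the optimizer to fold them into a single division.
    public static int NaiveDivRem(int a, int b, out int remainder)
    {
        remainder = a % b;  // first division, kept for the remainder
        return a / b;       // second division, kept for the quotient
    }

    static void Main()
    {
        int r;
        int q = NaiveDivRem(17, 5, out r);
        Console.WriteLine("17 / 5 = {0}, remainder {1}", q, r);
    }
}
```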
                                                                        
                                                        
            
            
              
                
灰色年华, 2020-12-08 19:41

It's partly in the nature of the beast. To the best of my knowledge, there is no general quick way to calculate the remainder of a division. It is going to take a correspondingly large number of clock cycles, even with x hundred million transistors.
                                                                        
                                                        
            
            
              
                
走了就别回头了, 2020-12-08 19:49

While .NET Framework 4.6.2 still uses the suboptimal modulo and divide, .NET Core (CoreCLR) currently replaces the second division with a multiply and a subtract:
    public static int DivRem(int a, int b, out int result)
    {
        // TODO https://github.com/dotnet/runtime/issues/5213:
        // Restore to using % and / when the JIT is able to eliminate one of the idivs.
        // In the meantime, a * and - is measurably faster than an extra /.

        int div = a / b;
        result = a - (div * b);
        return div;
    }

And there's an open issue to either improve DivRem specifically (via intrinsic), or detect and optimise the general case in RyuJIT.
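
If you want to measure the difference on your own machine, a rough timing harness could look like this. It is only a sketch: DivRemTiming, DivRemMulSub and TimeMs are hypothetical names, the iteration count is arbitrary, and a Stopwatch loop like this is a crude measurement, so treat the numbers as indicative at best. Build in Release mode and run without a debugger attached.

```csharp
using System;
using System.Diagnostics;

static class DivRemTiming
{
    // Same shape as the CoreCLR code quoted above:
    // one division, then a multiply and a subtract for the remainder.
    public static int DivRemMulSub(int a, int b, out int remainder)
    {
        int div = a / b;
        remainder = a - (div * b);
        return div;
    }

    // Hypothetical harness: runs the chosen variant for divisors
    // 1..iterations and returns the elapsed milliseconds.
    // The checksum keeps the results live so the loop is not optimized away.
    public static long TimeMs(bool useBuiltIn, int iterations, out long checksum)
    {
        long sum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 1; i <= iterations; i++)
        {
            int r;
            int q = useBuiltIn
                ? Math.DivRem(int.MaxValue, i, out r)
                : DivRemMulSub(int.MaxValue, i, out r);
            sum += q + r;
        }
        sw.Stop();
        checksum = sum;
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        const int n = 100000000; // arbitrary iteration count
        long c1, c2;
        long builtIn = TimeMs(true, n, out c1);
        long mulSub = TimeMs(false, n, out c2);
        Console.WriteLine("Math.DivRem: {0} ms", builtIn);
        Console.WriteLine("divide + multiply/subtract: {0} ms", mulSub);
        Console.WriteLine("same results: {0}", c1 == c2);
    }
}
```

On a runtime where Math.DivRem already uses the multiply/subtract form, the two timings should come out about the same.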
                                                                        
                                                        
            
            
              
                