Is it more efficient to perform a range check by casting to uint instead of checking for negative values?

前端未结

关注

 7  668

I stumbled upon this piece of code in .NET\'s List source code:

// Following trick can reduce the range check by one
if ((uint) index >= (uint)_size) {


                      
              相关标签:


      
      
        
          7条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  遥遥无期        
                
              
                            
                2020-12-13 23:54
              
            
            
                                                                       
While exploring this on an intel processor I found no differene in execution times, possibly due to multiple integer execution units.

But when doing this on 16MHZ real-time microprocessor with neither branch prediction nor integer execution units there were notable differences.

1 million iterations of the slower code took 1761 ms

int slower(char *a, long i)
{
  if (i < 0 || i >= 10)
    return 0;

  return a[i];
}


1 million iterations faster code took 1635 ms

int faster(char *a, long i)
{
  if ((unsigned int)i >= 10)
    return 0;
  return a[i];
}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  广开言路        
                
              
                            
                2020-12-13 23:58
              
            
            
                                                                       
Yes, this is more efficient. The JIT does the same trick when range-checking array accesses.

The transformation and reasoning is as follows:

i >= 0 && i < array.Length becomes (uint)i < (uint)array.Length because array.Length <= int.MaxValue so that array.Length has the same value as (uint)array.Length. If i happens to be negative then (uint)i > int.MaxValue and the check fails.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  难免孤独        
                
              
                            
                2020-12-14 00:03
              
            
            
                                                                       
Note that this trick won't work if your project is checked instead of unchecked. Best case it will be slower (because each cast will need to be checked against overflow) (or at least not-faster), worst case you will get an OverflowException if you try to pass -1 as the index (instead of your exception). 

If you want to write it "correctly" and in a more "will surely work" way, you should put a 

unchecked
{
    // test
}


all around the test.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  甜味超标        
                
              
                            
                2020-12-14 00:07
              
            
            
                                                                       
Assuming _sizeis an integer, private to the list and index is the argument to this function which's validity needs to be tested.

Assuming further that _size is always >= 0.

Then the original test would have been:

if(index < 0 || index > size) throw exception


The optimized version

if((uint)index > (uint)_size) throw exception


has one comparison (as opsed to two int he former example.) Because the cast just reinterpretes the bits and makes the >in fact an unsigned comparison, no additonal CPU cycles are used for it.

Why does it work?

The results are simple/trivial as long as index >= 0.

If index < 0 the (uint)index will turn it into a very large number:

Example: 0xFFFF is -1 as int, but 65535 as uint, thus 

(uint)-1 > (uint)x 


is always true if x was positive.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  Happy的楠姐        
                
              
                            
                2020-12-14 00:07
              
            
            
                                                                       
Apparently in real life it isn't faster. Check this: https://dotnetfiddle.net/lZKHmn

Turns out, that thanks to Intel's branch prediction and parallel execution the more obvious and readable code actually works faster...

Here's the code:



using System;
using System.Diagnostics;

public class Program
{


    const int MAX_ITERATIONS = 10000000;
    const int MAX_SIZE = 1000;


    public static void Main()
    {

            var timer = new Stopwatch();


            Random rand = new Random();
            long InRange = 0;
            long OutOfRange = 0;

            timer.Start();
            for ( int i = 0; i < MAX_ITERATIONS; i++ ) {
                var x = rand.Next( MAX_SIZE * 2 ) - MAX_SIZE;
                if ( x < 0 || x > MAX_SIZE ) {
                    OutOfRange++;
                } else {
                    InRange++;
                }
            }
            timer.Stop();

            Console.WriteLine( "Comparision 1: " + InRange + "/" + OutOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms" );


            rand = new Random();
            InRange = 0;
            OutOfRange = 0;

            timer.Reset();
            timer.Start();
            for ( int i = 0; i < MAX_ITERATIONS; i++ ) {
                var x = rand.Next( MAX_SIZE * 2 ) - MAX_SIZE;
                if ( (uint) x > (uint) MAX_SIZE ) {
                    OutOfRange++;
                } else {
                    InRange++;
                }
            }
            timer.Stop();

            Console.WriteLine( "Comparision 2: " + InRange + "/" + OutOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms" );

    }
}



                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  梦如初夏        
                
              
                            
                2020-12-14 00:11
              
            
            
                                                                       
Let's say we have:

public void TestIndex1(int index)
{
  if(index < 0 || index >= _size)
    ThrowHelper.ThrowArgumentOutOfRangeException();
}
public void TestIndex2(int index)
{
  if((uint)index >= (uint)_size)
    ThrowHelper.ThrowArgumentOutOfRangeException();
}


Let's compile these and look at ILSpy:

.method public hidebysig 
    instance void TestIndex1 (
        int32 index
    ) cil managed 
{
    IL_0000: ldarg.1
    IL_0001: ldc.i4.0
    IL_0002: blt.s IL_000d
    IL_0004: ldarg.1
    IL_0005: ldarg.0
    IL_0006: ldfld int32 TempTest.TestClass::_size
    IL_000b: bge.s IL_0012
    IL_000d: call void TempTest.ThrowHelper::ThrowArgumentOutOfRangeException()
    IL_0012: ret
}

.method public hidebysig 
    instance void TestIndex2 (
        int32 index
    ) cil managed 
{
    IL_0000: ldarg.1
    IL_0001: ldarg.0
    IL_0002: ldfld int32 TempTest.TestClass::_size
    IL_0007: blt.un.s IL_000e
    IL_0009: call void TempTest.ThrowHelper::ThrowArgumentOutOfRangeException()
    IL_000e: ret
}


It's easy to see that the second has less code, with one less branch.

Really, there's no cast at all, there's the choice of whether to use blt.s and bge.s or to use blt.s.un, where the latter treats the integers passed as unsigned while the former treats them as signed.

(Note for those not familiar with CIL, since this is a C# question with a CIL answer, bge.s, blt.s and blt.s.un are the "short" versions of bge, blt and blt.un respectively. blt pops two values off the stack and branches if the first is less than the second when considering them as signed values while blt.un pops two values of the stack and branches if the first is less than the second when considering them as unsigned values).

It's utterly a micro-opt, but there are times when micro-opt's are worth doing. Consider further, that with the rest of the code in the method body this could mean the difference between something falling inside the jitters limits for inlining or not, and if they're bothering to have a helper for throwing out of range exceptions they're probably attempting to ensure inlining happens if at all possible, and the extra 4 bytes could make all the difference.

Indeed, it's quite likely for that inlining difference to be a much bigger deal than the reduction of one branch. There aren't a lot of times when going out of your way to ensure inlining happens is worth it, but a core method of a class of such heavy use as List<T> would certainly be one of them.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复