Logarithm in C++ and assembly

Happy的楠姐 2020-12-21 22:20
Apparently the MSVC++ 2017 toolset v141 (x64 Release configuration) doesn't use the FYL2X x86_64 assembly instruction via a C/C++ intrinsic, but rather C++ log()

      
      
        
1 answer

感情败类 (original poster) 2020-12-21 23:12

Here is the assembly code using FYL2X:

_DATA SEGMENT

_DATA ENDS

_TEXT SEGMENT

PUBLIC SRLog2MulD

; XMM0L=toLog
; XMM1L=toMul
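SRLog2MulD PROC
  ; Spill both arguments to the caller-allocated shadow space so x87 can load them
  movq qword ptr [rsp+16], xmm1
  movq qword ptr [rsp+8], xmm0
  fld qword ptr [rsp+16]        ; ST(0) = toMul
  fld qword ptr [rsp+8]         ; ST(0) = toLog, ST(1) = toMul
  fyl2x                         ; ST(0) = toMul * log2(toLog), one value popped
  fstp qword ptr [rsp+8]        ; store the result and pop, leaving the x87 stack empty
  movq xmm0, qword ptr [rsp+8]  ; return value goes in XMM0
  ret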

SRLog2MulD ENDP

_TEXT ENDS

END
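
FYL2X computes ST(1) * log2(ST(0)) and pops the register stack, so the routine above returns toMul * log2(toLog). As a rough portable reference for checking the assembly version, here is a minimal sketch of the same contract in C++. The names RefLog2MulD and RefLn are hypothetical and not part of the original code:

```cpp
#include <cmath>

// Hypothetical reference helper (the name is an assumption, not from the post).
// Mirrors the FYL2X contract: toMul * log2(toLog).
double RefLog2MulD(const double toLog, const double toMul) {
  return toMul * std::log2(toLog);
}

// Natural logarithm through the same contract: ln(x) = ln(2) * log2(x).
double RefLn(const double x) {
  const double ln2 = std::log(2.0);
  return RefLog2MulD(x, ln2);
}
```

The multiplier argument is what lets a single routine serve other bases: passing ln(2) as toMul yields the natural logarithm.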


The calling convention follows https://docs.microsoft.com/en-us/cpp/build/overview-of-x64-calling-conventions, which states in particular:


  The x87 register stack is unused. It may be used by the callee, but
  must be considered volatile across function calls.


The prototype in C++ is:

extern "C" double __fastcall SRLog2MulD(const double toLog, const double toMul);


The FYL2X version is about 1.8 times slower than std::log2() and more than 3 times slower than std::log():

Log2: 94803174.389 Ops/sec calculated 2513272986.435
FPU Log2: 52008300.525 Ops/sec calculated 2513272986.435
Ln: 169392473.892 Ops/sec calculated 1742068084.525
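
The slowdown factors follow directly from the throughput figures above. A small sketch of the arithmetic, where the helper name Slowdown and the constant names are hypothetical:

```cpp
// Hypothetical helper: how many times slower `slowOpsPerSec` is than
// `fastOpsPerSec`, both given as operations per second.
constexpr double Slowdown(double fastOpsPerSec, double slowOpsPerSec) {
  return fastOpsPerSec / slowOpsPerSec;
}

// Throughputs copied from the benchmark output above.
constexpr double kLog2 = 94803174.389;     // std::log2()
constexpr double kFpuLog2 = 52008300.525;  // FYL2X routine
constexpr double kLn = 169392473.892;      // std::log()
```

Slowdown(kLog2, kFpuLog2) comes out at roughly 1.82 and Slowdown(kLn, kFpuLog2) at roughly 3.26.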


The benchmarking code is as follows:

void BenchmarkFpuLog2() {
  double sum = 0;
  auto start = std::chrono::high_resolution_clock::now();
  for (int64_t i = 1; i <= cnLogs; i++) {
    sum += SRPlat::SRLog2MulD(double(i), 1);
  }
  auto elapsed = std::chrono::high_resolution_clock::now() - start;
  double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
  printf("FPU Log2: %.3lf Ops/sec calculated %.3lf\n", cnLogs / nSec, sum);
}
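
To time std::log2(), std::log(), and the FPU routine under identical conditions, the loop can be factored into one helper. This is only a minimal sketch: the template BenchmarkOps is hypothetical, and the iteration count is passed in by the caller because the value of cnLogs is not shown in the post.

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Hypothetical helper (not from the original post): calls `f` on 1..n,
// prints throughput in the same format as the original benchmark, and
// returns the sum so the compiler cannot discard the calls.
template <typename F>
double BenchmarkOps(const char* name, int64_t n, F f) {
  double sum = 0;
  const auto start = std::chrono::steady_clock::now();
  for (int64_t i = 1; i <= n; i++) {
    sum += f(double(i));
  }
  const auto elapsed = std::chrono::steady_clock::now() - start;
  // duration<double> converts the tick count to fractional seconds.
  const double nSec = std::chrono::duration<double>(elapsed).count();
  std::printf("%s: %.3f Ops/sec calculated %.3f\n", name, n / nSec, sum);
  return sum;
}
```

For example, BenchmarkOps("Log2", n, [](double x) { return std::log2(x); }) would time std::log2(), and a lambda calling SRPlat::SRLog2MulD(x, 1) would time the FPU routine. The sketch uses steady_clock rather than high_resolution_clock because it is guaranteed to be monotonic; that is a choice of this sketch, not of the original code.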

    
             
                                                        
            
            
              
                