Logarithm in C++ and assembly

Happy的楠姐 2020-12-21 22:20

Apparently the MSVC++ 2017 toolset v141 (x64 Release configuration) doesn't use the FYL2X x86_64 assembly instruction via a C/C++ intrinsic, but rather C++ log()...

1 answer
  •  感情败类
    2020-12-21 23:12

    Here is the assembly code using FYL2X:

    _DATA SEGMENT
    
    _DATA ENDS
    
    _TEXT SEGMENT
    
    PUBLIC SRLog2MulD
    
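    ; Returns toMul * log2(toLog)
    ; XMM0L=toLog
    ; XMM1L=toMul
    SRLog2MulD PROC
      movq qword ptr [rsp+16], xmm1   ; spill toMul into the caller-provided shadow space
      movq qword ptr [rsp+8], xmm0    ; spill toLog into the shadow space
      fld qword ptr [rsp+16]          ; ST(0) = toMul
      fld qword ptr [rsp+8]           ; ST(0) = toLog, ST(1) = toMul
      fyl2x                           ; ST(1) = ST(1) * log2(ST(0)), then pop
      fstp qword ptr [rsp+8]          ; store the result and leave the x87 stack empty
      movq xmm0, qword ptr [rsp+8]    ; the double return value goes in XMM0
      ret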
    
    SRLog2MulD ENDP
    
    _TEXT ENDS
    
    END
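
    To sanity-check the routine above, it can help to compare it with a portable reference. The following is a minimal sketch, not part of the original answer: the names Log2MulRef and NearlyEqual are hypothetical, and it assumes the load order shown in the assembly, so that FYL2X yields toMul * log2(toLog).

```cpp
#include <cmath>

// Hypothetical portable reference (not from the original answer).
// FYL2X computes ST(1) * log2(ST(0)); with toMul loaded first and
// toLog loaded second, that is toMul * log2(toLog).
inline double Log2MulRef(double toLog, double toMul) {
  return toMul * std::log2(toLog);
}

// Relative-error comparison, useful because the x87 result is computed
// in extended precision and may differ from std::log2 in the last bits.
inline bool NearlyEqual(double a, double b, double relTol = 1e-12) {
  const double scale = std::fmax(std::fabs(a), std::fabs(b));
  return std::fabs(a - b) <= relTol * std::fmax(scale, 1.0);
}
```

    A check such as NearlyEqual(SRLog2MulD(x, m), Log2MulRef(x, m)) over a range of inputs would then confirm that the two agree within the chosen tolerance.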
    

    The calling convention follows https://docs.microsoft.com/en-us/cpp/build/overview-of-x64-calling-conventions, which states in particular:

    The x87 register stack is unused. It may be used by the callee, but must be considered volatile across function calls.

    The prototype in C++ is:

    extern "C" double __fastcall SRLog2MulD(const double toLog, const double toMul);
    

    The FYL2X version is roughly 1.8 times slower than std::log2() and more than 3 times slower than std::log():

    Log2: 94803174.389 Ops/sec calculated 2513272986.435
    FPU Log2: 52008300.525 Ops/sec calculated 2513272986.435
    Ln: 169392473.892 Ops/sec calculated 1742068084.525
    

    The benchmarking code is as follows:

    void BenchmarkFpuLog2() {
      double sum = 0;
      auto start = std::chrono::high_resolution_clock::now();
      for (int64_t i = 1; i <= cnLogs; i++) {
        sum += SRPlat::SRLog2MulD(double(i), 1);
      }
      auto elapsed = std::chrono::high_resolution_clock::now() - start;
      double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
      printf("FPU Log2: %.3lf Ops/sec calculated %.3lf\n", cnLogs / nSec, sum);
    }
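
    The answer shows only the FYL2X benchmark. The Log2 and Ln rows were presumably measured with the same loop, which could be sketched generically as below. This is an assumption rather than the author's code: BenchmarkLog is a hypothetical helper, and it uses std::chrono::steady_clock with a floating-point duration instead of high_resolution_clock and microseconds.

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Hypothetical helper (not from the original answer): times fn(double(i))
// for i = 1..count, prints a line in the same format as the results above,
// and returns the accumulated sum so the loop cannot be optimized away.
template <typename Fn>
double BenchmarkLog(const char* label, int64_t count, Fn fn) {
  double sum = 0;
  const auto start = std::chrono::steady_clock::now();
  for (int64_t i = 1; i <= count; i++) {
    sum += fn(double(i));
  }
  // duration<double> counts in seconds, so no manual scaling is needed.
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  const double nSec = elapsed.count();
  std::printf("%s: %.3f Ops/sec calculated %.3f\n", label, count / nSec, sum);
  return sum;
}
```

    For example, BenchmarkLog("Log2", cnLogs, [](double x) { return std::log2(x); }) would produce the Log2 row, and passing a lambda that calls SRLog2MulD(x, 1) would produce the FPU row.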
    
