SSE “denormals are zeros” option

Posted by Deadly on 2019-12-10 15:23:01

Question


I just experimented with the SSE "denormals are zeros" (DAZ) option, which I enabled with _mm_setcsr( _mm_getcsr() | 0x40 ).

I found an interesting thing: this doesn't prevent SSE from generating denormals when both operands are non-denormal! It only makes SSE treat denormal operands as if they were zeros.

As explained above, I know what this option does. But what is it good for?


Addendum

I just read the Intel article linked by user nucleon. And I was curious about the performance impact of denormals on SSE computations.

So I wrote a little Windows program to test this:

#include <windows.h>
#include <intrin.h>
#include <iostream>

using namespace std;

union DBL
{
    DWORDLONG dwlValue;
    double    value;
};

int main()
{
    DWORDLONG dwlTicks;
    DBL       d;
    double    sum;

    dwlTicks = __rdtsc();

    // Bit patterns 0 .. 99,999,999 are zero or denormal doubles.
    for( d.dwlValue = 0, sum = 0.0; d.dwlValue < 100000000; d.dwlValue++ )
        sum += d.value;

    dwlTicks = __rdtsc() - dwlTicks;
    cout << sum << endl;
    cout << dwlTicks / 100000000.0 << endl;

    dwlTicks = __rdtsc();

    // 0x0010000000000000 is the bit pattern of the smallest normal double.
    for( d.dwlValue = 0x0010000000000000u, sum = 0.0;
         d.dwlValue < (0x0010000000000000u + 100000000); d.dwlValue++ )
        sum += d.value;

    dwlTicks = __rdtsc() - dwlTicks;
    cout << sum << endl;
    cout << dwlTicks / 100000000.0 << endl;

    return 0;
}

(I printed the sums only to prevent the compiler from optimizing away the summation.)

The result is that on my Xeon E3-1240 (Skylake), each iteration takes four clock cycles when "d" is non-denormal. When "d" is a denormal, each iteration takes about 150 clock cycles! I would never have believed denormals could have such a huge performance impact if I hadn't seen it myself.

Source: https://stackoverflow.com/questions/37886551/sse-denormals-are-zeros-option
