Capture SIGFPE from SIMD instruction

I'm trying to clear the floating point divide by zero flag to ignore that exception. I'm expecting that with the flag set (no change from default behavior I believe, and commented out below), my error handler will fire. However, _mm_div_ss doesn't seem to be raising SIGFPE. Any ideas?

#include <stdio.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <xmmintrin.h>

static void sigaction_sfpe(int signal, siginfo_t *si, void *arg)
{
    printf("inside SIGFPE handler\nexit now.");
    exit(1);
}

int main()
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = sigaction_sfpe;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGFPE, &sa, NULL);

    //_mm_setcsr(0x00001D80); // catch all FPE except divide by zero

    __m128 s1, s2;
    s1 = _mm_set_ps(1.0, 1.0, 1.0, 1.0);
    s2 = _mm_set_ps(0.0, 0.0, 0.0, 0.0);
    _mm_div_ss(s1, s2);

    printf("done (no error).\n");

    return 0;
}

Output from above code:

$ gcc a.c
$ ./a.out 
done (no error).

As you can see, my handler is never reached. Side note: I've tried a few different compiler flags (-msse3, -march=native) with no change.

gcc (Debian 5.3.1-7) 5.3.1 20160121

Some info from /proc/cpuinfo

model name      : Intel(R) Core(TM) i3 CPU       M 380  @ 2.53GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm arat dtherm tpr_shadow vnmi flexpriority ept vpid

Two things.

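First, I misunderstood the documentation. An exception has to be unmasked to be caught; a set mask bit suppresses it. The default MXCSR value, 0x1F80, masks every exception. Calling _mm_setcsr(0x00001D80); clears the divide-by-zero mask (bit 9) and so allows SIGFPE to fire on divide by zero.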
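As a rough sketch of the mask arithmetic above: rather than hard-coding 0x1D80, the divide-by-zero mask bit can be cleared in whatever MXCSR value is currently loaded. The helper names unmask_div_zero and enable_div_zero_trap are made up for this illustration; _mm_getcsr, _mm_setcsr and _MM_MASK_DIV_ZERO come from xmmintrin.h.

```c
#include <xmmintrin.h>

/* Hypothetical helper (not from the original post): returns the MXCSR value
   with the divide-by-zero mask bit cleared and every other bit unchanged. */
static unsigned int unmask_div_zero(unsigned int csr)
{
    /* _MM_MASK_DIV_ZERO is bit 9 (0x0200); clearing it unmasks the exception. */
    return csr & ~(unsigned int)_MM_MASK_DIV_ZERO;
}

/* Hypothetical helper: applies the change to the calling thread's MXCSR. */
static void enable_div_zero_trap(void)
{
    _mm_setcsr(unmask_div_zero(_mm_getcsr()));
}
```

Starting from the default 0x1F80, unmask_div_zero yields the 0x1D80 used above. Working from the current value has the advantage of leaving the rounding mode and any other mask bits as they were.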

Second, gcc was not emitting my divide instruction at all, even with -O0, because the result of _mm_div_ss was never used.

Given source line

_mm_div_ss(s1, s2);

Compiling with gcc -S -O0 -msse2 a.c gives

76     movaps  -24(%ebp), %xmm0
77     movaps  %xmm0, -72(%ebp)
78     movaps  -40(%ebp), %xmm0
79     movaps  %xmm0, -88(%ebp)

a1     subl    $12, %esp        ; renumbered to show insertion below
a2     pushl   $.LC2
a3     call    puts
a4     addl    $16, %esp

While source line

s2 = _mm_div_ss(s1, s2); // add "s2 = "

gives

76     movaps  -24(%ebp), %xmm0
77     movaps  %xmm0, -72(%ebp)
78     movaps  -40(%ebp), %xmm0
79     movaps  %xmm0, -88(%ebp)
       movaps  -72(%ebp), %xmm0
       divss   -88(%ebp), %xmm0
       movaps  %xmm0, -40(%ebp)
a1     subl    $12, %esp
a2     pushl   $.LC2
a3     call    puts
a4     addl    $16, %esp

With those changes, whether the SIGFPE handler is called depends on the divide-by-zero mask bit in MXCSR.

Source: https://stackoverflow.com/questions/39690198/capture-sigfpe-from-simd-instruction

Tags

signals

simd