I'm trying to clear the floating-point divide-by-zero flag to ignore that exception. I expect that with the flag set (the default behavior, I believe; the call that clears it is commented out below), my error handler will fire. However, _mm_div_ss doesn't seem to raise SIGFPE. Any ideas?
#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <signal.h>
#include <string.h>
#include <xmmintrin.h>
static void sigaction_sfpe(int signal, siginfo_t *si, void *arg)
{
    printf("inside SIGFPE handler\nexit now.");
    exit(1);
}
int main()
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = sigaction_sfpe;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGFPE, &sa, NULL);
    //_mm_setcsr(0x00001D80); // catch all FPE except divide by zero
    __m128 s1, s2;
    s1 = _mm_set_ps(1.0, 1.0, 1.0, 1.0);
    s2 = _mm_set_ps(0.0, 0.0, 0.0, 0.0);
    _mm_div_ss(s1, s2);
    printf("done (no error).\n");
    return 0;
}
Output from above code:
$ gcc a.c
$ ./a.out
done (no error).
As you can see, my handler is never reached. Side note: I've tried a couple of different compiler flags (-msse3, -march=native) with no change.
gcc (Debian 5.3.1-7) 5.3.1 20160121
Some info from /proc/cpuinfo:
model name : Intel(R) Core(TM) i3 CPU M 380 @ 2.53GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm arat dtherm tpr_shadow vnmi flexpriority ept vpid
Two things.
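First, I misunderstood the documentation. Exceptions need to be unmasked to be caught. The default MXCSR value is 0x1F80, with every exception masked; 0x1D80 is that value with the divide-by-zero mask bit (0x0200) cleared. So calling _mm_setcsr(0x00001D80); is what allows SIGFPE to fire on divide by zero.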
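For what it's worth, the magic number can be derived rather than hard-coded. This is a minimal sketch, assuming GCC's <xmmintrin.h>, which defines _MM_MASK_DIV_ZERO as the divide-by-zero mask bit. The helper names unmask_div_zero and enable_div_zero_trap are my own, not part of any library.

```c
#include <xmmintrin.h>

/* Return an MXCSR value equal to csr with the divide-by-zero
 * mask bit cleared, i.e. with that exception unmasked.
 * (Hypothetical helper name, for illustration only.) */
unsigned int unmask_div_zero(unsigned int csr)
{
    return csr & ~(unsigned int)_MM_MASK_DIV_ZERO;
}

/* Apply the change to the live MXCSR register of the calling thread.
 * All other bits (rounding mode, other masks, flags) are left alone. */
void enable_div_zero_trap(void)
{
    _mm_setcsr(unmask_div_zero(_mm_getcsr()));
}
```

Calling enable_div_zero_trap() before the divide should have the same effect as _mm_setcsr(0x00001D80) as long as MXCSR still holds its default value, and it avoids clobbering the rounding mode or other mask bits if it does not.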
Second, gcc was optimizing out my divide instruction even with -O0, because the result of _mm_div_ss was never used.
Given source line
_mm_div_ss(s1, s2);
Compiling with gcc -S -O0 -msse2 a.c
gives
76 movaps -24(%ebp), %xmm0
77 movaps %xmm0, -72(%ebp)
78 movaps -40(%ebp), %xmm0
79 movaps %xmm0, -88(%ebp)
a1 subl $12, %esp ; renumbered to show insertion below
a2 pushl $.LC2
a3 call puts
a4 addl $16, %esp
While source line
s2 = _mm_div_ss(s1, s2); // add "s2 = "
gives
76 movaps -24(%ebp), %xmm0
77 movaps %xmm0, -72(%ebp)
78 movaps -40(%ebp), %xmm0
79 movaps %xmm0, -88(%ebp)
movaps -72(%ebp), %xmm0
divss -88(%ebp), %xmm0
movaps %xmm0, -40(%ebp)
a1 subl $12, %esp
a2 pushl $.LC2
a3 call puts
a4 addl $16, %esp
With those changes, the SIGFPE handler is called according to the divide-by-zero flag in MXCSR.
Source: https://stackoverflow.com/questions/39690198/capture-sigfpe-from-simd-instruction