Question
I use __int128 as a struct member. It works fine with -O0 (no optimization).
However, it crashes with a segmentation fault when optimization is enabled (-O1).
It crashes at a movdqa instruction, which requires its memory operand to be 16-byte aligned, while the address was allocated by malloc(), which aligns only to 8 bytes.
I tried to disable SSE optimization with -mno-sse, but then it fails to compile:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:27:1: error: SSE register return with SSE disabled
So what can I do if I want to use both __int128 and -O1?
Thanks in advance, Wu
BTW, it seems OK if __int128 is used only on the stack (not on the heap).
==== EDIT ====
Sorry, I did not tell the whole truth. I did not actually use malloc(); I used a memory pool library that returns addresses aligned to 8 bytes. I said malloc() just to keep things simple.
After testing, I found that malloc() aligns to 16 bytes, and that the __int128 member is also 16-byte aligned within the struct. So the problem is only my memory pool library.
Thanks very much.
Answer 1:
For x86-64 System V, alignof(max_align_t) == 16, so malloc always returns 16-byte aligned pointers. It sounds like your allocator is broken, and would violate the ABI if used for long double as well. (Reposting this as an answer because it turns out it was the answer.)
Memory returned by malloc is guaranteed to be usable for any standard type that fits in it, which means a block that is large enough is also aligned enough.
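A custom pool has to uphold the same guarantee. The following is a minimal sketch of one way to do that, assuming a simple bump-pointer design; `struct pool`, `align_up`, and `pool_alloc` are hypothetical names for illustration and are not the asker's library.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump-pointer pool, for illustration only. */
struct pool {
    unsigned char *base;  /* start of the backing buffer (any alignment) */
    size_t size;          /* total bytes in the backing buffer */
    size_t used;          /* bytes handed out so far, including padding */
};

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
    return (addr + (align - 1)) & ~(uintptr_t)(align - 1);
}

/* Return n bytes aligned like malloc's result, or NULL if the pool is full. */
static void *pool_alloc(struct pool *p, size_t n)
{
    uintptr_t start = (uintptr_t)p->base + p->used;
    size_t padding = (size_t)(align_up(start, alignof(max_align_t)) - start);
    size_t remaining = p->size - p->used;

    if (padding > remaining || n > remaining - padding)
        return NULL;

    size_t offset = p->used + padding;
    p->used = offset + n;
    return p->base + offset;
}
```

Aligning every block to alignof(max_align_t) wastes a few bytes per allocation, but it means any struct containing __int128 or long double can be placed in the pool safely.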
This can't be 32-bit code, because gcc doesn't support __int128 on 32-bit targets. (32-bit glibc malloc only guarantees 8-byte alignment.)
In general, the compiler is allowed to make code that faults if you violate the alignment requirements of types. On x86, things typically just work with misaligned memory until the compiler uses SIMD instructions that require alignment. Even auto-vectorization with a misaligned uint16_t* can fault (Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?), so don't assume that narrow types are always safe. Use memcpy if you need to express an unaligned load in C.
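As a small sketch of the memcpy approach (the helper names `load_int128_unaligned` and `store_int128_unaligned` are made up for this example):

```c
#include <string.h>

/* Read an __int128 from an address that may not be 16-byte aligned.
 * memcpy makes no alignment assumption about p, so the compiler cannot
 * turn this into an alignment-requiring load such as movdqa. */
static __int128 load_int128_unaligned(const void *p)
{
    __int128 v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write an __int128 to an address that may not be 16-byte aligned. */
static void store_int128_unaligned(void *p, __int128 v)
{
    memcpy(p, &v, sizeof v);
}
```

With optimization enabled, compilers typically inline a fixed-size memcpy like this into plain unaligned loads and stores, so it should not cost a function call.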
Apparently alignof(__int128) is 16. So they aren't repeating the weirdness from i386 System V where even 8-byte objects are only guaranteed 4-byte alignment, and struct-packing rules mean that compilers can't give them natural alignment.
This is a Good Thing, because it makes it efficient to copy with SSE, and means _Atomic __int128 doesn't need any extra special handling to avoid cache-line splits that would make lock cmpxchg16b very slow.
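To see these alignment rules on your own toolchain, something like the following sketch can help. It assumes GCC or Clang on a 64-bit target; `struct item`, `is_aligned_for_item`, and `item_new` are illustrative names, not from the question.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Example struct with an __int128 member (GCC/Clang extension). */
struct item {
    char tag;
    __int128 value;  /* padded so that it starts on a 16-byte boundary */
};

/* Returns 1 if p is aligned strictly enough to be used as a struct item. */
static int is_aligned_for_item(const void *p)
{
    return (uintptr_t)p % alignof(struct item) == 0;
}

/* Allocate an item with malloc, which must satisfy the struct's alignment. */
static struct item *item_new(__int128 v)
{
    struct item *it = malloc(sizeof *it);
    if (it != NULL) {
        it->tag = 0;
        it->value = v;
    }
    return it;
}
```

A check like is_aligned_for_item() could be placed in an assert at the point where a custom allocator hands out memory, so that a misaligned block is caught at -O0 as well instead of only faulting once the optimizer picks an aligned SSE instruction.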
Source: https://stackoverflow.com/questions/52531695/int128-alignment-segment-fault-with-gcc-o-sse-optimize