__int128 alignment segmentation fault with gcc -O SSE optimization

Question

I use __int128 as a struct member. It works fine with -O0 (no optimization).

However, it crashes with a segmentation fault when optimization is enabled (-O1).

It crashes at a movdqa instruction, which requires its memory operand to be 16-byte aligned, while the address was allocated by malloc(), which aligns only to 8 bytes.

I tried to disable SSE optimization with -mno-sse, but then the code fails to compile:

/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:27:1: error: SSE register return with SSE disabled

So what can I do if I want to use both __int128 and -O1?

Thanks in advance, Wu

BTW, it seems OK if the __int128 is used only on the stack (not on the heap).

==== EDIT ====

Sorry, what I said above was not accurate.

In fact, I did not use malloc(). I used a memory pool library that returns addresses aligned to 8 bytes. I said malloc() only to keep things simple.

After testing, I found that malloc() aligns to 16 bytes, and that the __int128 member is also aligned to 16 bytes within the struct.

So the problem is only my memory pool library.

Thanks very much.

Answer 1:

For x86-64 System V, alignof(max_align_t) == 16, so malloc always returns 16-byte aligned pointers. It sounds like your allocator is broken, and would violate the ABI if used for long double as well. (Reposting this as an answer because it turns out it was the answer.)
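One way a pool allocator could honor this is to round each returned address up to the requested alignment. The following is only a minimal sketch of a bump-pointer pool: struct pool and pool_alloc_aligned are hypothetical names for illustration, not the API of the asker's library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump-pointer pool (an assumption, not the asker's library). */
struct pool {
    unsigned char *base;   /* start of the backing buffer */
    size_t capacity;       /* total bytes in the buffer */
    size_t used;           /* bytes handed out so far, including padding */
};

/* Returns `size` bytes aligned to `alignment` (a power of two), or NULL
 * if the pool does not have enough room left. */
static void *pool_alloc_aligned(struct pool *p, size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    uintptr_t start = (uintptr_t)(p->base + p->used);
    /* Round the next free address up to a multiple of `alignment`. */
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    size_t padding = (size_t)(aligned - start);
    size_t remaining = p->capacity - p->used;

    if (padding > remaining || size > remaining - padding)
        return NULL;

    p->used += padding + size;
    return p->base + (p->used - size);
}
```

Calling pool_alloc_aligned(&p, sizeof(struct_with_int128), 16) would then give storage that movdqa can access safely, at the cost of up to 15 bytes of padding per allocation.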

Memory returned by malloc is guaranteed to be able to hold any standard type, which means it is sufficiently aligned for any such type that fits in the allocation.
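This guarantee can be checked at run time, much as the asker did. The sketch below assumes a C11 compiler; malloc_result_is_max_aligned is a hypothetical helper name.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if malloc(size) yields a pointer aligned
 * for max_align_t, 0 if it does not, and -1 if the allocation fails. */
static int malloc_result_is_max_aligned(size_t size)
{
    assert(size > 0);

    void *p = malloc(size);
    if (p == NULL)
        return -1;

    int ok = ((uintptr_t)p % alignof(max_align_t)) == 0;
    free(p);
    return ok;
}
```

A single allocation passing this check proves nothing about the allocator in general, but a failure is enough to show that an allocator is not suitable for types such as __int128.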

This can't be 32-bit code, because gcc doesn't support __int128 in 32-bit targets. (32-bit glibc malloc only guarantees 8-byte alignment.)

In general, the compiler is allowed to make code that faults if you violate the alignment requirements of types. On x86 things typically just work with misaligned memory until the compiler uses alignment-required SIMD instructions. Even auto-vectorization with a mis-aligned uint16_t* can fault (Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?), so don't assume that narrow types are always safe. Use memcpy if you need to express an unaligned load in C.
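As a minimal sketch of the memcpy approach, the helpers below read and write an __int128 at an address that may be misaligned. The function names are hypothetical. Because memcpy makes no alignment assumption, the compiler is expected to pick instructions that tolerate a misaligned address (for example movdqu instead of movdqa), though the exact code generated depends on the compiler and its flags.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: load an __int128 from a possibly misaligned address. */
static __int128 load_i128_unaligned(const void *src)
{
    assert(src != NULL);

    __int128 v;
    memcpy(&v, src, sizeof v);   /* no alignment requirement on src */
    return v;
}

/* Hypothetical helper: store an __int128 to a possibly misaligned address. */
static void store_i128_unaligned(void *dst, __int128 v)
{
    assert(dst != NULL);

    memcpy(dst, &v, sizeof v);   /* no alignment requirement on dst */
}
```

These helpers are a workaround for data that really is stored unaligned, such as packed file formats. For ordinary heap objects, fixing the allocator is the better option.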

Apparently alignof(__int128) is 16. So they aren't repeating the weirdness from i386 System V where even 8-byte objects are only guaranteed 4-byte alignment, and struct-packing rules mean that compilers can't give them natural alignment.
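The effect on struct layout can be confirmed at compile time. This sketch assumes gcc or clang targeting x86-64 System V; struct record is a hypothetical example type.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical struct with an __int128 member after a smaller field. */
struct record {
    int id;          /* 4 bytes, followed by 12 bytes of padding */
    __int128 value;  /* must start on a 16-byte boundary */
};

/* These hold on x86-64 System V; other ABIs may differ. */
static_assert(alignof(__int128) == 16, "__int128 is 16-byte aligned");
static_assert(alignof(struct record) == 16, "struct inherits the alignment");
static_assert(offsetof(struct record, value) == 16, "member is padded to 16");
static_assert(sizeof(struct record) == 32, "size is a multiple of 16");
```

Because the struct itself requires 16-byte alignment, the compiler may use movdqa on the member whenever it has a struct record pointer, which is why an 8-byte aligned pool address can fault.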

This is a Good Thing, because it makes it efficient to copy with SSE, and means _Atomic __int128 doesn't need any extra special handling to avoid cache-line splits that would make lock cmpxchg16b very slow.

Source: https://stackoverflow.com/questions/52531695/int128-alignment-segment-fault-with-gcc-o-sse-optimize

Tags

gcc

compiler-optimization

sse

memory-alignment