memory-alignment | 易学教程

GCC generated assembly for unaligned float access on ARM

Question: Hello, I am currently working on a program in which I need to process a data blob containing a series of floats that may be unaligned (and sometimes are). I am compiling with gcc 4.6.2 for an ARM Cortex-A8. I have a question about the generated assembly code. As a minimal example, for the following test code float aligned[2]; float *unaligned = (float*)(((char*)aligned)+2); int main(int argc, char **argv) { float f = unaligned[0]; return (int)f; } the compiler (gcc 4.6.2 -
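
For context on the test code above: dereferencing a `float *` that is not suitably aligned is undefined behavior in C, so a common alternative is to copy the bytes out with `memcpy`, which makes no alignment assumption. The following is a minimal sketch of that approach, not the asker's code; the helper name `load_float_unaligned` is hypothetical.

```c
#include <string.h>

/* Read a float from an address that may not be 4-byte aligned.
 * memcpy makes no alignment assumption about its arguments, so the
 * compiler has to pick loads that are valid for any address.
 * (Hypothetical helper name, for illustration only.) */
static float load_float_unaligned(const void *p)
{
    float f;
    memcpy(&f, p, sizeof f);
    return f;
}
```

With this helper, the read in the example would become `float f = load_float_unaligned((char *)aligned + 2);`.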

Why does instruction cache alignment improve performance in set associative cache implementations?

Question: I have a question regarding instruction cache alignment. I've heard that, as a micro-optimization, aligning loops so that they fit inside a cache line can slightly improve performance. I don't see why that would do anything. I understand the concept of cache hits and their importance for speed. But it seems that in set-associative caches, adjacent blocks of code will not be mapped to the same cache set. So if the loop crosses a code block the CPU should still get a cache hit since
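
To make "fits inside a cache line" concrete, the sketch below counts how many cache lines a block of code touches for a given start address. It only illustrates the arithmetic; the 64-byte line size and the name `lines_spanned` are assumptions, and it makes no claim about how much the extra line costs on any particular CPU.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u /* assumed cache line size in bytes */

/* Number of cache lines touched by `len` bytes starting at `start`.
 * (Hypothetical helper, for illustration only.) */
static size_t lines_spanned(uintptr_t start, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = start / LINE_SIZE;            /* line holding the first byte */
    uintptr_t last = (start + len - 1) / LINE_SIZE; /* line holding the last byte */
    return (size_t)(last - first + 1);
}
```

Under these assumptions, a 48-byte loop starting on a line boundary touches one line, while the same loop starting 48 bytes into a line touches two.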

Should %rsp be aligned to 16-byte boundary before calling a function in NASM?

Question: I saw the following rule in NASM's documentation: The stack pointer %rsp must be aligned to a 16-byte boundary before making a call. Fine, but the process of making a call pushes the return address (8 bytes) on the stack, so when a function gets control, %rsp is not aligned. You have to make that extra space yourself, by pushing something or subtracting 8 from %rsp. And I have a snippet of NASM assembly code as below: The %rsp should be at an 8-byte boundary before I call the function "inc
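
The arithmetic behind the quoted rule can be checked with plain integers. The sketch below models %rsp as a 64-bit value and never touches the real stack; the helper names and the sample addresses are hypothetical.

```c
#include <stdint.h>

/* %rsp as seen on function entry: `call` has pushed an 8-byte return
 * address, so the value is 8 lower than it was just before the call.
 * (Hypothetical helper names; this is a model, not real stack access.) */
static uint64_t rsp_on_entry(uint64_t rsp_before_call)
{
    return rsp_before_call - 8;
}

/* Bytes to subtract from `rsp` so that it becomes a multiple of 16.
 * The stack grows downward, so alignment is restored by subtracting. */
static uint64_t padding_to_align16(uint64_t rsp)
{
    return rsp % 16;
}
```

For example, if %rsp is 0x7ffc0000 (a multiple of 16) before the call, the callee sees 0x7ffbfff8, and `padding_to_align16` returns 8, matching the "subtract 8 from %rsp" advice in the quoted text.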

Wide string libc functions on unaligned memory

Question: So I've discovered after painful debugging that libc functions like wcslen fail silently when dealing with buffers that are not memory-aligned. In my case, calling wcslen( mystr ) returned a wrong length, which only later produced a crash (in wcstombs, assert buff[-1] == 0). One solution would be for me to rewrite all the wide string functions I need so that they work on non-aligned memory. This is easy enough but also dirty, and since there is no documentation about which parts of libc support non
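
The "rewrite the functions I need" option mentioned above might look like the following for the length case. This is a minimal sketch, not a libc replacement: the name `wcslen_unaligned` is hypothetical, and it copies each `wchar_t` into an aligned local with `memcpy` before inspecting it.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Length, in wchar_t units, of a NUL-terminated wide string stored at an
 * address that may not be aligned for wchar_t. Each unit is copied into
 * an aligned local before it is examined.
 * (Hypothetical helper name, for illustration only.) */
static size_t wcslen_unaligned(const void *p)
{
    const unsigned char *bytes = p;
    size_t n = 0;
    for (;;) {
        wchar_t c;
        memcpy(&c, bytes + n * sizeof c, sizeof c);
        if (c == L'\0')
            return n;
        n++;
    }
}
```

An alternative with the same effect is to `memcpy` the whole buffer into a properly aligned `wchar_t` array once and then call the standard functions on the copy.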

Why does GCC 6 assume data is 16-byte aligned?

Question: (Sorry in advance for not having managed to reduce my problem to a simple failing test case...) I have run into issues after upgrading to GCC 6.3.0 to build our codebase (relevant flags: -O3 -m32). Specifically, my application segfaults within a struct ctor call because of GCC optimizations. In this ctor, GCC used movaps: movaps %xmm0,0x30a0(%ebx). movaps requires the operand to be 16-byte aligned. But at this point in time, %ebx points to my object, which is not necessarily 16-byte aligned.
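
Because the displacement 0x30a0 is itself a multiple of 16, the store above is aligned exactly when the object's base address is. A small check like the one below can confirm at run time whether a given pointer meets that requirement. It is a debugging sketch with a hypothetical helper name, not a fix for the underlying issue.

```c
#include <stdint.h>

/* Nonzero when `p` is a multiple of `alignment`.
 * `alignment` must be a power of two.
 * (Hypothetical helper name, for illustration only.) */
static int is_aligned(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

For instance, `is_aligned(obj, 16)` could be asserted right after the object is allocated to see which allocation path hands out a misaligned address.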
