memory-alignment

GCC generated assembly for unaligned float access on ARM

陌路散爱 submitted on 2021-02-20 17:56:41
Question: Hello, I am currently working on a program that needs to process a data blob containing a series of floats that may be unaligned (and sometimes are). I am compiling with gcc 4.6.2 for an ARM Cortex-A8, and I have a question about the generated assembly code. As a minimal example, for the following test code float aligned[2]; float *unaligned = (float*)(((char*)aligned)+2); int main(int argc, char **argv) { float f = unaligned[0]; return (int)f; } the compiler (gcc 4.6.2 -
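For context, dereferencing a misaligned `float *` as in the test code above is undefined behavior in C, so the compiler may assume the pointer is aligned. A minimal sketch of a commonly used alternative is to copy the bytes with `memcpy`, which has no alignment requirement. The helper name `load_float_unaligned` is hypothetical and not taken from the question.

```c
#include <string.h>

/* Hypothetical helper: load a float from an address that may not be
   4-byte aligned. memcpy works on bytes, so it places no alignment
   requirement on p; the compiler chooses a suitable load sequence. */
static float load_float_unaligned(const void *p)
{
    float f;
    memcpy(&f, p, sizeof f);
    return f;
}
```

With this helper, the read in the test code would become `float f = load_float_unaligned((char *)aligned + 2);`. Which instructions the compiler emits for it depends on the target and flags.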

Why does instruction cache alignment improve performance in set associative cache implementations?

回眸只為那壹抹淺笑 submitted on 2021-02-19 03:16:55
Question: I have a question regarding instruction cache alignment. I've heard that, as a micro-optimization, aligning loops so that they fit inside a cache line can slightly improve performance. I don't see why that would make a difference. I understand the concept of cache hits and their importance to execution speed. But it seems that in set-associative caches, adjacent blocks of code will not be mapped to the same cache set. So if the loop crosses a code block the CPU should still get a cache hit since
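The address arithmetic behind this question can be sketched as follows. The line size (64 bytes) and set count (64) are assumptions chosen for illustration, not values from the question, and the function names are hypothetical. The sketch shows two things: adjacent lines do land in different sets, as the question says, and a loop that straddles a line boundary occupies two lines instead of one.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u /* assumed cache line size in bytes */
#define NUM_SETS  64u /* assumed number of sets in the cache */

/* Set index of the cache line that contains addr. */
static unsigned set_index(uintptr_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

/* Number of cache lines touched by len bytes of code starting at start. */
static size_t lines_spanned(uintptr_t start, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = start / LINE_SIZE;
    uintptr_t last = (start + len - 1) / LINE_SIZE;
    return (size_t)(last - first + 1);
}
```

Under these assumptions, a 48-byte loop starting at 0x1000 stays within one line, while the same loop starting at 0x1030 spans two lines that map to consecutive sets.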

Should %rsp be aligned to 16-byte boundary before calling a function in NASM?

社会主义新天地 submitted on 2021-02-16 20:20:22
Question: I saw the following rule in NASM's documentation: The stack pointer %rsp must be aligned to a 16-byte boundary before making a call. Fine, but the process of making a call pushes the return address (8 bytes) on the stack, so when a function gets control, %rsp is not aligned. You have to make that extra space yourself, by pushing something or subtracting 8 from %rsp. I have the following snippet of NASM assembly code: The %rsp should be on an 8-byte boundary before I call the function "inc
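The arithmetic in the quoted rule can be modelled in a few lines of C. This is a sketch of the address calculation only, not of real assembly, and all function names are hypothetical.

```c
#include <stdint.h>

/* Nonzero when rsp sits on a 16-byte boundary. */
static int is_aligned16(uint64_t rsp)
{
    return (rsp & 15u) == 0;
}

/* A call pushes an 8-byte return address, so the stack pointer
   drops by 8 before the callee's first instruction runs. */
static uint64_t rsp_after_call(uint64_t rsp)
{
    return rsp - 8;
}

/* Bytes to subtract from rsp to reach the next lower 16-byte boundary. */
static uint64_t padding_to_align16(uint64_t rsp)
{
    return rsp & 15u;
}
```

If `%rsp` is 16-byte aligned just before a call, it is 8 bytes off the boundary on entry to the callee, and `padding_to_align16` yields the 8 bytes the rule says to push or subtract before the callee makes its own call.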

Wide string libc functions on unaligned memory

和自甴很熟 submitted on 2021-02-11 12:14:35
Question: After some painful debugging, I've discovered that libc functions like wcslen fail silently when given buffers that are not memory-aligned. In my case, calling wcslen( mystr ) returned a wrong length, which only later produced a crash (in wcstombs, assert buff[-1] == 0). One solution would be to rewrite all the wide string functions I need so that they work on non-aligned memory. This is easy enough but also dirty, and since there is no documentation about which parts of libc support non
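One way to do the rewrite mentioned in the question is sketched below: each `wchar_t` is copied out with `memcpy`, so the buffer is never accessed through a misaligned `wchar_t *`. The function name `wcslen_unaligned` is hypothetical and not part of libc.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical replacement for wcslen that accepts a buffer at any
   byte address. Each element is copied into a properly aligned local
   variable before it is compared with the terminator. */
static size_t wcslen_unaligned(const void *s)
{
    const unsigned char *p = s;
    size_t n = 0;
    wchar_t c;

    for (;;) {
        memcpy(&c, p + n * sizeof c, sizeof c);
        if (c == L'\0')
            return n;
        n++;
    }
}
```

An alternative with the same effect is to `memcpy` the whole string into an aligned `wchar_t` array once and then call the standard functions on the copy, which costs an extra buffer but avoids reimplementing each function.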

Why does GCC 6 assume data is 16-byte aligned?

旧巷老猫 submitted on 2021-02-07 12:16:30
Question: (Sorry in advance for not having managed to reduce my problem to a simple failing test case...) I ran into issues when upgrading to GCC 6.3.0 to build our codebase (relevant flags: -O3 -m32). Specifically, my application segfaults within a struct ctor call because of GCC optimizations. In this ctor, GCC used movaps: movaps %xmm0,0x30a0(%ebx). movaps requires its memory operand to be 16-byte aligned. But at this point in time, %ebx points to my object, which is not necessarily 16-byte aligned.
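To check whether an address meets the 16-byte requirement that movaps imposes, a small test on the pointer value is enough. The sketch below is in C rather than the C++ of the question, and the helper `is_aligned` and the type `struct vec4` are hypothetical names used only for illustration. It does not diagnose the cause of the crash described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero when p is a multiple of alignment (alignment must be nonzero). */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical type that requests 16-byte alignment explicitly (C11). */
struct vec4 {
    _Alignas(16) float v[4];
};
```

Objects of type `struct vec4` with static or automatic storage are placed on 16-byte boundaries by the compiler, whereas an address at a 4-byte offset inside such an object fails the check.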
