memory-alignment | 易学教程

any way to stop unaligned access from c++ standard library on x86_64?

Question: I am trying to check for any unaligned reads in my program. I enable the processor's unaligned-access exception (x86_64, g++, Linux kernel 3.19) via: asm volatile("pushf \n" "pop %%rax \n" "or $0x40000, %%rax \n" "push %%rax \n" "popf \n" ::: "rax"); I do an optional forced unaligned read, which triggers the exception, so I know it's working. After I disable that, I get an error in a piece of code which otherwise seems fine: char fullpath[eMaxPath]; snprintf(fullpath, eMaxPath, "%s/%s",

Equivalent of memalign in cuda

Question: I am trying to parallelize a C function using CUDA. I noticed that several structs are passed to this function as pointers. Using unified memory, I have identified the malloc() calls and changed them to cudaMallocManaged(). But now there is an allocation using memalign(), and I want to do for it what cudaMallocManaged() did for malloc(). Does such an equivalent exist? If not, what needs to be done? This is how the memalign() allocation line looks: float *data =

Generated assembly for extended alignment of stack variables [duplicate]

Question: This question already has answers here: Trying to understand gcc's complicated stack-alignment at the top of main that copies the return address (3 answers); Why is gcc generating an extra return address? (2 answers); What's up with gcc weird stack manipulation when it wants extra stack alignment? (1 answer). Closed 4 months ago. I was digging into the assembly of code that used extended alignment for a stack-based variable. This is a smaller version of the code: struct Something {
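For readers who want something concrete to compile and inspect, here is a minimal sketch of a stack variable with extended alignment, assuming a C11 compiler such as GCC or Clang. The struct `Padded`, the 64-byte figure, and the function name are hypothetical; they are not taken from the truncated code in the question.

```c
#include <stdint.h>   /* uintptr_t */

/* Hypothetical type: requests 64-byte alignment, which is stricter than
 * the 16-byte stack alignment the x86-64 System V ABI guarantees at a call. */
struct Padded {
    _Alignas(64) char bytes[64];
};

/* Returns 1 if a local (stack) object of the over-aligned type really
 * starts on a 64-byte boundary. The compiler has to realign the stack
 * pointer in the prologue to make this true. */
int stack_object_is_aligned(void)
{
    struct Padded p;
    p.bytes[0] = 0;                       /* touch the object */
    return ((uintptr_t)&p % 64) == 0;
}
```

Compiling a function like this with `gcc -S` and reading the prologue is one way to see the extra stack-pointer manipulation that the linked duplicates discuss.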

How to specify ELF section alignment in GNU as?

Question: I'm trying to use GNU as as a generic assembler, similar in use to nasm. I make a template source like this: .section .text .globl _start .intel_syntax noprefix _start: call 0xb77431c0 # the instruction I want to assemble And then I run the assemble command like this: as --32 -o test.o test.s ld -m elf_i386 -Ttext 0xb77431d9 --oformat binary -o test.bin test.o All works well with binutils 2.24. But it appears that as from binutils 2.22 (the one in Ubuntu Precise) aligns the .text section to the 4

Turn off Eigen Alignment in the PCL build

Question: I have an issue where Eigen alignment causes serious problems on the operating system I use, QNX. Basically, the OS cannot deal with memory aligned that way, which causes very interesting seg faults. See my other question here. Anyway, for this reason I wish to disable the Eigen alignment used in PCL before I build it. I have a couple of ideas about how I might do this. EIGEN INCLUDE FILES IN PCL: PCL has this structure for most of its modules (this is an example of the features module

Pytables table dtype alignment

Question: If I create the following aligned NumPy array: import numpy as np import tables as pt numrows = 10 dt = np.dtype([('date', [('year', '<i4'), ('month', '<i4'), ('day', '<i4')]), ('apples', '<f8'), ('oranges', '|S7'), ('pears', '<i4')], align=True) x = np.zeros(numrows, dtype=dt) for d in x.dtype.descr: print d and print the dtype.descr, I get the following: ('date', [('year', '<i4'), ('month', '<i4'), ('day', '<i4')]) ('', '|V4') ('apples', '<f8') ('oranges', '|S7') ('', '|V1') ('pears', '<i4')

__int128 alignment segment fault with gcc -O SSE optimize

Question: I use __int128 as a struct member. It works fine with -O0 (no optimization), but it crashes with a segmentation fault if optimization is enabled (-O1). It crashes at a movdqa instruction, which needs its memory operand aligned to 16 bytes, while the address was allocated by malloc(), which aligns only to 8. I tried to disable SSE optimization with -mno-sse, but then it fails to compile: /usr/include/x86_64-linux-gnu/bits/stdlib-float.h:27:1: error: SSE register return with SSE disabled So what can I do if I want to use _
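One common way to satisfy the 16-byte requirement of movdqa is to request the alignment explicitly instead of relying on malloc(). The following is a minimal sketch, assuming GCC or Clang on a 64-bit target and a C11 library that provides `aligned_alloc`. The struct `record` and both helper names are hypothetical and do not come from the question.

```c
#include <stdint.h>   /* uintptr_t */
#include <stdlib.h>   /* aligned_alloc, free, size_t */

/* Hypothetical struct with an __int128 member (a GCC/Clang extension). */
struct record {
    int      id;
    __int128 value;   /* raises the struct's alignment to that of __int128 */
};

/* Allocate one record at the alignment the type itself asks for.
 * aligned_alloc requires the size to be a multiple of the alignment;
 * sizeof(struct record) always is. Returns NULL on failure. */
struct record *record_alloc(void)
{
    return aligned_alloc(_Alignof(struct record), sizeof(struct record));
}

/* Returns nonzero if p is a multiple of align. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

Memory obtained this way is released with the ordinary `free()`. Whether this addresses the crash in the question depends on where the misaligned pointer actually originates, which the truncated excerpt does not show.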

Is this a GCC bug when using -falign-loops option?

Question: I was playing with this option to optimize a for-loop on our embedded architecture (here). However, I noticed that when the alignment requires more than a single nop instruction, the compiler generates one nop followed by as many zeros (0000) as are required. I suspect it is a bug in our compiler, but can someone confirm it is not a bug in GCC? Here's a code snippet: __asm__ volatile("nop"); __asm__ volatile("nop"); for (j0=0; j0<N; j0+=4) { c[j0+ 0] = a[j0+ 0] + b[j0+ 0]; c[j0

Minimize total struct memory

Question: I have a struct: struct st { short a; int *b; char ch; }; short is 2 bytes, int* is 8 bytes on x64, and char is 1 byte. All of the above together should give me 11 bytes, but sizeof(st) gives 24 bytes. Why does the struct use more memory, and how can I reduce it to 11 bytes? Answer 1: #pragma pack is usually what is used, but it's not as portable as you'd like. Here are the docs on it: Microsoft's pack, GCC's Structure-Packing Pragmas. Both provide #pragma pack(n), push, and pop. In the absence of
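To make the padding visible, here is a small sketch that compares three layouts of the struct from the question. It assumes a compiler that accepts `#pragma pack` (GCC, Clang, and MSVC do). The names `st_reordered`, `st_packed`, and `print_layout` are made up for illustration.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <stdio.h>    /* printf */

/* Member order from the question. */
struct st {
    short a;    /* padding follows so that b is pointer-aligned */
    int  *b;
    char  ch;   /* tail padding follows so arrays of st stay aligned */
};

/* Hypothetical reordering: most strictly aligned member first,
 * which leaves only tail padding. */
struct st_reordered {
    int  *b;
    short a;
    char  ch;
};

/* Packed variant using the pragma named in the answer: no padding at all,
 * at the cost of b possibly sitting at a misaligned address. */
#pragma pack(push, 1)
struct st_packed {
    short a;
    int  *b;
    char  ch;
};
#pragma pack(pop)

static_assert(sizeof(struct st_packed) ==
              sizeof(short) + sizeof(int *) + sizeof(char),
              "a packed struct has no padding");

/* Print the size of each layout and where b lands in it. */
void print_layout(void)
{
    printf("st:           size %zu, b at offset %zu\n",
           sizeof(struct st), offsetof(struct st, b));
    printf("st_reordered: size %zu, b at offset %zu\n",
           sizeof(struct st_reordered), offsetof(struct st_reordered, b));
    printf("st_packed:    size %zu, b at offset %zu\n",
           sizeof(struct st_packed), offsetof(struct st_packed, b));
}
```

On a typical x86-64 target with 8-byte pointers, the three sizes are 24, 16, and 11 bytes. Reordering is the portable option. Packing reaches 11 bytes but makes every access to `b` potentially unaligned.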

How to specify alignment with _mm_mul_ps

Question: I am using an SSE intrinsic with a memory location as one of its arguments (_mm_mul_ps(xmm1,mem)). I am unsure which will be faster: xmm1 = _mm_mul_ps(xmm0,mem) // mem is 16-byte aligned or: xmm0 = _mm_load_ps(mem); xmm1 = _mm_mul_ps(xmm1,xmm0); Is there a way to specify alignment with the _mm_mul_ps() intrinsic? Answer 1: There is no _mm_mul_ps(reg,mem) form, even though the mulps reg,mem instruction form exists - https://msdn.microsoft.com/en-us/library/22kbk6t9(v=vs.90).aspx What you can do is _mm
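The explicit-load form from the question can be sketched as follows, assuming an x86 target with SSE and the `<xmmintrin.h>` header. The function names `mul4_aligned` and `mul4_unaligned` are hypothetical. The alignment promise is carried by the choice of load intrinsic (`_mm_load_ps` versus `_mm_loadu_ps`), not by `_mm_mul_ps` itself.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_loadu_ps, _mm_mul_ps */

/* out[i] = a[i] * b[i] for i in 0..3.
 * a, b and out must all be 16-byte aligned; _mm_load_ps and _mm_store_ps
 * may fault on a misaligned address. */
void mul4_aligned(const float *a, const float *b, float *out)
{
    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_mul_ps(va, vb));
}

/* Same computation with no alignment requirement on any pointer. */
void mul4_unaligned(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_mul_ps(va, vb));
}
```

With optimization enabled, compilers are typically free to fold an aligned load into the memory operand of mulps, so the two source forms in the question often compile to the same instruction. Comparing the generated assembly is the reliable way to check for a given compiler.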