x86-64 | 易学教程

How do I tell gcc that my inline assembly clobbers part of the stack?

Question: Consider inline assembly like this:

    uint64_t flags;
    asm ("pushf\n\tpop %0" : "=rm"(flags) : : /* ??? */);

Notwithstanding the fact that there is probably some kind of intrinsic to get the contents of RFLAGS, how do I indicate to the compiler that my inline assembly clobbers one quadword of memory at the top of the stack?

Answer 1: As far as I know, this is currently not possible.

Source: https://stackoverflow.com/questions/39160450/how-do-i-tell-gcc-that-my-inline-assembly-clobbers-part-of-the
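
Since there is no clobber for the stack, one commonly used workaround is to keep the asm from touching the compiler's data in the first place. The sketch below assumes the x86-64 System V ABI (128-byte red zone) and GCC/Clang extended asm; `read_rflags` is a hypothetical name, not something from the question. It moves the stack pointer below the red zone with `lea` (which does not alter the flags), and restricts the output to a register because an `rsp`-relative memory operand would be wrong after the push.

```c
#include <stdint.h>

/* Sketch: read RFLAGS without overwriting the red zone.
 * Assumes x86-64 System V (128-byte red zone) and GCC/Clang asm syntax. */
static inline uint64_t read_rflags(void)
{
    uint64_t flags;
    __asm__ volatile(
        "lea -128(%%rsp), %%rsp\n\t"  /* step below the red zone; lea leaves flags intact */
        "pushfq\n\t"                  /* push RFLAGS onto the (now safe) stack slot */
        "pop %0\n\t"                  /* pop it into the output register */
        "lea 128(%%rsp), %%rsp"       /* restore the original stack pointer */
        : "=r"(flags)                 /* register only: no rsp-relative memory operand */
    );
    return flags;
}
```

Compiling with `-mno-red-zone` is another way to avoid the problem, at the cost of disabling the red zone for the whole translation unit.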

Assembler Error: Mach-O 64 bit does not support absolute 32 bit addresses

Question: I'm learning x86_64 NASM assembly on my Mac for fun. After hello world and some basic arithmetic, I tried copying a slightly more advanced hello world program from this site and modifying it for 64-bit Intel, but I can't get rid of this one error message: hello.s:53: error: Mach-O 64-bit format does not support 32-bit absolute addresses. Here is the command I use to assemble and link: nasm -f macho64 hello.s && ld -macosx_version_min 10.6 hello.o. And here is the relevant line: cmp rsi,

GCC doesn't make use of inc

Question: The GCC compiler

    $ gcc --version
    gcc (GCC) 4.8.2
    ...

doesn't generate an inc assembly instruction where it could actually be useful, as in this C program:

    int main(int argc, char **argv)
    {
        int sum = 0;
        int i;
        for(i = 0; i < 1000000000L; i++)    <---- that "i++"
            sum += i;
        return sum;
    }

Instead, it generates an add instruction:

    0000000000000000 <main>:
       0: 31 d2          xor    %edx,%edx
       2: 31 c0          xor    %eax,%eax
       4: 0f 1f 40 00    nopl   0x0(%rax)
       8: 01 d0          add    %edx,%eax
       a: 83 c2 01       add    $0x1,%edx    <---- HERE
       d: 81 fa

Understanding %rip register in intel assembly

Question: Consider the following small piece of code, which was shown in another post about the size of a structure and the ways to align data correctly:

    struct { char Data1; short Data2; int Data3; char Data4; } x;

    unsigned fun ( void )
    {
        x.Data1=1;
        x.Data2=2;
        x.Data3=3;
        x.Data4=4;
        return(sizeof(x));
    }

I get the corresponding disassembly (64-bit):

    0000000000000000 <fun>:
       0: 55                      push   %rbp
       1: 48 89 e5                mov    %rsp,%rbp
       4: c6 05 00 00 00 00 01    movb   $0x1,0x0(%rip)        # b <fun+0xb>
       b: 66 c7 05 00
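
For readers who want to see the layout behind the four stores, here is a minimal sketch that reports the member offsets and total size of the same structure. The tag `layout_demo` and the helper `print_layout` are hypothetical names added for illustration; the offsets depend on the ABI, and the values in the note after the block assume the x86-64 System V ABI.

```c
#include <stddef.h>
#include <stdio.h>

/* Same member order as the struct in the question; the tag is hypothetical. */
struct layout_demo {
    char  Data1;
    short Data2;
    int   Data3;
    char  Data4;
};

/* Print each member's byte offset and the padded size of the whole struct. */
static void print_layout(void)
{
    printf("Data1 at %zu\n", offsetof(struct layout_demo, Data1));
    printf("Data2 at %zu\n", offsetof(struct layout_demo, Data2));
    printf("Data3 at %zu\n", offsetof(struct layout_demo, Data3));
    printf("Data4 at %zu\n", offsetof(struct layout_demo, Data4));
    printf("sizeof   %zu\n", sizeof(struct layout_demo));
}
```

On x86-64 System V the members land at offsets 0, 2, 4 and 8, and padding after Data4 brings the size to 12 bytes.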

NEON, SSE and interleaving loads vs shuffles

Question: I'm trying to understand the comment made by "Iwillnotexist Idonotexist" at SIMD optimization of cvtColor using ARM NEON intrinsics: "... why you don't use the ARM NEON intrinsics that map to the VLD3 instruction? That spares you all of the shuffling, both simplifying and speeding up the code. The Intel SSE implementation requires shuffles because it lacks 2/3/4-way deinterleaving load instructions, but you shouldn't pass on them when they are available." The trouble I am having is the solution
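
To make "3-way deinterleaving load" concrete, the scalar sketch below shows the data movement that such a load performs on packed three-channel bytes. It is only an illustration of the semantics, not NEON or SSE code, and `deinterleave3` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split packed 3-channel data (c0 c1 c2 c0 c1 c2 ...)
 * into three separate planes. A 3-way deinterleaving load does this for a
 * fixed number of elements in one instruction; without one, the same
 * rearrangement has to be built from ordinary loads plus shuffles. */
static void deinterleave3(const uint8_t *src, size_t pixels,
                          uint8_t *c0, uint8_t *c1, uint8_t *c2)
{
    for (size_t i = 0; i < pixels; ++i) {
        c0[i] = src[3 * i + 0];
        c1[i] = src[3 * i + 1];
        c2[i] = src[3 * i + 2];
    }
}
```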

Cassandra Startup Error 1.2.6 on Linux x86_64

Question: I'm trying to install Cassandra on Linux from the latest stable release (http://cassandra.apache.org/download/), 1.2.6. I have modified cassandra.yaml to point to a custom directory instead of /var, since I do not have write access to /var. I am seeing this error on startup. I have not been able to find any answers on Google yet, since the release seems relatively new. I'm just posting it here in case it's a silly mistake on my side. The same distribution file worked fine on my macOS x86_64 machine. INFO 19:24:35,513 Not

SIMD versions of SHLD/SHRD instructions

Question: SHLD/SHRD are assembly instructions used to implement multiprecision shifts. Consider the following problem:

    uint64_t array[4] = {/*something*/};
    left_shift(array, 172);
    right_shift(array, 172);

What is the most efficient way to implement left_shift and right_shift, two functions that shift an array of four 64-bit unsigned integers as if it were one big 256-bit unsigned integer? Is the most efficient way of doing that to use SHLD/SHRD instructions, or is there better
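
As a reference point for what the two functions must compute, here is a portable C sketch. It is a baseline under stated assumptions, not a claim about the fastest approach: it assumes `array[0]` is the least significant limb (the question does not say), and it treats shift counts of 256 or more as producing zero. The double-word expression `(hi << bits) | (lo >> (64 - bits))` is the operation SHLD performs; whether a compiler actually emits SHLD/SHRD for it is not guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Shift a 256-bit value left by n bits. Assumption: a[0] is the least
 * significant limb. Counts >= 256 clear the value. */
static void left_shift(uint64_t a[4], unsigned n)
{
    if (n >= 256) { memset(a, 0, 4 * sizeof(uint64_t)); return; }
    int limbs = (int)(n / 64);   /* whole-limb part of the shift */
    int bits  = (int)(n % 64);   /* remaining bit shift, 0..63 */
    /* Walk from the top limb down so sources are read before being overwritten. */
    for (int i = 3; i >= 0; --i) {
        uint64_t hi = (i - limbs >= 0)     ? a[i - limbs]     : 0;
        uint64_t lo = (i - limbs - 1 >= 0) ? a[i - limbs - 1] : 0;
        /* bits == 0 is handled separately: shifting by 64 is undefined in C. */
        a[i] = bits ? (hi << bits) | (lo >> (64 - bits)) : hi;
    }
}

/* Shift a 256-bit value right by n bits (logical shift), same limb order. */
static void right_shift(uint64_t a[4], unsigned n)
{
    if (n >= 256) { memset(a, 0, 4 * sizeof(uint64_t)); return; }
    int limbs = (int)(n / 64);
    int bits  = (int)(n % 64);
    /* Walk from the bottom limb up so sources are read before being overwritten. */
    for (int i = 0; i < 4; ++i) {
        uint64_t lo = (i + limbs < 4)     ? a[i + limbs]     : 0;
        uint64_t hi = (i + limbs + 1 < 4) ? a[i + limbs + 1] : 0;
        a[i] = bits ? (lo >> bits) | (hi << (64 - bits)) : lo;
    }
}
```

With the value 1 stored in `array[0]`, `left_shift(array, 172)` moves the bit to position 44 of `array[2]`, since 172 = 2 * 64 + 44, and `right_shift(array, 172)` moves it back.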

Can counting byte matches between two strings be optimized using SIMD?

Question: Profiling suggests that this function is a real bottleneck for my application:

    static inline int countEqualChars(const char* string1, const char* string2, int size)
    {
        int r = 0;
        for (int j = 0; j < size; ++j) {
            if (string1[j] == string2[j]) {
                ++r;
            }
        }
        return r;
    }

Even with -O3 and -march=native, G++ 4.7.2 does not vectorize this function (I checked the assembler output). Now, I'm not an expert with SSE and friends, but I think that comparing more than one character at once should be
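
One way to compare several characters at once is sketched below using SSE2 intrinsics. This is an illustration under assumptions, not the answer from the original thread: the name `count_equal_chars_simd` is hypothetical, `__builtin_popcount` assumes GCC or Clang, and the code falls back to the scalar loop when SSE2 is not available. Each iteration compares 16 bytes, turns the per-byte results into a 16-bit mask, and counts the set bits.

```c
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical SIMD variant of countEqualChars from the question. */
static int count_equal_chars_simd(const char *string1, const char *string2, int size)
{
    int r = 0;
    int j = 0;
#if defined(__SSE2__)
    /* Process 16 bytes per iteration with unaligned loads. */
    for (; j + 16 <= size; j += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(string1 + j));
        __m128i b = _mm_loadu_si128((const __m128i *)(string2 + j));
        /* cmpeq yields 0xFF per equal byte; movemask packs the top bits into 16 bits. */
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(a, b));
        r += __builtin_popcount((unsigned)mask);
    }
#endif
    /* Scalar tail (or the whole input when SSE2 is unavailable). */
    for (; j < size; ++j) {
        r += (string1[j] == string2[j]);
    }
    return r;
}
```

For long inputs, accumulating the comparison results in a vector register and summing them less often may be faster than a popcount per block; that refinement is left out here to keep the sketch short.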