x86-64

How do I tell gcc that my inline assembly clobbers part of the stack?

别说谁变了你拦得住时间么 submitted on 2019-12-19 09:03:58
Question: Consider inline assembly like this:

    uint64_t flags;
    asm ("pushf\n\tpop %0" : "=rm"(flags) : : /* ??? */);

Notwithstanding the fact that there is probably some kind of intrinsic to get the contents of RFLAGS, how do I indicate to the compiler that my inline assembly clobbers one quadword of memory at the top of the stack?

Answer 1: As far as I know, this is currently not possible.

Source: https://stackoverflow.com/questions/39160450/how-do-i-tell-gcc-that-my-inline-assembly-clobbers-part-of-the
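A workaround that is often suggested for this situation is to step over the red zone inside the asm itself instead of trying to describe the clobber. The sketch below assumes the x86-64 System V ABI (where the 128 bytes below RSP may hold live data) and GCC or Clang extended asm; the helper name read_rflags is hypothetical. It uses lea because, unlike add or sub, lea does not modify the flags being read, and it uses "=r" rather than "=rm" because an RSP-relative memory operand would be addressed incorrectly while RSP is adjusted.

```c
#include <stdint.h>

/* Hypothetical helper; x86-64 only, GCC/Clang extended asm syntax. */
static inline uint64_t read_rflags(void)
{
    uint64_t flags;
    __asm__ volatile(
        "lea -128(%%rsp), %%rsp\n\t" /* step below the red zone; lea leaves RFLAGS untouched */
        "pushfq\n\t"                 /* push RFLAGS into a stack slot that is now safe to use */
        "pop %0\n\t"                 /* pop the saved flags into the output register */
        "lea 128(%%rsp), %%rsp"      /* restore the original stack pointer */
        : "=r"(flags)
        :
        : "memory");                 /* conservative: the asm writes to stack memory */
    return flags;
}
```

Compiling the translation unit with -mno-red-zone is another way to make the original pushf/pop sequence safe, at the cost of disabling the red zone everywhere in that file.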

Assembler Error: Mach-O 64 bit does not support absolute 32 bit addresses

偶尔善良 submitted on 2019-12-19 07:49:37
Question: So I'm learning x86_64 NASM assembly on my Mac for fun. After hello world and some basic arithmetic, I tried copying a slightly more advanced hello world program from this site and modifying it for 64-bit Intel, but I can't get rid of this one error message:

    hello.s:53: error: Mach-O 64-bit format does not support 32-bit absolute addresses

Here is the command I use to assemble and link:

    nasm -f macho64 hello.s && ld -macosx_version_min 10.6 hello.o

And here is the relevant line: cmp rsi,

GCC doesn't make use of inc

六眼飞鱼酱① submitted on 2019-12-19 06:46:11
Question: The GCC compiler

    $ gcc --version
    gcc (GCC) 4.8.2
    ...

doesn't generate an inc assembly instruction where it could actually be useful, like in this C program:

    int main(int argc, char **argv)
    {
        int sum = 0;
        int i;
        for(i = 0; i < 1000000000L; i++)   /* <---- that "i++" */
            sum += i;
        return sum;
    }

Instead, it generates an add instruction:

    0000000000000000 <main>:
       0: 31 d2          xor %edx,%edx
       2: 31 c0          xor %eax,%eax
       4: 0f 1f 40 00    nopl 0x0(%rax)
       8: 01 d0          add %edx,%eax
       a: 83 c2 01       add $0x1,%edx    <---- HERE
       d: 81 fa

Understanding %rip register in intel assembly

余生颓废 submitted on 2019-12-19 06:02:25
Question: Concerning the following small piece of code, which was shown in another post about structure size and the possible ways to align data correctly:

    struct {
        char Data1;
        short Data2;
        int Data3;
        char Data4;
    } x;

    unsigned fun ( void )
    {
        x.Data1=1;
        x.Data2=2;
        x.Data3=3;
        x.Data4=4;
        return(sizeof(x));
    }

I get the corresponding disassembly (64-bit):

    0000000000000000 <fun>:
       0: 55                      push %rbp
       1: 48 89 e5                mov %rsp,%rbp
       4: c6 05 00 00 00 00 01    movb $0x1,0x0(%rip)        # b <fun+0xb>
       b: 66 c7 05 00
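To see the alignment the question alludes to, the member offsets can be printed directly. This is only a sketch: the tag name layout_demo and the function print_layout are hypothetical, and the layout is implementation-defined. On the usual x86-64 ABIs (2-byte short, 4-byte int, natural alignment) the offsets should come out as 0, 2, 4 and 8, with a total size of 12.

```c
#include <stddef.h>
#include <stdio.h>

/* Same members as the anonymous struct in the question; the tag name is hypothetical. */
struct layout_demo {
    char  Data1;
    short Data2;
    int   Data3;
    char  Data4;
};

/* Print each member's offset and the total size so the padding becomes visible. */
void print_layout(void)
{
    printf("Data1 at offset %zu\n", offsetof(struct layout_demo, Data1));
    printf("Data2 at offset %zu\n", offsetof(struct layout_demo, Data2));
    printf("Data3 at offset %zu\n", offsetof(struct layout_demo, Data3));
    printf("Data4 at offset %zu\n", offsetof(struct layout_demo, Data4));
    printf("sizeof = %zu\n", sizeof(struct layout_demo));
}
```

The stores in the disassembly target these offsets relative to the address of x, which the linker fills in later; that is why the RIP-relative displacements still read as zero in the unlinked object file.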

NEON, SSE and interleaving loads vs shuffles

独自空忆成欢 submitted on 2019-12-19 04:55:14
Question: I'm trying to understand the comment made by "Iwillnotexist Idonotexist" at SIMD optimization of cvtColor using ARM NEON intrinsics:

"... why you don't use the ARM NEON intrinsics that map to the VLD3 instruction? That spares you all of the shuffling, both simplifying and speeding up the code. The Intel SSE implementation requires shuffles because it lacks 2/3/4-way deinterleaving load instructions, but you shouldn't pass on them when they are available."

The trouble I am having is the solution
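For readers unfamiliar with the term, a 3-way deinterleaving load takes packed triples (for example RGBRGB... pixel data) and splits them into three separate channel vectors. The scalar C sketch below only models that data movement; deinterleave3 is a hypothetical helper, not a NEON intrinsic. NEON performs this split in the load itself (VLD3), whereas SSE code has to load the packed bytes and rearrange them with shuffle instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a 3-way deinterleaving load (hypothetical helper for illustration).
 * src holds `count` packed triples; each channel receives `count` bytes. */
void deinterleave3(const uint8_t *src, size_t count,
                   uint8_t *c0, uint8_t *c1, uint8_t *c2)
{
    for (size_t i = 0; i < count; ++i) {
        c0[i] = src[3 * i + 0]; /* first element of each triple  */
        c1[i] = src[3 * i + 1]; /* second element of each triple */
        c2[i] = src[3 * i + 2]; /* third element of each triple  */
    }
}
```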

Cassandra Startup Error 1.2.6 on Linux x86_64

北慕城南 submitted on 2019-12-19 04:07:57
Question: I am trying to install Cassandra on Linux from the latest stable release - http://cassandra.apache.org/download/ - 1.2.6. I have modified cassandra.yaml to point to a custom directory instead of /var, since I do not have write access to /var. I am seeing this error on startup. I have not been able to find any answers on Google yet, since the release seems relatively new. Just posting it here in case it's a silly mistake on my side. The same distribution file worked fine on my macOS x86_64 machine. INFO 19:24:35,513 Not

SIMD versions of SHLD/SHRD instructions

白昼怎懂夜的黑 submitted on 2019-12-19 02:51:29
Question: SHLD/SHRD are assembly instructions for implementing multi-precision shifts. Consider the following problem:

    uint64_t array[4] = {/*something*/};
    left_shift(array, 172);
    right_shift(array, 172);

What is the most efficient way to implement left_shift and right_shift, two functions that shift an array of four 64-bit unsigned integers as if it were one big 256-bit unsigned integer? Is using SHLD/SHRD instructions the most efficient way to do that, or is there better
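As a baseline to compare any SHLD/SHRD or SIMD version against, here is a portable C sketch of the two functions. It assumes array[0] is the least significant limb, which the question does not state. The expression (hi << bits) | (lo >> (64 - bits)) is the same double-word shift that SHLD computes; whether a compiler actually emits SHLD for it depends on the compiler and target, and this sketch makes no claim about being the fastest option.

```c
#include <stdint.h>

/* Shift a 256-bit value left by n bits. Assumes a[0] is the least significant limb. */
void left_shift(uint64_t a[4], unsigned n)
{
    unsigned limbs = n / 64; /* whole 64-bit limbs to move */
    unsigned bits  = n % 64; /* remaining bit shift within a limb */

    /* Walk from the most significant limb down so sources are read before being overwritten. */
    for (unsigned i = 4; i-- > 0;) {
        uint64_t hi = (i >= limbs)     ? a[i - limbs]     : 0;
        uint64_t lo = (i >= limbs + 1) ? a[i - limbs - 1] : 0;
        /* Guard bits == 0: shifting a 64-bit value by 64 is undefined in C. */
        a[i] = bits ? (hi << bits) | (lo >> (64 - bits)) : hi;
    }
}

/* Shift a 256-bit value right by n bits (logical shift, zero fill). */
void right_shift(uint64_t a[4], unsigned n)
{
    unsigned limbs = n / 64;
    unsigned bits  = n % 64;

    /* Walk from the least significant limb up for the same reason as above. */
    for (unsigned i = 0; i < 4; ++i) {
        uint64_t lo = (i + limbs < 4)     ? a[i + limbs]     : 0;
        uint64_t hi = (i + limbs + 1 < 4) ? a[i + limbs + 1] : 0;
        a[i] = bits ? (lo >> bits) | (hi << (64 - bits)) : lo;
    }
}
```

Shift counts of 256 or more fall out naturally: every source index is out of range, so the result is all zeros.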

Can counting byte matches between two strings be optimized using SIMD?

随声附和 submitted on 2019-12-19 00:38:52
Question: Profiling suggests that this function is a real bottleneck for my application:

    static inline int countEqualChars(const char* string1, const char* string2, int size)
    {
        int r = 0;
        for (int j = 0; j < size; ++j) {
            if (string1[j] == string2[j]) {
                ++r;
            }
        }
        return r;
    }

Even with -O3 and -march=native, G++ 4.7.2 does not vectorize this function (I checked the assembler output). Now, I'm not an expert with SSE and friends, but I think that comparing more than one character at once should be
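One way the idea of comparing several characters at once could look with SSE2 intrinsics is sketched below. The function name count_equal_chars_sse2 is hypothetical, __builtin_popcount assumes GCC or Clang, and the code falls back to the plain scalar loop when SSE2 is not available. It compares 16 bytes per iteration, turns the per-byte comparison result into a 16-bit mask, and counts the set bits; it is an illustration rather than a tuned implementation.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical SSE2 variant of countEqualChars; counts positions where both strings agree. */
static inline int count_equal_chars_sse2(const char *s1, const char *s2, int size)
{
    int r = 0;
    int j = 0;
#if defined(__SSE2__)
    /* Process 16 bytes at a time with unaligned loads. */
    for (; j + 16 <= size; j += 16) {
        __m128i a  = _mm_loadu_si128((const __m128i *)(s1 + j));
        __m128i b  = _mm_loadu_si128((const __m128i *)(s2 + j));
        __m128i eq = _mm_cmpeq_epi8(a, b);   /* 0xFF in every byte that matches */
        int mask   = _mm_movemask_epi8(eq);  /* one bit per byte, 16 bits total */
        r += __builtin_popcount((unsigned)mask);
    }
#endif
    /* Scalar tail (or the whole input when SSE2 is unavailable). */
    for (; j < size; ++j)
        r += (s1[j] == s2[j]);
    return r;
}
```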