x86-64 | 易学教程

Why is uint_least16_t faster than uint_fast16_t for multiplication in x86_64?

Question: The C standard is quite unclear about the uint_fast*_t family of types. On a gcc-4.4.4 Linux x86_64 system, the types uint_fast16_t and uint_fast32_t are both 8 bytes in size. However, multiplication of 8-byte numbers seems to be noticeably slower than multiplication of 4-byte numbers. The following piece of code demonstrates that:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main () {
        uint_least16_t p, x;
        int count;
        p = 1;
        for (count = 100000; count != 0; --count)
            for (x = 1;
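
The size difference described in this question can be checked directly. The following is a minimal sketch, not the asker's benchmark: the function names are hypothetical, and the 16-bit mask is an assumption added here so that both variants return the same value whatever width the implementation picks.

```c
#include <stddef.h>
#include <stdint.h>

/* Sizes in bytes. These are implementation-defined; the question reports
   8 for uint_fast16_t with gcc-4.4.4 on Linux x86_64. */
size_t least16_size(void) { return sizeof(uint_least16_t); }
size_t fast16_size(void)  { return sizeof(uint_fast16_t); }

/* Multiply in the "least" type. The 1u factor forces unsigned arithmetic,
   which avoids signed overflow after integer promotion. */
uint_least16_t mul_least16(uint_least16_t a, uint_least16_t b) {
    return (uint_least16_t)((1u * a * b) & 0xFFFFu);
}

/* Same operation in the "fast" type, which may be wider (e.g. 64 bits). */
uint_fast16_t mul_fast16(uint_fast16_t a, uint_fast16_t b) {
    return (uint_fast16_t)((1u * a * b) & 0xFFFFu);
}
```

Comparing the compiler output for the two multiply functions (for example with gcc -O2 -S) is one way to see whether a wider multiply is actually emitted on a given toolchain.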

What does the “mov rax, QWORD PTR fs:0x28” assembly instruction do? [duplicate]

Duplicate of: Why does this memory address %fs:0x28 (fs[0x28]) have a random value?

Immediately before this instruction is executed, fs contains 0x0. I'd also like to know how to read from this memory area in GDB; what would the command be?

Jonathon Reinhart: The fs and gs registers in modern OSes like Linux and Windows point to thread-specific and other OS-defined structures. Changing where these registers point is normally a privileged operation, so the OS sets them up for you. This question should help explain what exactly the
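
On x86-64 Linux with glibc, a load from fs:0x28 in a function prologue is typically the stack-protector canary that GCC emits when stack protection is enabled. As a hedged illustration, a function like the hypothetical copy_name below, compiled with gcc -fstack-protector-all -S, would usually show that load near the start and a matching comparison before ret. The exact output depends on the compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example: the local char array makes this function a
   candidate for stack-protector instrumentation. */
size_t copy_name(const char *src) {
    char buf[32];
    assert(src != NULL);
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';   /* strncpy does not always terminate */
    return strlen(buf);
}
```

For the GDB part of the question, recent GDB versions on x86-64 Linux generally expose an fs_base register, so a command along the lines of x/gx $fs_base + 0x28 is one way to inspect the value, assuming your GDB build provides that register.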

IBM Mobile First - Json Store not working on Samsung Galaxy S6

Question: We're building a hybrid app with IBM Mobile First Platform (7.0) for iOS and Android. We use JSONStore to save non-confidential user data (the stored data is not encrypted). When we deploy the application to a Samsung Galaxy S6 (model SM-G920I), the JSONStore init method fails with this error: Error code: -11 OPERATION_FAILED_ON_SPECIFIC_DOCUMENT (see IBM Mobile First Platform - JSONStore errors). Error details: "dlopen failed: "/data/data/com.MyMobileApp/files

Intriguing assembly for comparing std::optional of primitive types

Valgrind reported a flurry of "Conditional jump or move depends on uninitialised value(s)" errors in one of my unit tests. Inspecting the assembly, I realized that the following code:

    bool operator==(MyType const& left, MyType const& right) {
        // ... some code ...
        if (left.getA() != right.getA()) { return false; }
        // ... some code ...
        return true;
    }

where MyType::getA() const -> std::optional<std::uint8_t>, generated the following assembly:

    0x00000000004d9588 <+108>: xor eax,eax
    0x00000000004d958a <+110>: cmp BYTE PTR [r14+0x1d],0x0
    0x00000000004d958f <+115>: je 0x4d9597 <... function... +123>
    x

What is callq instruction?

Question: I have some GNU assembler code for the x86_64 architecture, generated by a tool, that contains these instructions:

    movq %rsp, %rbp
    leaq str(%rip), %rdi
    callq puts
    movl $0, %eax

I cannot find any actual documentation on the "callq" instruction. I have looked at http://support.amd.com/TechDocs/24594.pdf, which is the "AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions", but it only describes the CALL near and far instructions. I have looked at documentation for gnu
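
In AT&T syntax as accepted by the GNU assembler, callq is the CALL instruction written with a q suffix that marks a 64-bit operand size, which is why the vendor manuals only list CALL. As a rough sketch, compiling a small C function such as the hypothetical say_hello below with gcc -S, or disassembling the result with objdump -d, commonly shows the string address being loaded into %rdi followed by a call or callq to puts. Which spelling appears depends on the tool and its version.

```c
#include <stdio.h>

/* Hypothetical example: one call to puts, similar in shape to the
   tool-generated code quoted in the question. */
int say_hello(void) {
    return puts("hello");
}
```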

How much cycles math functions take on modern processors

We know that modern processors execute instructions such as cosine and sine directly on the processor, as they have opcodes for them. My question is how many cycles these instructions normally take. Do they take constant time, or does it depend on the input parameters?

Talking about "cycles for an instruction" on modern processors became difficult quite a while ago. Processors these days contain multiple execution cores, and their operations can overlap and execute out of order. A good example of the essential consideration is given in the Intel processor manual, volume 4, appendix C. It breaks down

Porting 32 bit C++ code to 64 bit - is it worth it? Why?

Question: I am aware of some of the obvious gains of the x64 architecture (more addressable RAM, etc.), but: what if my program has no real need to run in native 64-bit mode? Should I port it anyway? Are there any foreseeable deadlines for ending 32-bit support? Would my application run faster, better, or more securely as native x64 code?

Answer 1: x86-64 is a bit of a special case. For many architectures (e.g. SPARC), compiling an application for 64-bit mode doesn't give it any benefit unless it can

x86_64 calling conventions and stack frames

Question: I am trying to make sense of the executable code that GCC (4.4.3) generates for an x86_64 machine running Ubuntu Linux. In particular, I don't understand how the code keeps track of stack frames. In the old days, in 32-bit code, I was accustomed to seeing this "prologue" in just about every function:

    push %ebp
    movl %esp, %ebp

Then, at the end of the function, there would come an "epilogue," either

    sub $xx, %esp   # Where xx is a number based on GCC's accounting.
    pop %ebp
    ret

or

difference between MMX and XMM register?

Question: I'm currently learning assembly programming on an Intel x86 processor. Could someone please explain the difference between the MMX and XMM registers? I'm confused about what functions they serve and about the differences and similarities between them.

Answer 1: MM registers are the registers used by the MMX instruction set, one of the first attempts to add (integer-only) SIMD to x86. They are 64 bits wide and they are actually aliases for the mantissa parts of the x87 registers (but they
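
To make the contrast concrete: XMM registers are the 128-bit registers used by the SSE family, so one of them holds four 32-bit integers where a 64-bit MM register holds two. The sketch below is an illustration only. The function name add4_i32 is hypothetical, it uses the SSE2 intrinsics from emmintrin.h when the compiler defines __SSE2__, and it falls back to a plain loop otherwise.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 integer intrinsics operating on XMM registers */
#endif

/* Adds four 32-bit integers element-wise: out[i] = a[i] + b[i]. */
void add4_i32(const int32_t *a, const int32_t *b, int32_t *out) {
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE2__)
    /* Unaligned loads, so the arrays need no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    /* Scalar fallback for targets without SSE2. */
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

On x86-64, SSE2 is part of the baseline instruction set, so compilers targeting it will normally take the intrinsic path.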

Is there an 8-bit atomic CAS (cmpxchg) intrinsic for X64 in Visual C++?

The following code is possible in 32-bit Visual Studio C++. Is there a 64-bit equivalent using intrinsics, since inline ASM isn't supported in the 64-bit version of Visual Studio C++?

    FORCEINLINE bool bAtomicCAS8(volatile UINT8 *dest, UINT8 oldval, UINT8 newval) {
        bool result=false;
        __asm {
            mov al,oldval
            mov edx,dest
            mov cl,newval
            lock cmpxchg byte ptr [edx],cl
            setz result
        }
        return(result);
    }

The following intrinsics compile under Visual Studio C++:

    _InterlockedCompareExchange16
    _InterlockedCompareExchange
    _InterlockedCompareExchange64
    _InterlockedCompareExchange128

What I am looking for is
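
For the 8-bit case asked about here, one possible approach is sketched below. It is an illustration under assumptions rather than the accepted answer: the name atomic_cas8 is hypothetical, the MSVC branch relies on the _InterlockedCompareExchange8 intrinsic from intrin.h (whose availability may depend on the Visual Studio version), and the other branch uses the GCC/Clang __atomic_compare_exchange_n builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* Atomically replaces *dest with newval if *dest equals oldval.
   Returns true when the swap happened. */
bool atomic_cas8(volatile uint8_t *dest, uint8_t oldval, uint8_t newval) {
    assert(dest != NULL);
#if defined(_MSC_VER)
    /* The intrinsic returns the previous value of *dest. */
    char prev = _InterlockedCompareExchange8((volatile char *)dest,
                                             (char)newval, (char)oldval);
    return (uint8_t)prev == oldval;
#else
    /* Strong CAS with sequentially consistent ordering on both paths. */
    return __atomic_compare_exchange_n(dest, &oldval, newval, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
}
```

Returning a bool mirrors the setz result in the original inline assembly, so callers of bAtomicCAS8 could switch over without changing their logic.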