x86-64 | 易学教程

Why is imul used for multiplying unsigned numbers?

Question: I compiled the following program:

    #include <stdint.h>

    uint64_t usquare(uint32_t x) {
        return (uint64_t)x * (uint64_t)x;
    }

This disassembles to:

    0: 89 f8          mov  eax,edi
    2: 48 0f af c0    imul rax,rax
    6: c3             ret

But imul is the instruction for multiplying signed numbers. Why does gcc use it here?

Edit: when using uint64_t the assembly is similar:

    0: 48 0f af ff    imul rdi,rdi
    4: 48 89 f8       mov  rax,rdi
    7: c3             ret

Answer 1: TL;DR: because it's a faster way of getting the correct result when we don't care about
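A signed multiply instruction can serve for unsigned operands because the low N bits of an N-by-N-bit product are identical under both interpretations; only the upper half differs. The following is a minimal C sketch of that fact, not taken from the original answer. It checks the claim exhaustively for 8-bit operands. The helper name `low_bits_agree_8bit` is hypothetical, and the `uint8_t` to `int8_t` conversion assumes the usual two's-complement wraparound.

```c
#include <stdint.h>

/* The function from the question: widen to 64 bits, then multiply. */
uint64_t usquare(uint32_t x) {
    return (uint64_t)x * (uint64_t)x;
}

/* Hypothetical helper: returns 1 if, for every pair of 8-bit patterns,
 * the low 8 bits of the product are the same whether the operands are
 * read as signed or as unsigned values; returns 0 otherwise. */
int low_bits_agree_8bit(void) {
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            /* Unsigned interpretation: operands in 0..255. */
            uint8_t lo_unsigned = (uint8_t)(a * b);

            /* Signed interpretation: operands in -128..127.
             * Assumes two's-complement conversion; the product
             * always fits in an int, so there is no overflow. */
            int sa = (int8_t)a;
            int sb = (int8_t)b;
            uint8_t lo_signed = (uint8_t)(sa * sb);

            if (lo_unsigned != lo_signed) {
                return 0;
            }
        }
    }
    return 1;
}
```

In `usquare` the 32-bit input is zero-extended first, so the full product fits in 64 bits and the low 64 bits are the whole answer.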

Problem switching to v8086 mode from 32-bit protected mode by setting EFLAGS.VM to 1

I'm in 32-bit protected mode running at current privilege level 0 (CPL=0). I'm trying to enter v8086 mode by setting the EFLAGS.VM flag (bit 17) to 1 (and IOPL to 0) and doing a FAR JMP to my 16-bit real mode code. I get the current flags using PUSHF; set EFLAGS.VM (bit 17) to 1; set EFLAGS.IOPL (bit 12 and bit 13) to 0; and set the new EFLAGS with POPF. The code for this looks like:

    bits 32
    cli
    [snip]
    pushf                             ; Get current EFLAGS
    pop eax
    or eax, 1<<EFLAGS_VM_BIT          ; Set VM flag to enter v8086 mode
    and eax, ~(3<<EFLAGS_IOPL_BITS)   ; Set IOPL to 0
                                      ; IF flag already 0 because of earlier CLI
    push eax
    popf                              ;
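The bit arithmetic in that snippet can be checked in isolation. Below is a minimal C sketch that computes the intended EFLAGS value only; it does not show how to load it into the register. The names `EFLAGS_VM_BIT` and `EFLAGS_IOPL_BITS` come from the snippet, but their values here (17 and 12) are assumptions based on the architectural layout, in which VM is bit 17 and IOPL occupies bits 12 and 13. The function name `eflags_for_v8086` is hypothetical.

```c
#include <stdint.h>

/* Assumed values: the question does not show these definitions. */
#define EFLAGS_VM_BIT    17u  /* virtual-8086 mode flag */
#define EFLAGS_IOPL_BITS 12u  /* IOPL is a 2-bit field at bits 12-13 */

/* Hypothetical helper: given a current EFLAGS value, return the value
 * with VM set and IOPL cleared to 0; all other bits are unchanged. */
uint32_t eflags_for_v8086(uint32_t eflags) {
    eflags |= (uint32_t)1 << EFLAGS_VM_BIT;        /* set VM */
    eflags &= ~((uint32_t)3 << EFLAGS_IOPL_BITS);  /* IOPL = 0 */
    return eflags;
}
```

For example, an input of 0x00003002 (IOPL = 3, reserved bit 1 set) yields 0x00020002.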

Why does Clang do this optimization trick only from Sandy Bridge onward?

I noticed that Clang does an interesting division optimization trick for the following snippet:

    int64_t s2(int64_t a, int64_t b) { return a/b; }

Below is the assembly output when -march is set to Sandy Bridge or later:

    mov rax, rdi
    mov rcx, rdi
    or  rcx, rsi
    shr rcx, 32
    je  .LBB1_1
    cqo
    idiv rsi
    ret
    .LBB1_1:
    xor edx, edx
    div esi
    ret

Here are the Godbolt links for the signed version and the unsigned version. From what I understand, it checks whether the high 32 bits of both operands are zero, and does a 32-bit division if that's true. I checked this table and see that the latencies for 32/64-bit
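The check performed by that assembly can be written out in C. This is a minimal sketch of the same logic, not compiler output; the function name `s2_narrowing` is hypothetical. If neither operand has any of its upper 32 bits set, both are non-negative and fit in 32 bits, so an unsigned 32-bit divide produces the same quotient as the signed 64-bit one.

```c
#include <stdint.h>

/* Hypothetical C rendering of the branch in the assembly above.
 * Division by zero is undefined here, just as in the original s2. */
int64_t s2_narrowing(int64_t a, int64_t b) {
    /* or rcx, rsi ; shr rcx, 32 ; je ...  */
    if ((((uint64_t)a | (uint64_t)b) >> 32) == 0) {
        /* Fast path: 32-bit unsigned divide (div esi). */
        return (int64_t)((uint32_t)a / (uint32_t)b);
    }
    /* Slow path: full 64-bit signed divide (cqo ; idiv rsi). */
    return a / b;
}
```

Negative operands always take the slow path, because their sign-extended upper bits are set.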

Why does MSVC not support inline assembly for AMD64 and Itanium targets?

Yesterday I learned that inline assembly (with the __asm keyword) is not supported under Microsoft Visual C++ when compiling for AMD64 and Itanium targets. Is that correct? And if so, does anyone know why they would not support inline assembly for those targets? It seems like a rather big feature to just drop...

Answer: Correct, it still isn't supported in VS 2010 Beta 1. My guess is that inline assembly is just too difficult to implement: the way Microsoft implemented it, it integrates with the surrounding C code so that data can flow in and out of the C code, and appropriate glue code is

clang (LLVM) inline assembly - multiple constraints with useless spills / reloads

clang / gcc: Some inline assembly operands can be satisfied with multiple constraints, e.g., "rm", when an operand can be satisfied with a register or memory location. As an example, the 64 x 64 = 128 bit multiply:

    __asm__ ("mulq %q3" : "=a" (rl), "=d" (rh) : "%0" (x), "rm" (y) : "cc")

The generated code appears to choose a memory constraint for argument 3, which would be fine if we were register starved, to avoid a spill. Obviously there's less register pressure on x86-64 than on IA32. However, the assembly snippet generated (by clang) is:

    movq %rcx, -8(%rbp)
    ## InlineAsm Start
    mulq -8(
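One workaround that is sometimes suggested, and which the excerpt above does not confirm, is to offer only a register constraint ("r") for the second operand so the compiler has no memory alternative to pick. The sketch below wraps the question's statement that way. The function name `mul64x64` is hypothetical, and the fallback branch using `unsigned __int128` is an addition for targets other than x86-64 with a GNU-compatible compiler.

```c
#include <stdint.h>

/* Hypothetical wrapper: returns the low 64 bits of x * y and stores
 * the high 64 bits through hi. */
uint64_t mul64x64(uint64_t x, uint64_t y, uint64_t *hi) {
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t rl, rh;
    /* Same statement as in the question, but with "r" instead of
     * "rm" so operand 3 must live in a register. */
    __asm__ ("mulq %3"
             : "=a" (rl), "=d" (rh)
             : "%0" (x), "r" (y)
             : "cc");
    *hi = rh;
    return rl;
#else
    /* Portable fallback (GCC/Clang extension on 64-bit targets). */
    unsigned __int128 p = (unsigned __int128)x * y;
    *hi = (uint64_t)(p >> 64);
    return (uint64_t)p;
#endif
}
```

The trade-off is that "r" removes the compiler's freedom to use a memory operand when registers really are scarce, which is the case "rm" was meant to cover.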

SSE instruction MOVSD (extended: floating point scalar & vector operations on x86, x86-64)

I am somewhat confused by the MOVSD assembly instruction. I wrote some numerical code that computes a matrix multiplication, using ordinary C code with no SSE intrinsics. I do not even include the header file for SSE2 intrinsics for compilation. But when I check the assembler output, I see that: 1) 128-bit XMM vector registers are used; 2) the SSE2 instruction MOVSD is invoked. I understand that MOVSD essentially operates on a single double-precision floating-point value. It uses only the lower 64 bits of an XMM register and sets the upper 64 bits to 0. But I just don't understand two things: 1) I never

Why MOV AH,1 is not supported in 64 bit mode of intel microprocessor?

In the book "The Intel Microprocessors" by Barry B. Brey, it is written that MOV AH, 1 is not allowed in 64-bit mode, but is allowed in 32-bit or 16-bit mode. If MOV AL, 1 is allowed in 64-bit mode, what is the problem with MOV AH, 1?

Answer (Johan): There is no problem with mov ah,1. It runs just fine in x64 mode. The opcode for it is b4 01. The only time mov ah is not allowed is when the mov has a REX prefix. From http://www.felixcloutier.com/x86/MOV.html: In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH. In that
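To make the REX rule concrete: in b4 01 the opcode byte is B0 plus a 3-bit register number, and register number 4 selects AH. When any REX prefix is present, numbers 4 to 7 select SPL, BPL, SIL and DIL instead, so the bytes 40 b4 01 encode mov spl,1. The following minimal C sketch models that mapping; the function name `byte_reg_name` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: name of the 8-bit register selected by a 3-bit
 * register field in 64-bit mode.
 *   reg     - register field, 0..7
 *   has_rex - nonzero if the instruction carries any REX prefix
 *   rex_b   - nonzero if the REX.B extension bit is set
 * Returns NULL if reg is out of range. */
const char *byte_reg_name(unsigned reg, int has_rex, int rex_b) {
    static const char *const legacy[8] = {
        "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"
    };
    static const char *const with_rex[16] = {
        "al",  "cl",  "dl",   "bl",   "spl",  "bpl",  "sil",  "dil",
        "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b"
    };

    if (reg > 7) {
        return NULL;
    }
    if (!has_rex) {
        return legacy[reg];           /* AH/CH/DH/BH reachable */
    }
    return with_rex[reg + (rex_b ? 8u : 0u)];  /* AH..BH not reachable */
}
```

With this model, `byte_reg_name(4, 0, 0)` gives "ah" and `byte_reg_name(4, 1, 0)` gives "spl".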

Multiplying two values and printing them to the screen (NASM, Linux)

I keep reading that in order to perform integer/floating point division on a register, the register(s) being operated on need to actually be initialized. I'm curious what the proper assembler directive is to do this. Do I simply provide an address with something like:

    mov ecx, 0x65F ; 0x65F represents an address for ecx to point to

And then promptly (later in code) do something like:

    mov byte [ecx], 0xA ; move the value of 0xA into the contents of ecx, using only a byte's worth of data

Is this the proper way to perform such an operation? If not, what is?

Update: Ok, so what I'm

JMP instruction - Hex code

Question: I have a question about the hex encoding of the JMP machine instruction. I have the absolute address I want to jump to, say "JMP 0x400835". First of all, is this allowed? If yes, what would be the corresponding hex code? If not, can I first store the address in some register, say EAX, and then use "JMP EAX"? I am working on the x86 (64-bit) architecture. I have tried to print out the hex code from the disassembly output in gdb, but there is no consistency, i.e., I do not see the destination address in the
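One reason a destination address may not appear literally in the bytes is that the common direct near jump (opcode E9) stores a 32-bit displacement relative to the address of the next instruction, not the absolute target. The minimal C sketch below computes that encoding. The function name `encode_jmp_rel32` is hypothetical, and the instruction addresses used in the example are made up; only the target 0x400835 comes from the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode "jmp target" as E9 rel32 for an
 * instruction located at address `at`. Writes 5 bytes to out and
 * returns 5, or returns 0 if the target is out of rel32 range. */
size_t encode_jmp_rel32(uint64_t at, uint64_t target, uint8_t out[5]) {
    /* The displacement is measured from the end of the 5-byte
     * instruction, i.e. from the address of the next instruction. */
    int64_t rel = (int64_t)(target - (at + 5));
    if (rel < INT32_MIN || rel > INT32_MAX) {
        return 0;
    }

    uint32_t u = (uint32_t)(int32_t)rel;
    out[0] = 0xE9;                     /* JMP rel32 opcode */
    out[1] = (uint8_t)(u & 0xFF);      /* little-endian displacement */
    out[2] = (uint8_t)((u >> 8) & 0xFF);
    out[3] = (uint8_t)((u >> 16) & 0xFF);
    out[4] = (uint8_t)((u >> 24) & 0xFF);
    return 5;
}
```

For a jump placed at the made-up address 0x400800, the target 0x400835 encodes as e9 30 00 00 00, so the same destination yields different bytes depending on where the jump itself sits. An indirect jump through a register, such as jmp rax (ff e0), takes an absolute address instead.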

GCC: putchar(char) in inline assembly

How can I implement the putchar(char) procedure using inline assembly only? I would like to do this in x86-64 assembly. The reason for me doing this is to implement my own standard library (or at least part of it). Here is what I have so far:

    void putchar(char c) {
        /* your code here: print character c on stdout */
        asm(...);
    }

    void _start() {
        /* exit system call */
        asm("mov $1,%rax;"
            "xor %rbx,%rbx;"
            "int $0x80"
        );
    }

I am compiling with:

    gcc -nostdlib -o putchar putchar.c

Thanks for helping me!

Answer: When using GNU C inline asm, use constraints to tell the compiler where you want things ,