x86-64

Why is imul used for multiplying unsigned numbers?

为君一笑 posted on 2019-11-29 14:39:55
Question: I compiled the following program:

    #include <stdint.h>
    uint64_t usquare(uint32_t x) { return (uint64_t)x * (uint64_t)x; }

This disassembles to:

    0: 89 f8          mov eax,edi
    2: 48 0f af c0    imul rax,rax
    6: c3             ret

But imul is the instruction for multiplying signed numbers. Why does gcc use it here?

Edit: when using uint64_t the assembly is similar:

    0: 48 0f af ff    imul rdi,rdi
    4: 48 89 f8       mov rax,rdi
    7: c3             ret

Answer 1: TL;DR: because it's a faster way of getting the correct result when we don't care about
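The observation behind the question can be checked in plain C: the low half of a product has the same bit pattern whether the operands are treated as signed or unsigned. The sketch below is illustrative only; `low32_unsigned` and `low32_signed` are hypothetical helper names, and 32-bit operands with a 64-bit product are used so that the signed multiply cannot overflow. It also assumes the usual two's-complement conversion from `uint32_t` to `int32_t` (implementation-defined in C, modular on gcc and clang).

```c
#include <stdint.h>

/* Same function as in the question. */
uint64_t usquare(uint32_t x)
{
    return (uint64_t)x * (uint64_t)x;
}

/* Hypothetical helper: low 32 bits of the product, operands taken as unsigned. */
uint32_t low32_unsigned(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * (uint64_t)b);
}

/* Hypothetical helper: low 32 bits of the product, operands reinterpreted
   as signed. The 64-bit signed product of two 32-bit values cannot overflow. */
uint32_t low32_signed(uint32_t a, uint32_t b)
{
    int64_t product = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)(uint64_t)product;
}
```

For any pair of inputs the two helpers should agree; only the upper half of the full-width product differs between the signed and unsigned interpretations.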

Problem switching to v8086 mode from 32-bit protected mode by setting EFLAGS.VM to 1

寵の児 posted on 2019-11-29 13:58:54
I'm in 32-bit protected mode running at privilege level 0 (CPL=0). I'm trying to enter v8086 mode by setting the EFLAGS.VM flag (bit 17) to 1 (and IOPL to 0) and doing a FAR JMP to my 16-bit real mode code. I get the current flags using PUSHF; set EFLAGS.VM (bit 17) to 1; set EFLAGS.IOPL (bits 12 and 13) to 0; and load the new EFLAGS with POPF. The code for this looks like:

    bits 32
    cli
    [snip]
    pushf                             ; Get current EFLAGS
    pop eax
    or eax, 1<<EFLAGS_VM_BIT          ; Set VM flag to enter v8086 mode
    and eax, ~(3<<EFLAGS_IOPL_BITS)   ; Set IOPL to 0
                                      ; IF flag already 0 because of earlier CLI
    push eax
    popf ;
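The flag arithmetic described above can be sketched in C to make the bit positions explicit. This is only a model of the value being computed, not a way to enter v8086 mode; the constant and function names are hypothetical. The bit positions (VM at bit 17, IOPL at bits 12 and 13) follow the EFLAGS layout in the Intel manuals.

```c
#include <stdint.h>

/* Hypothetical names for the EFLAGS fields used in the question. */
#define EFLAGS_VM_BIT    17u   /* virtual-8086 mode flag */
#define EFLAGS_IOPL_BITS 12u   /* IOPL occupies bits 12 and 13 */

/* Return the EFLAGS value the question's code tries to load:
   VM set to 1, IOPL cleared to 0, all other bits unchanged. */
uint32_t eflags_for_v8086(uint32_t eflags)
{
    eflags |= (uint32_t)1 << EFLAGS_VM_BIT;
    eflags &= ~((uint32_t)3 << EFLAGS_IOPL_BITS);
    return eflags;
}
```

Computing the value is only part of the job: the Intel manual documents POPF/POPFD as leaving the VM flag unaffected, so loading this value with POPF would not be expected to change VM.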

Why does Clang do this optimization trick only from Sandy Bridge onward?

微笑、不失礼 posted on 2019-11-29 13:43:53
I noticed that Clang does an interesting division optimization trick for the following snippet:

    int64_t s2(int64_t a, int64_t b) { return a/b; }

Below is the assembly output when march is set to Sandy Bridge or above:

        mov rax, rdi
        mov rcx, rdi
        or rcx, rsi
        shr rcx, 32
        je .LBB1_1
        cqo
        idiv rsi
        ret
    .LBB1_1:
        xor edx, edx
        div esi
        ret

Here are the Godbolt links for the signed version and the unsigned version. From what I understand, it checks whether the high bits of the two operands are zero, and does a 32-bit division if that's true. I checked this table and see that the latencies for 32/64-bit
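The check described in the question can be written out in C as a rough model of what the emitted assembly does. This is a sketch of the idea, not Clang's implementation; `s2_fast` is a hypothetical name. If the upper 32 bits of both operands are zero, both values are non-negative and fit in 32 bits, so an unsigned 32-bit division gives the same quotient as the full signed 64-bit one.

```c
#include <stdint.h>

/* Hypothetical C model of the branch in the assembly above.
   As with plain a/b, calling this with b == 0 is undefined. */
int64_t s2_fast(int64_t a, int64_t b)
{
    /* or rcx, rsi ; shr rcx, 32 ; je ... */
    if ((((uint64_t)a | (uint64_t)b) >> 32) == 0) {
        /* Both operands fit in 32 bits: the cheaper 32-bit unsigned divide. */
        return (int64_t)((uint32_t)a / (uint32_t)b);
    }
    /* Otherwise fall back to the full 64-bit signed divide (cqo ; idiv). */
    return a / b;
}
```

Negative operands always have high bits set, so they take the 64-bit path and the result stays correct.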

Why does MSVC not support inline assembly for AMD64 and Itanium targets?

好久不见. posted on 2019-11-29 13:17:01
Yesterday I learned that inline assembly (with the __asm keyword) is not supported under Microsoft Visual C++ when compiling for AMD64 and Itanium targets. Is that correct? And if so, does anyone know why they would not support inline assembly for those targets? It seems like a rather big feature to just drop...

Correct, it still isn't supported in VS 2010 Beta 1. My guess is that inline assembly is just too difficult to implement: the way Microsoft implemented it, it integrates with the surrounding C code so that data can flow in and out of the C code, and appropriate glue code is

clang (LLVM) inline assembly - multiple constraints with useless spills / reloads

偶尔善良 posted on 2019-11-29 13:15:03
clang / gcc: Some inline assembly operands can be satisfied with multiple constraints, e.g., "rm", when an operand can be satisfied with either a register or a memory location. As an example, the 64 x 64 = 128 bit multiply:

    __asm__ ("mulq %q3" : "=a" (rl), "=d" (rh) : "%0" (x), "rm" (y) : "cc")

The generated code appears to choose a memory constraint for argument 3, which would be fine if we were register starved, to avoid a spill. Obviously there's less register pressure on x86-64 than on IA32. However, the assembly snippet generated (by clang) is: movq %rcx, -8(%rbp) ## InlineAsm Start mulq -8(

SSE instruction MOVSD (extended: floating point scalar & vector operations on x86, x86-64)

夙愿已清 posted on 2019-11-29 12:56:58
I am somewhat confused by the MOVSD assembly instruction. I wrote some numerical code computing a matrix multiplication, simply using ordinary C code with no SSE intrinsics. I do not even include the header file for SSE2 intrinsics for compilation. But when I check the assembler output, I see that: 1) 128-bit XMM vector registers are used; 2) the SSE2 instruction MOVSD is invoked. I understand that MOVSD essentially operates on a single double-precision floating-point value. It only uses the lower 64 bits of an XMM register and sets the upper 64 bits to 0. But I just don't understand two things: 1) I never

Why MOV AH,1 is not supported in 64 bit mode of intel microprocessor?

六眼飞鱼酱① posted on 2019-11-29 12:44:15
In the book "THE INTEL MICROPROCESSORS" by Barry B. Brey, it is written that MOV AH, 1 is not allowed in 64-bit mode, but allowed in 32-bit or 16-bit mode. If MOV AL, 1 is allowed in 64-bit mode, what is the problem with MOV AH, 1?

Johan: There is no problem with mov ah,1. It runs just fine in x64 mode. The opcode for it is b4 01. The only time mov ah is not allowed is when the mov has a REX prefix. From http://www.felixcloutier.com/x86/MOV.html: In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH. In that
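The rule quoted above can be sketched as a small lookup in C. The helper name `reg8_name` is hypothetical, and it covers only the case with no REX extension bit set (register numbers 0 to 7). Without a REX prefix, byte-register encodings 4 to 7 select AH, CH, DH and BH; with any REX prefix present, the same encodings select SPL, BPL, SIL and DIL instead.

```c
/* Hypothetical helper: name of the byte register selected by a 3-bit
   register field, depending on whether the instruction has a REX prefix.
   Assumes no REX.R/REX.B extension bit is set. */
const char *reg8_name(unsigned reg, int has_rex)
{
    static const char *const legacy[8] =
        { "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh" };
    static const char *const with_rex[8] =
        { "al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil" };

    reg &= 7u;  /* keep only the 3-bit register field */
    return has_rex ? with_rex[reg] : legacy[reg];
}
```

The byte b4 is opcode B0 plus register number 4, so `b4 01` without REX is mov ah,1, while the same bytes after a 0x40 REX prefix would name SPL.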

Multiplying two values and printing them to the screen (NASM, Linux)

人走茶凉 posted on 2019-11-29 12:28:07
I keep reading that in order to perform integer/floating point division on a register, the register(s) being operated on need to actually be initialized. I'm curious as to what the proper assembler directive is to do this. Do I simply provide an address with something like: mov ecx, 0x65F ;0x65F represents an address for ecx to point to. And then (later in the code) do something like: mov byte [ecx], 0xA ;move the value of 0xA into the contents of ecx, using only a byte's worth of data. Is this the proper way to perform such an operation? If not, what is? Update: Ok, so what I'm

JMP instruction - Hex code

十年热恋 posted on 2019-11-29 12:24:17
Question: I have a question regarding the hex encoding of the JMP machine instruction. I have the absolute address I want to jump to, say "JMP 0x400835". First of all, is this allowed? If yes, what would be the corresponding hex code? If not, can I first store the address in some register, say EAX, and then use "JMP EAX"? I am working on the x86 (64-bit) architecture. I have tried to print out the hex code from the disassembly output in gdb, but there is no consistency, i.e., I do not see the destination address in the
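One reason the destination address does not show up in the instruction bytes is that the common near jump (opcode E9) stores a 32-bit displacement relative to the address of the next instruction, not an absolute address. The C sketch below computes those five bytes; `encode_jmp_rel32` is a hypothetical helper, and any address at which the jump itself is placed is an assumption made for illustration, since the question does not give one.

```c
#include <stdint.h>

/* Hypothetical helper: encode "jmp target" as E9 rel32 for a jump
   instruction located at insn_addr. Returns 1 on success, 0 if the
   target is out of range for a 32-bit displacement. */
int encode_jmp_rel32(uint64_t insn_addr, uint64_t target, uint8_t out[5])
{
    /* The displacement is measured from the end of the 5-byte instruction. */
    int64_t rel = (int64_t)(target - (insn_addr + 5));
    if (rel < INT32_MIN || rel > INT32_MAX)
        return 0;

    uint32_t disp = (uint32_t)(int32_t)rel;
    out[0] = 0xE9;                          /* JMP rel32 opcode */
    out[1] = (uint8_t)(disp & 0xFF);        /* displacement, little-endian */
    out[2] = (uint8_t)((disp >> 8) & 0xFF);
    out[3] = (uint8_t)((disp >> 16) & 0xFF);
    out[4] = (uint8_t)((disp >> 24) & 0xFF);
    return 1;
}
```

With the jump assumed to sit at 0x400800, a jump to 0x400835 has displacement 0x400835 - 0x400805 = 0x30, so the bytes are e9 30 00 00 00. The same target reached from a different location produces different bytes, which matches the inconsistency the asker saw in gdb.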

GCC: putchar(char) in inline assembly

流过昼夜 posted on 2019-11-29 12:21:24
Overflow, how can I implement the putchar(char) procedure using inline assembly only? I would like to do this in x86-64 assembly. The reason for me doing this is to implement my own standard-lib (or at least part of it). Here is what I have so far:

    void putchar(char c) {
        /* your code here: print character c on stdout */
        asm(...);
    }
    void _start() {
        /* exit system call */
        asm("mov $1,%rax;"
            "xor %rbx,%rbx;"
            "int $0x80" );
    }

I am compiling with:

    gcc -nostdlib -o putchar putchar.c

Thanks for helping me!

When using GNU C inline asm, use constraints to tell the compiler where you want things ,