x86-64 | 易学教程

Division and modulus using single divl instruction (i386, amd64)

I was trying to come up with inline assembly for gcc to get both the quotient and the remainder from a single divl instruction. Unfortunately, I am not that good at assembly. Could someone please help me with this? Thank you.

Yes, a divl will produce the quotient in eax and the remainder in edx. Using Intel syntax, for example:

    mov eax, 17
    mov ebx, 3
    xor edx, edx
    div ebx
    ; eax = 5
    ; edx = 2

You're looking for something like this:

    __asm__("divl %2\n" : "=d" (remainder), "=a" (quotient) : "rm" (divisor), "d" (high), "a" (low));

Although I agree with the other commenters that usually GCC will do this for you
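As a rough illustration of the answer above, here is a minimal sketch of a complete helper built around one divl. The function name udivmod32 is hypothetical, and the sketch assumes GCC or Clang on i386/amd64; on any other target it falls back to plain C, which compilers usually turn into a single divide anyway.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the original answer): unsigned 32-bit
 * division that returns the quotient and stores the remainder. */
static inline uint32_t udivmod32(uint32_t dividend, uint32_t divisor,
                                 uint32_t *remainder)
{
    assert(divisor != 0);   /* divl raises a divide error on zero */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t quotient = dividend;  /* eax: low half of the dividend */
    uint32_t rem = 0;              /* edx: high half, zero for 32-bit input */
    __asm__("divl %[divisor]"
            : "+a"(quotient), "+d"(rem)   /* eax and edx are read and written */
            : [divisor] "rm"(divisor)     /* register or memory, never immediate */
            : "cc");
    *remainder = rem;
    return quotient;
#else
    /* Portable fallback for other targets. */
    *remainder = dividend % divisor;
    return dividend / divisor;
#endif
}
```

Calling udivmod32(17, 3, &r) mirrors the Intel-syntax example: the return value is the quotient and r receives the remainder. The "+a" and "+d" constraints are one way to tell the compiler that both registers are inputs and outputs; the separate input and output constraints in the answer are another.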

Write a jump command to a x86-64 binary file

Question: I'm debugging a 64-bit Mac OS X app with GDB. I see that jumping over a chunk of code solves all my problems. But how can I patch the executable file to implement the jump? I want the app to automatically jump to a defined point in the code without the debugger. This is what I want to do: at address 0x1000027a9 (given by the debugger), jump to address 0x100003b6e. I'm trying very hard to do it via HexEdit, but with no success. I have read about opcodes for jumps to absolute addresses (FF seems
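The excerpt stops before any answer, so the following is only a sketch of one common approach, not necessarily the one the thread settled on. It assumes a near relative jump (opcode E9 followed by a 32-bit little-endian displacement) is acceptable in place of an absolute jump. The displacement is counted from the end of the 5-byte instruction. The function name encode_jmp_rel32 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build the 5 bytes of `jmp rel32` that, placed at
 * virtual address `from`, transfer control to virtual address `to`.
 * Returns 0 on success, -1 if the target is out of rel32 range. */
static inline int encode_jmp_rel32(uint64_t from, uint64_t to, uint8_t out[5])
{
    assert(out != NULL);
    /* The CPU adds the displacement to the address of the NEXT instruction. */
    int64_t disp = (int64_t)(to - (from + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;
    uint32_t d = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;                       /* jmp rel32 opcode */
    out[1] = (uint8_t)(d & 0xFF);        /* displacement, little-endian */
    out[2] = (uint8_t)((d >> 8) & 0xFF);
    out[3] = (uint8_t)((d >> 16) & 0xFF);
    out[4] = (uint8_t)((d >> 24) & 0xFF);
    return 0;
}
```

For the addresses in the question, the displacement is 0x100003b6e - (0x1000027a9 + 5) = 0x13c0, giving the bytes E9 C0 13 00 00. Note that a hex editor works on file offsets, not virtual addresses, so the position of 0x1000027a9 inside the file still has to be worked out from the Mach-O segment layout; that mapping is not shown here.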

Use ld on 64-bit platform to generate 32-bit executable

I wrote an assembly program that is assembled with:

    $ as --32 -o hello.o hello.s

Then I tried to generate an executable with:

    $ ld -o hello hello.o

It gives me an error:

    ld: i386 architecture of input file `ConditionalBranching.o' is incompatible with i386:x86-64 output

I tried the flags -m32 and --32, but ld doesn't accept them. I could not find a solution in the ld man page. How can I generate a 32-bit binary from my 32-bit object file?

Your linker is attempting to create a 64-bit binary, but your assembly code was assembled for a 32-bit architecture. This creates a mismatch. Fix this by passing

Is it possible to know the address of a cache miss?

Whenever a cache miss occurs, is it possible to know the address of the missed cache line? Are there any hardware performance counters in modern processors that can provide such information?

Yes, on modern Intel hardware there are precise memory sampling events that track not only the address of the instruction, but the data address as well. These events also include a great deal of other information, such as which level of the cache hierarchy satisfied the memory access, the total latency, and so on. You can use perf mem to sample this information and produce a report. For example,

How do I implement an efficient 32 bit DivMod in 64 bit code

I want to use a DivMod function that operates exclusively on 32-bit operands. The implementation in the RTL returns values in 16-bit variables. Its declaration is:

    procedure DivMod(Dividend: Cardinal; Divisor: Word; var Result, Remainder: Word);

So I cannot use that, since my inputs may overflow the return values. The naive Pascal implementation looks like this:

    procedure DivMod(Dividend, Divisor: Cardinal; out Quotient, Remainder: Cardinal);
    begin
      Quotient := Dividend div Divisor;
      Remainder := Dividend mod Divisor;
    end;

This works splendidly but performs the division twice. Since the function

Segfault on movq instruction?

Question: Consider the following short program.

    int main(){
        asm("movq 0x5F5E100, %rcx;"
            "startofloop: ; "
            "sub 0x1, %rcx; "
            "jne startofloop; ");
    }

This program compiles fine, but when it is run, it segfaults on the initial movq instruction. I must be missing something obvious, but I hope someone here can point it out for me. I am running on Debian 8, with kernel 3.16.0-4-amd64, in case that is relevant. For future reference, this is what the compiler generated.

    main:
    .LFB0:
        .cfi_startproc
        pushq %rbp
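The excerpt ends before any answer, so this is a hedged sketch of the most likely cause rather than the thread's accepted fix. In AT&T syntax an operand without a `$` prefix is a memory reference, so `movq 0x5F5E100, %rcx` tries to load 8 bytes from address 0x5F5E100, which is normally unmapped. The same applies to `sub 0x1, %rcx`. The version below uses immediates, a local numeric label, and lets the compiler pick the register instead of hard-coding rcx without a clobber. The function name countdown_fixed is hypothetical, and the asm path assumes GCC or Clang on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical corrected loop: counts 0x5F5E100 (100,000,000) down to zero
 * and returns the final counter value. */
static inline uint64_t countdown_fixed(void)
{
    uint64_t left;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile(
        "movq $0x5F5E100, %0\n\t"  /* '$' marks an immediate, not an address */
        "1:\n\t"                   /* local label, safe if the asm is duplicated */
        "subq $1, %0\n\t"
        "jne 1b"                   /* loop until the subtraction yields zero */
        : "=r"(left)               /* compiler chooses the register */
        :
        : "cc");
#else
    /* Portable fallback for other targets. */
    volatile uint64_t n = 0x5F5E100;
    while (--n != 0) { }
    left = n;
#endif
    assert(left == 0);
    return left;
}
```

A named label such as startofloop can also fail to assemble if the compiler inlines or duplicates the asm statement, which is why the sketch uses `1:` and `1b`.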

Hardware performance counter APIs for Windows

I'd like to use hardware performance counters, specifically on x86 CPUs, to obtain cache misses or branch mispredictions. Performance counters are heavily used in advanced profilers like Intel VTune. Please don't confuse these with the performance counters of the Windows operating system. To use these counters in a C/C++ program, one may use PAPI: http://icl.cs.utk.edu/papi/ This allows you to easily use performance counters, but only on Linux. PAPI once supported Windows, but no longer does. Has anyone recently tried PAPI or other APIs to use hardware performance counters on Windows?

You can use RDPMC

64 bit assembly, when to use smaller size registers

I understand that in x86_64 assembly there is, for example, the (64-bit) rax register, but it can also be accessed as a 32-bit register, eax, a 16-bit register, ax, and an 8-bit register, al. In what situation would I not just use the full 64 bits, and why? What advantage would there be? As an example, with this simple hello world program:

    section .data
    msg: db "Hello World!", 0x0a, 0x00
    len: equ $-msg

    section .text
    global start
    start:
        mov rax, 0x2000004 ; System call write = 4
        mov rdi, 1         ; Write to standard out = 1
        mov rsi, msg       ; The address of hello_world string
        mov rdx, len       ; The size to write
        syscall            ; Invoke the
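The excerpt contains only the question, so the following is a small sketch of one fact that usually comes up in answers to it, not a reproduction of the thread. On x86-64, writing a 32-bit register zero-extends into the full 64-bit register, while writing a 16-bit (or 8-bit) register leaves the upper bits untouched. The function names are hypothetical; the `%k0` and `%w0` operand modifiers are GCC/Clang syntax for the 32-bit and 16-bit names of operand 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical demo: write 1 into the low 32 bits of a 64-bit register. */
static inline uint64_t write_low32(uint64_t value)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("movl $1, %k0" : "+r"(value));   /* upper 32 bits become zero */
#else
    value = 1;                               /* emulates the zero-extension */
#endif
    return value;
}

/* Hypothetical demo: write 1 into the low 16 bits of a 64-bit register. */
static inline uint64_t write_low16(uint64_t value)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("movw $1, %w0" : "+r"(value));   /* upper 48 bits are preserved */
#else
    value = (value & ~(uint64_t)0xFFFF) | 1; /* emulates the partial write */
#endif
    return value;
}

/* Checks both behaviours starting from an all-ones register. */
static inline void register_width_demo(void)
{
    assert(write_low32(UINT64_MAX) == 1);
    assert(write_low16(UINT64_MAX) == 0xFFFFFFFFFFFF0001ULL);
}
```

This is why `mov edi, 1` is a valid and shorter substitute for `mov rdi, 1` when the value fits in 32 bits: the result in rdi is the same, and the encoding needs no REX.W prefix.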

What does the R stand for in RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP? [duplicate]

This question already has an answer here: What do the E and R prefixes stand for in the names of Intel 32-bit and 64-bit registers?

The x86 assembly language has had to change as the x86 processor architecture has changed from 8-bit to 16-bit to 32-bit and now 64-bit. I know that in 32-bit assembler register names (EAX, EBX, etc.), the E prefix stands for Extended, meaning the 32-bit form of the register rather than the 16-bit form (AX, BX, etc.). What does the R prefix for these register names stand for in 64-bit?

I think it's just R for "register", since there are

x86-64: Cache load and eviction instruction

Question: For the x86-64 architecture, is there an instruction that can load data at a given memory address into the cache? Similarly, is there an instruction that can evict a cache line given a memory address corresponding to that cache line (or something like a cache line identifier)?

Answer 1: Prefetch data into cache (without loading it into a register):

    PREFETCHT0 [address]
    PREFETCHT1 [address]
    PREFETCHT2 [address]

intrinsic: void _mm_prefetch (char const* p, int hint)

See the insn ref manual and other
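A minimal sketch of how the prefetch intrinsic from the answer might be used from C follows. The eviction half of the answer is cut off in the excerpt, so the use of `_mm_clflush` (the SSE2 intrinsic for CLFLUSH) is an assumption about what fits there, not a quotation. The function names and the prefetch distance of 16 elements are hypothetical, and on targets without SSE2 the sketch compiles to plain C with no cache hints.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_clflush; also pulls in _mm_prefetch */
#endif

/* Hypothetical example: sum an array while hinting upcoming elements
 * into all cache levels (PREFETCHT0). Prefetches are only hints. */
static inline long sum_with_prefetch(const int *data, size_t n)
{
    assert(data != NULL);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        if (i + 16 < n)  /* stay inside the array when forming the address */
            _mm_prefetch((const char *)&data[i + 16], _MM_HINT_T0);
#endif
        total += data[i];
    }
    return total;
}

/* Hypothetical example: flush the cache line containing p from every
 * cache level. Dirty data is written back first, so memory stays valid. */
static inline void evict_line(const void *p)
{
    assert(p != NULL);
#if defined(__SSE2__)
    _mm_clflush(p);
#else
    (void)p;   /* no portable equivalent; nothing to do */
#endif
}
```

Whether software prefetching helps depends heavily on the access pattern, since the hardware prefetchers already handle simple sequential scans like this one; the loop is only meant to show the call shape.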