assembly

Is an extra move somehow faster when doing division-by-multiplication?

霸气de小男生 submitted on 2020-05-26 17:24:11
Question: Consider this function:

    unsigned long f(unsigned long x) {
        return x / 7;
    }

With -O3, Clang turns the division into a multiplication, as expected:

    f:                                  # @f
        movabs  rcx, 2635249153387078803
        mov     rax, rdi
        mul     rcx
        sub     rdi, rdx
        shr     rdi
        lea     rax, [rdi + rdx]
        shr     rax, 2
        ret

GCC does basically the same thing, except that it uses rdx where Clang uses rcx. But both appear to be doing an extra move. Why not this instead?

    f:
        movabs  rax, 2635249153387078803
        mul     rdi
        sub     rdi, rdx
        shr     rdi
        lea     rax, [rdi + rdx
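For readers who want to check the arithmetic, here is a minimal C sketch of what both listings compute. The 64-bit constant is the low half of the 65-bit multiplier ceil(2^67 / 7), so after `mul` leaves the high half of the product (hi) in rdx, the code computes (((x - hi) >> 1) + hi) >> 2. The sketch assumes a 64-bit GCC or Clang target because it uses the non-standard `unsigned __int128` type; `MAGIC7`, `div7_by_mul`, and `div7_self_test` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of the 65-bit multiplier ceil(2^67 / 7). The missing top
   bit is why the sub/shr/add fix-up below is needed. */
#define MAGIC7 UINT64_C(2635249153387078803)

/* C equivalent of the compiled sequence (a sketch, not compiler output). */
static uint64_t div7_by_mul(uint64_t x)
{
    /* mul: rdx = high 64 bits of x * MAGIC7 */
    uint64_t hi = (uint64_t)(((unsigned __int128)x * MAGIC7) >> 64);
    /* sub rdi, rdx ; shr rdi  (hi <= x, so this cannot underflow) */
    uint64_t t = (x - hi) >> 1;
    /* lea rax, [rdi + rdx] ; shr rax, 2 */
    return (t + hi) >> 2;
}

/* Compares the sketch with real division on edge cases and a small sweep. */
static void div7_self_test(void)
{
    const uint64_t edges[] = {0, 1, 6, 7, 8, 48, 49, UINT64_MAX - 7,
                              UINT64_MAX - 1, UINT64_MAX};
    for (unsigned i = 0; i < sizeof edges / sizeof edges[0]; i++)
        assert(div7_by_mul(edges[i]) == edges[i] / 7);
    for (uint64_t x = 0; x < 100000; x++)
        assert(div7_by_mul(x) == x / 7);
}
```

Calling `div7_self_test()` exercises the same instruction sequence that both compilers emit; the register used to hold the constant does not change the result.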

How to benchmark in Qemu i386 system using rdtsc

倖福魔咒の submitted on 2020-05-26 09:47:25
Question: I am trying to measure the number of clock cycles taken to complete an operation in two different programming languages in the same environment (without an OS). I am currently using the Qemu-i386 emulator and rdtsc to measure the clock cycles.

    /* Return the number of CPU ticks since boot. */
    static inline u64 rdtsc(void)
    {
        u32 hi, lo;
        // asm("cpuid");
        asm("rdtsc" : "=a" (lo), "=d" (hi));
        return ((u64) lo) | (((u64) hi) << 32);
    }

Taking the difference between rdtsc before and after
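As a point of comparison, on GCC or Clang the TSC can also be read through the `__rdtsc()` and `_mm_lfence()` intrinsics from `<x86intrin.h>`. Intel documents LFENCE before RDTSC as a way to wait for earlier instructions, and LFENCE after it as a way to keep later instructions from starting early. This is a minimal sketch assuming an x86 target; `tsc_fenced`, `time_region`, and `work` are hypothetical names. When running under QEMU's software emulation, the counter value is produced by the emulator, so tick differences may not reflect cycle counts on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc, _mm_lfence (GCC/Clang, x86 only) */

/* Reads the TSC with LFENCE on both sides: the first fence waits for
   earlier instructions to complete, the second keeps later ones from
   starting before the read. */
static inline uint64_t tsc_fenced(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Hypothetical helper: TSC ticks spent in fn(arg). */
static uint64_t time_region(void (*fn)(void *), void *arg)
{
    assert(fn != NULL);
    uint64_t start = tsc_fenced();
    fn(arg);
    uint64_t end = tsc_fenced();
    return end - start;
}

/* Hypothetical workload, used only for illustration: sums 0..999. */
static void work(void *arg)
{
    volatile uint64_t *acc = arg;
    for (int i = 0; i < 1000; i++)
        *acc += (uint64_t)i;
}
```

Usage: `uint64_t ticks = time_region(work, (void *)&acc);` where `acc` is a `volatile uint64_t` initialized to 0. Repeating the measurement and taking the minimum is a common way to reduce noise.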

What's up with the “half fence” behavior of rdtscp?

蹲街弑〆低调 submitted on 2020-05-26 04:42:10
Question: For many years, x86 CPUs have supported the rdtsc instruction, which reads the "time stamp counter" of the current CPU. The exact definition of this counter has changed over time, but on recent CPUs it is a counter that increments at a fixed frequency with respect to wall-clock time, so it is very useful as a building block for a fast, accurate clock or for measuring the time taken by small segments of code. One important fact about the rdtsc instruction is that it isn't ordered in any special way with the
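Intel describes RDTSCP as waiting until all previous instructions have executed (and previous loads are globally visible) before reading the counter, while not preventing later instructions from starting before the read; that one-sided ordering is the "half fence". A minimal sketch of a commonly used end-of-region read, assuming GCC or Clang on x86 (`tsc_end` is a hypothetical name), adds an LFENCE after RDTSCP to cover the other side:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp, _mm_lfence (GCC/Clang, x86 only) */

/* End-of-region timestamp. RDTSCP itself waits for earlier instructions
   before reading the counter, but later instructions may still begin
   before the read; the trailing LFENCE closes that gap. */
static inline uint64_t tsc_end(unsigned *cpu_aux)
{
    assert(cpu_aux != NULL);
    uint64_t t = __rdtscp(cpu_aux);  /* also stores IA32_TSC_AUX */
    _mm_lfence();
    return t;
}
```

On Linux, IA32_TSC_AUX is set up to identify the CPU, so comparing the `cpu_aux` values from the start and end reads is one way to notice that a thread migrated during a measurement.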

Changing to “Unreal” mode, processor crash

邮差的信 submitted on 2020-05-25 07:25:10
Question: I am writing a simple bootloader. Its main task is to load a kernel and switch the processor into unreal mode. My problem is that when I turn on unreal mode, the processor crashes. Here's my code (some of it is taken from MikeOS). I use NASM.

    BITS 16

        jmp short bootloader_start   ; Jump past disk description section
        nop                          ; Pad out before disk description

    ; ------------------------------------------------------------------
    ; Disk description table, to make it a valid floppy
    ; Note: some of these values are
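Unreal mode is commonly entered by switching to protected mode, loading a data segment register from a GDT entry with a 4 GiB limit, and switching back to real mode, where the cached limit stays in effect. As a hedged illustration, the C sketch below packs such an 8-byte descriptor so its bytes can be compared with the bootloader's GDT. The names are hypothetical, and the access byte 0x92 with flags 0xC (G=1, D/B=1) is one common choice for a flat data segment, not necessarily what the code above uses.

```c
#include <assert.h>
#include <stdint.h>

/* Packs an x86 segment descriptor (32-bit base, 20-bit limit, access
   byte, 4-bit flags nibble) into its 8-byte in-memory layout. */
static uint64_t gdt_descriptor(uint32_t base, uint32_t limit,
                               uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF);   /* the limit field is only 20 bits */
    assert(flags <= 0xF);       /* G, D/B, L, AVL */
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);             /* bits 0-15: limit 15:0   */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;      /* bits 16-39: base 23:0   */
    d |= (uint64_t)access << 40;                 /* bits 40-47: access byte */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;  /* bits 48-51: limit 19:16 */
    d |= (uint64_t)flags << 52;                  /* bits 52-55: flags       */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;  /* bits 56-63: base 31:24  */
    return d;
}

/* Flat data segment: base 0, limit 0xFFFFF in 4 KiB units (4 GiB). */
#define FLAT_DATA_ACCESS 0x92   /* present, ring 0, writable data */
#define FLAT_DATA_FLAGS  0xC    /* G=1 (4 KiB granularity), D/B=1 */
```

With these values, `gdt_descriptor(0, 0xFFFFF, FLAT_DATA_ACCESS, FLAT_DATA_FLAGS)` yields 0x00CF92000000FFFF, i.e. the bytes `FF FF 00 00 00 92 CF 00` in memory.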

Why does the BIOS entry point start with a WBINVD instruction?

社会主义新天地 submitted on 2020-05-25 06:26:47
Question: I'm investigating the BIOS code on my machine (x86_64 Linux, Ivy Bridge). I use the following procedure to dump the BIOS code:

    $ sudo cat /proc/iomem | grep ROM
    000f0000-000fffff : System ROM
    $ sudo dd if=/dev/mem of=bios.dump bs=1M count=1

Then I use radare2 to read and disassemble the binary dump:

    $ r2 -b 16 bios.dump
    [0000:0000]> s 0xffff0
    [f000:fff0]> pd 3
          :   f000:fff0   0f09     wbinvd
         `=<  f000:fff2   e927f5   jmp 0xff51c
              f000:fff5   0000     add byte [bx + si], al

I know x86 processor initialization
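The `s 0xffff0` step works because the `dd` command reads from the start of /dev/mem, so a file offset in bios.dump equals a physical address, and in real mode a segment:offset pair such as f000:fff0 maps to segment * 16 + offset. A minimal C sketch of that arithmetic (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset. The result can exceed
   0xFFFFF (up to 0x10FFEF) when the A20 line is enabled. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Offset of that address inside a dump that starts at physical address
   dump_base (0 for the dd command above). */
static uint32_t dump_offset(uint16_t segment, uint16_t offset,
                            uint32_t dump_base)
{
    uint32_t linear = real_mode_linear(segment, offset);
    assert(linear >= dump_base);
    return linear - dump_base;
}
```

For a dump of only the System ROM range (starting at 0xF0000), the same f000:fff0 entry point would sit at file offset 0xFFF0.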

Does modern PC video hardware support VGA text mode in HW, or does the BIOS emulate it (with System Management Mode)?

可紊 submitted on 2020-05-25 04:31:27
Question: What really happens on modern PC hardware booted in 16-bit legacy BIOS MBR mode when you store a byte such as '1' (0x31) into the VGA text (mode 03) framebuffer at physical linear address B8000? How slow is a mov [es:di], eax store with the MTRR for that region set to UC? (Experimental testing on one Kaby Lake iGPU laptop indicates that clflushopt on WC was roughly the same speed as UC for VGA memory. But without clflushopt, mov stores to WC memory never leave the CPU and don't update the
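In mode 03 the text buffer is 80 columns by 25 rows of two-byte cells (character code, then attribute byte), so the byte '1' (0x31) stored at B8000 is the character of the top-left cell, and a 4-byte store such as `mov [es:di], eax` covers two cells. A small C sketch of that layout; the helper names and the 0x07 (light grey on black) attribute are illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define VGA_TEXT_BASE 0xB8000u  /* physical base of the mode 03 buffer */
#define VGA_COLS 80u
#define VGA_ROWS 25u

/* Physical address of the character cell at (row, col); each cell is two
   bytes: the character code followed by its attribute. */
static uint32_t vga_cell_addr(unsigned row, unsigned col)
{
    assert(row < VGA_ROWS && col < VGA_COLS);
    return VGA_TEXT_BASE + 2u * (row * VGA_COLS + col);
}

/* 16-bit value for one cell as stored little-endian: character in the low
   byte, attribute in the high byte. */
static uint16_t vga_cell(uint8_t ch, uint8_t attr)
{
    return (uint16_t)(((uint16_t)attr << 8) | ch);
}
```

For example, `vga_cell('1', 0x07)` is 0x0731, and the second row starts 160 bytes in, at 0xB80A0.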

How do I read a file with ReadFile onto the stack in NASM x86 assembly?

 ̄綄美尐妖づ submitted on 2020-05-24 07:42:51
Question: I have opened a file with OpenFile and gotten its size with GetFileSize. I wish to call ReadFile using the stack as the buffer it requires, allocating enough room on the stack for the size returned by GetFileSize. When I run this, I get no output. Here is my code:

    extern GetStdHandle
    extern GetModuleFileNameA
    extern OpenFile
    extern ReadFile
    extern WriteFile
    extern CloseHandle
    extern GetFileSize
    extern ExitProcess

    import GetStdHandle kernel32.dll
    import GetModuleFileNameA
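One thing worth checking when sizing a stack buffer from GetFileSize is that the adjustment to esp keeps the stack aligned, and that on Windows a single large `sub esp, n` can skip past the stack's guard page unless each 4 KiB page is touched in order (which is what compiler-generated stack probes do). The C sketch below only does that size arithmetic; the 4-byte alignment for 32-bit code, the 4096-byte page size, and the function names are assumptions, not taken from the code above.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_ALIGN      4u     /* assumed: 32-bit code keeps esp 4-byte aligned */
#define STACK_PAGE_SIZE  4096u  /* assumed page size for guard-page probing */

/* Bytes to subtract from esp for a buffer of file_size bytes, rounded up
   to a multiple of STACK_ALIGN. */
static uint32_t stack_alloc_size(uint32_t file_size)
{
    assert(file_size <= UINT32_MAX - (STACK_ALIGN - 1));  /* no wraparound */
    return (file_size + STACK_ALIGN - 1) & ~(STACK_ALIGN - 1);
}

/* Number of pages to touch, top-down one page at a time, so that each
   access lands on the guard page instead of skipping past it. */
static uint32_t pages_to_probe(uint32_t alloc_size)
{
    return (alloc_size + STACK_PAGE_SIZE - 1) / STACK_PAGE_SIZE;
}
```

For a 4097-byte file this gives a 4100-byte allocation spanning 2 pages; in assembly that would mean touching the stack once per page while moving esp down, before passing the buffer address to ReadFile.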