instruction-set | 易学教程

How has CPU architecture evolution affected virtual function call performance?

阅读更多关于 How has CPU architecture evolution affected virtual function call performance?

Years ago I was learning about x86 assembler, CPU pipelining, cache misses, branch prediction, and all that jazz. It was a tale of two halves. I read about all the wonderful advantages of the lengthy pipelines in the processor viz instruction reordering, cache preloading, dependency interleaving, etc. The downside was that any deviation for the norm was enormously costly. For example, IIRC a certain AMD processor in the early-gigahertz era had a 40 cycle penalty every time you called a function through a pointer (!) and this was apparently normal. This is not a negligible "don't worry about it

What is the minimum instruction set required for any Assembly language to be considered useful?

阅读更多关于 What is the minimum instruction set required for any Assembly language to be considered useful?

I am studying Assembly programming in general, so I've decided to try and implement a "virtual microprocessor" in software, which has registers, flags and RAM to work with, implemented with variables and arrays. But since I want to simulate only the most basic behavior of any microprocessor , I want to create an assembly language that has only the essential instructions, only those instructions without which it couldn't be useful. I mean, there are assembly languages that can do multiplication and swapping register values, etc, but these operations are not basic because you can implement them

How is fma() implemented

阅读更多关于 How is fma() implemented

问题 According to the documentation, there is a fma() function in math.h . That is very nice, and I know how FMA works and what to use it for. However, I am not so certain how this is implemented in practice? I'm mostly interested in the x86 and x86_64 architectures. Is there a floating-point (non-vector) instruction for FMA, perhaps as defined by IEEE-754 2008? Is FMA3 or FMA4 instruction used? Is there an intrinsic to make sure that a real FMA is used, when the precision is relied upon? 回答1: The

x64 instruction encoding and the ModRM byte

阅读更多关于 x64 instruction encoding and the ModRM byte

The encoding of call qword ptr [rax] call qword ptr [rcx] is FF 10 FF 11 I can see where the last digit (0/1) comes from (the register number), but I'm trying to figure out where the second last digit (1) comes from. According to AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions page 56, "/digit - Indicates that the ModRM byte specifies only one register or memory (r/m) operand. The digit is specified by the ModRM reg field and is used as an instruction-opcode extension. Valid digit values range from 0 to 7." The equivalent Intel document says something

How does one do integer (signed or unsigned) division on ARM?

阅读更多关于 How does one do integer (signed or unsigned) division on ARM?

I'm working on Cortex-A8 and Cortex-A9 in particular. I know that some architectures don't come with integer division, but what is the best way to do it other than convert to float, divide, convert to integer? Or is that indeed the best solution? Cheers! = ) The compiler normally includes a divide in its library, gcclib for example I have extracted them from gcc and use them directly: https://github.com/dwelch67/stm32vld/ then stm32f4d/adventure/gcclib going to float and back is probably not the best solution. you can try it and see how fast it is...This is a multiply but could as easily make

What's the point of LEA EAX, [EAX]?

阅读更多关于 What's the point of LEA EAX, [EAX]?

LEA EAX, [EAX] I encountered this instruction in a binary compiled with the Microsoft C compiler. It clearly can't change the value of EAX. Then why is it there? codaddict It is a NOP . The following are typcially used as NOP . They all do the same thing but they result in machine code of different length. Depending on the alignment requirement one of them is chosen: xchg eax, eax = 90 mov eax, eax = 89 C0 lea eax, [eax + 0x00] = 8D 40 00 From this article: This trick is used by MSVC++ compiler to emit the NOP instructions of different length (for padding before jump targets). For example,

How to control which core a process runs on?

阅读更多关于 How to control which core a process runs on?

I can understand how one can write a program that uses multiple processes or threads: fork() a new process and use IPC, or create multiple threads and use those sorts of communication mechanisms. I also understand context switching. That is, with only once CPU, the operating system schedules time for each process (and there are tons of scheduling algorithms out there) and thereby we achieve running multiple processes simultaneously. And now that we have multi-core processors (or multi-processor computers), we could have two processes running simultaneously on two separate cores. My question is

MOVZX missing 32 bit register to 64 bit register

阅读更多关于 MOVZX missing 32 bit register to 64 bit register

Here's the instruction which copies (converts) unsigned registers: http://www.felixcloutier.com/x86/MOVZX.html Basically the instruction has 8->16, 8->32, 8->64, 16->32 and 16->64. Where's the 32->64 conversion? Do I have to use the signed version for that? If so how do you use the full 64 bits for an unsigned integer? Peter Cordes Short answer Use mov eax, edi to zero-extend EDI into RAX if you can't already guarantee that the high bits of RDI are all zero. See: Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register? Prefer using different source

How do I enable SSE for my freestanding bootable code?

阅读更多关于 How do I enable SSE for my freestanding bootable code?

(This question was originally about the CVTSI2SD instruction and the fact that I thought it didn't work on the Pentium M CPU, but in fact it's because I'm using a custom OS and I need to manually enable SSE.) I have a Pentium M CPU and a custom OS which so far used no SSE instructions, but I now need to use them. Trying to execute any SSE instruction results in an interruption 6, illegal opcode (which in Linux would cause a SIGILL , but this isn't Linux), also referred to in the Intel architectures software developer's manual (which I refer from now on as IASDM) as #UD - Invalid Opcode

Are ARM instructuons SWI and SVC exactly same thing?

阅读更多关于 Are ARM instructuons SWI and SVC exactly same thing?

问题 ARM assembly has SWI and SVC instructions for entering into 'supervisor mode'. What confuses me is, why there are two of them? Here it is said that SVC was formerly SWI. Does it mean that basically they changed the mnemonic? Are they the same thing? Can I use them interchangeably? Does one of them exist before an architecture, and other after? 回答1: Yes, SWI and SVC are same thing, it is just a name change. Previously, the SVC instruction was called SWI, Software Interrupt. The opcode for SVC