instruction-set

How has CPU architecture evolution affected virtual function call performance?

柔情痞子 submitted on 2019-11-28 05:48:29
Years ago I was learning about x86 assembler, CPU pipelining, cache misses, branch prediction, and all that jazz. It was a tale of two halves. I read about all the wonderful advantages of the lengthy pipelines in the processor, namely instruction reordering, cache preloading, dependency interleaving, etc. The downside was that any deviation from the norm was enormously costly. For example, IIRC a certain AMD processor in the early-gigahertz era had a 40-cycle penalty every time you called a function through a pointer (!) and this was apparently normal. This is not a negligible "don't worry about it
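The penalty described above comes from mispredicting the target of an indirect branch, which is what a call through a function pointer or vtable compiles to. As a rough illustration only, here is a hypothetical Python model of the simplest kind of indirect-branch predictor: it predicts that a call site will jump to the same target it used last time (a simplified branch-target-buffer model, not any specific CPU). Under this model a call site that always dispatches to the same override mispredicts once, while one that alternates between two overrides mispredicts every time. Modern CPUs use history-based indirect predictors that can often learn such patterns, so treat this as a picture of the mechanism, not a benchmark. The target names are made up.

```python
from typing import Hashable, Iterable, Optional


def count_mispredictions(targets: Iterable[Hashable]) -> int:
    """Count mispredictions for one indirect call site under a last-target predictor.

    Hypothetical, simplified model: the predictor remembers only the most recent
    target of this call site and predicts that it will be taken again.
    """
    predicted: Optional[Hashable] = None  # cold predictor: no prediction yet
    misses = 0
    for actual in targets:
        if actual != predicted:
            misses += 1  # wrong (or missing) prediction: a pipeline flush on real hardware
        predicted = actual  # update the predictor with the resolved target
    return misses


# Monomorphic call site: every call goes to the same override.
print(count_mispredictions(["Circle.area"] * 100))  # 1 (only the cold miss)

# Call site alternating between two overrides defeats this simple predictor.
print(count_mispredictions(["Circle.area", "Square.area"] * 50))  # 100
```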

What is the minimum instruction set required for any Assembly language to be considered useful?

谁都会走 submitted on 2019-11-28 05:01:01
I am studying assembly programming in general, so I've decided to try to implement a "virtual microprocessor" in software, which has registers, flags and RAM to work with, implemented with variables and arrays. But since I want to simulate only the most basic behavior of any microprocessor, I want to create an assembly language that has only the essential instructions, only those instructions without which it couldn't be useful. I mean, there are assembly languages that can do multiplication, swap register values, etc., but these operations are not basic because you can implement them
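One way to see that multiplication is not an essential instruction is to build it from simpler ones. The following is a minimal sketch, assuming a hypothetical machine with 16-bit registers and only six instructions (load immediate, add, subtract, jump-if-zero, unconditional jump, halt); the instruction names and the tuple-based program format are illustrative, not taken from any real ISA. The multiply routine is just a loop of additions.

```python
MASK = 0xFFFF  # hypothetical 16-bit registers: results wrap around like hardware


def run(program, regs, max_steps=100_000):
    """Interpret `program` (a list of tuples) on the register list `regs`."""
    pc = 0
    for _ in range(max_steps):
        op, *args = program[pc]
        if op == "loadi":            # loadi rd, imm  ; rd = imm
            rd, imm = args
            regs[rd] = imm & MASK
        elif op == "add":            # add rd, rs     ; rd = rd + rs
            rd, rs = args
            regs[rd] = (regs[rd] + regs[rs]) & MASK
        elif op == "sub":            # sub rd, rs     ; rd = rd - rs
            rd, rs = args
            regs[rd] = (regs[rd] - regs[rs]) & MASK
        elif op == "jz":             # jz r, target   ; branch if r == 0
            r, target = args
            if regs[r] == 0:
                pc = target
                continue
        elif op == "jmp":            # jmp target
            pc = args[0]
            continue
        elif op == "halt":
            return regs
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    raise RuntimeError("step limit exceeded (infinite loop?)")


# r2 = r0 * r1, built only from add/sub/jz/jmp (r1 is consumed as a loop counter).
MULTIPLY = [
    ("loadi", 2, 0),   # 0: accumulator = 0
    ("loadi", 3, 1),   # 1: constant 1
    ("jz", 1, 6),      # 2: done when the counter reaches 0
    ("add", 2, 0),     # 3: accumulator += multiplicand
    ("sub", 1, 3),     # 4: counter -= 1
    ("jmp", 2),        # 5: loop
    ("halt",),         # 6
]

print(run(MULTIPLY, [6, 7, 0, 0])[2])  # 42
```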

How is fma() implemented?

﹥>﹥吖頭↗ submitted on 2019-11-28 01:28:15
Question: According to the documentation, there is an fma() function in math.h. That is very nice, and I know how FMA works and what to use it for. However, I am not so certain how it is implemented in practice. I'm mostly interested in the x86 and x86_64 architectures. Is there a floating-point (non-vector) instruction for FMA, perhaps as defined by IEEE 754-2008? Is FMA3 or FMA4 used? Is there an intrinsic to make sure that a real FMA is used, when the precision is relied upon? Answer 1: The
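What distinguishes a true FMA from a multiply followed by an add is that a*b + c is rounded only once. The sketch below illustrates this in Python rather than C, as a model only: it computes the exact result with fractions.Fraction and rounds it to a double a single time. It assumes finite inputs and does not reproduce every IEEE 754 corner case (for example, the sign of an exact zero result). On x86-64 CPUs with FMA3, C compilers will typically lower fma() to a scalar instruction such as VFMADD231SD when the target allows it (for example, GCC with -mfma), and the _mm_fmadd_sd intrinsic from immintrin.h requests that instruction family directly; without hardware support, the math library has to emulate the single rounding in software.

```python
from fractions import Fraction


def fma_model(a: float, b: float, c: float) -> float:
    """Return a*b + c rounded once, as an IEEE 754 fused multiply-add would.

    Model only: Fraction(x) is exact for finite floats, and converting the exact
    Fraction back to float rounds to nearest exactly once.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = b = 1.0 + 2.0**-30   # exactly representable
c = -(1.0 + 2.0**-29)    # cancels everything except the tiny 2**-60 term

print(a * b + c)                    # 0.0: the product was rounded before the add, losing 2**-60
print(fma_model(a, b, c) == 2.0**-60)  # True: a single rounding keeps it
```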

x64 instruction encoding and the ModRM byte

China☆狼群 submitted on 2019-11-28 01:14:45
The encodings of call qword ptr [rax] and call qword ptr [rcx] are FF 10 and FF 11, respectively. I can see where the last digit (0/1) comes from (the register number), but I'm trying to figure out where the second-to-last digit (1) comes from. According to AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions, page 56, "/digit - Indicates that the ModRM byte specifies only one register or memory (r/m) operand. The digit is specified by the ModRM reg field and is used as an instruction-opcode extension. Valid digit values range from 0 to 7." The equivalent Intel document says something
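The 1 in 10 and 11 is the /2 opcode extension: CALL r/m64 is encoded as FF /2, and the /digit goes into the reg field (bits 5..3) of the ModRM byte, with mod in bits 7..6 and r/m in bits 2..0. With mod = 00 (register-indirect, no displacement), reg = 010 and r/m = 000 (RAX), the byte is 00 010 000 = 0x10; r/m = 001 (RCX) gives 0x11. The helper below is a minimal sketch (hypothetical function names) that only covers the simple [reg] forms: it rejects RSP and RBP, because r/m = 100 selects a SIB byte and mod = 00 with r/m = 101 means RIP-relative in 64-bit mode, and it does not handle R8 through R15, which need a REX prefix.

```python
# Low 3-bit register numbers used in ModRM.r/m (R8-R15 would also need REX.B).
REG_NUM = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModRM fields: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    if not (0 <= mod < 4 and 0 <= reg < 8 and 0 <= rm < 8):
        raise ValueError("field out of range")
    return (mod << 6) | (reg << 3) | rm


def encode_call_indirect(base: str) -> bytes:
    """Encode `call qword ptr [base]` as FF /2 with mod=00 (no displacement)."""
    rm = REG_NUM[base]
    if rm in (4, 5):
        # r/m=100 means "SIB byte follows"; mod=00 with r/m=101 means RIP-relative.
        raise ValueError(f"[{base}] needs a SIB byte or displacement; not handled here")
    return bytes([0xFF, modrm(0b00, 2, rm)])  # the /2 goes in the reg field


print(encode_call_indirect("rax").hex(" "))  # ff 10
print(encode_call_indirect("rcx").hex(" "))  # ff 11
```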

How does one do integer (signed or unsigned) division on ARM?

北慕城南 submitted on 2019-11-27 22:05:24
I'm working on Cortex-A8 and Cortex-A9 in particular. I know that some architectures don't come with integer division, but what is the best way to do it other than converting to float, dividing, and converting back to integer? Or is that indeed the best solution? Cheers! =) Answer: The compiler normally includes a divide routine in its runtime library (libgcc, for example). I have extracted them from gcc and use them directly: https://github.com/dwelch67/stm32vld/ then stm32f4d/adventure/gcclib. Going to float and back is probably not the best solution; you can try it and see how fast it is...This is a multiply but could as easily make
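Cortex-A8 and Cortex-A9 do not implement the SDIV/UDIV instructions, so compilers emit calls to library routines (in the ARM EABI these are __aeabi_uidiv and __aeabi_idiv). The classic technique behind such routines is shift-and-subtract (restoring) division, which produces one quotient bit per step. The Python model below sketches that algorithm for 32-bit operands; it illustrates the technique and is not libgcc's actual code, which is written in assembly. The signed version follows C semantics: the quotient truncates toward zero and the remainder takes the sign of the dividend. Raising ZeroDivisionError is this sketch's choice; real runtimes call a divide-by-zero handler instead.

```python
def udiv32(n: int, d: int) -> tuple[int, int]:
    """Unsigned 32-bit restoring division: returns (quotient, remainder)."""
    n &= 0xFFFFFFFF
    d &= 0xFFFFFFFF
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q = r = 0
    for i in range(31, -1, -1):
        r = (r << 1) | ((n >> i) & 1)  # bring down the next dividend bit
        if r >= d:                      # divisor fits: subtract and set this quotient bit
            r -= d
            q |= 1 << i
    return q, r


def sdiv32(n: int, d: int) -> tuple[int, int]:
    """Signed division with C semantics (quotient truncated toward zero).

    Assumes n and d are in the int32 range; INT32_MIN / -1 overflows in C
    and is not special-cased here.
    """
    q, r = udiv32(abs(n), abs(d))
    if (n < 0) != (d < 0):
        q = -q
    if n < 0:
        r = -r
    return q, r


print(udiv32(100, 7))  # (14, 2)
print(sdiv32(-7, 2))   # (-3, -1)
```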

What's the point of LEA EAX, [EAX]?

柔情痞子 submitted on 2019-11-27 17:23:42
I encountered the instruction LEA EAX, [EAX] in a binary compiled with the Microsoft C compiler. It clearly can't change the value of EAX. Then why is it there? Answer (codaddict): It is a NOP. The following are typically used as NOPs. They all do the same thing, but they result in machine code of different lengths. Depending on the alignment requirement, one of them is chosen: xchg eax, eax = 90; mov eax, eax = 89 C0; lea eax, [eax + 0x00] = 8D 40 00. From this article: This trick is used by the MSVC++ compiler to emit NOP instructions of different lengths (for padding before jump targets). For example,
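Choosing among these fillers amounts to computing how many bytes remain until the next aligned address and covering that gap with as few instructions as possible. The sketch below is a hypothetical greedy padder that uses only the three 32-bit-mode encodings quoted above, longest first; real toolchains have their own tables of multi-byte NOPs. Note that in 64-bit mode mov eax, eax and lea eax, [rax + 0x00] are not true NOPs, because writing EAX zeroes the upper half of RAX.

```python
# 32-bit-mode filler instructions from the answer above, keyed by length in bytes.
FILLERS = {
    3: bytes([0x8D, 0x40, 0x00]),  # lea eax, [eax + 0x00]
    2: bytes([0x89, 0xC0]),        # mov eax, eax
    1: bytes([0x90]),              # xchg eax, eax (the canonical NOP)
}


def padding(offset: int, alignment: int) -> bytes:
    """Return filler bytes that advance `offset` to the next multiple of `alignment`."""
    needed = -offset % alignment  # 0 if already aligned
    out = bytearray()
    while needed:
        size = min(needed, max(FILLERS))  # greedy: the longest filler that still fits
        out += FILLERS[size]
        needed -= size
    return bytes(out)


print(padding(0x1005, 8).hex(" "))  # 8d 40 00
print(padding(0x1003, 8).hex(" "))  # 8d 40 00 89 c0
```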

How to control which core a process runs on?

爷,独闯天下 submitted on 2019-11-27 17:23:13
I can understand how one can write a program that uses multiple processes or threads: fork() a new process and use IPC, or create multiple threads and use those sorts of communication mechanisms. I also understand context switching. That is, with only one CPU, the operating system schedules time for each process (and there are tons of scheduling algorithms out there), and thereby we achieve the effect of running multiple processes simultaneously. And now that we have multi-core processors (or multi-processor computers), we could have two processes running simultaneously on two separate cores. My question is
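On Linux, the usual mechanism is CPU affinity: the sched_setaffinity system call (or the taskset command-line tool, e.g. taskset -c 0 ./prog) restricts which cores the scheduler may use for a process or thread. Python exposes the same call as os.sched_setaffinity, which exists only on some platforms (Linux, but not macOS or Windows). The sketch below is a hypothetical helper, run_pinned, that temporarily pins the caller to one core, runs a function, and restores the original mask. The OS still decides when the code runs; affinity only constrains where.

```python
import os


def run_pinned(cpu: int, fn):
    """Run fn() with the caller restricted to a single CPU, then restore the mask.

    Requires os.sched_setaffinity (Linux); pid 0 refers to the caller.
    """
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {cpu})  # the scheduler may now only use `cpu`
    try:
        return fn()
    finally:
        os.sched_setaffinity(0, original)  # restore the previous mask


if hasattr(os, "sched_setaffinity"):
    first = min(os.sched_getaffinity(0))
    # Prints a set containing only `first`.
    print(run_pinned(first, lambda: os.sched_getaffinity(0)))
```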

MOVZX missing 32 bit register to 64 bit register

China☆狼群 submitted on 2019-11-27 09:33:39
Here's the instruction that copies (converts) unsigned registers: http://www.felixcloutier.com/x86/MOVZX.html Basically, the instruction has 8->16, 8->32, 8->64, 16->32 and 16->64 forms. Where's the 32->64 conversion? Do I have to use the signed version for that? If so, how do you use the full 64 bits for an unsigned integer? Answer (Peter Cordes): Short answer: Use mov eax, edi to zero-extend EDI into RAX if you can't already guarantee that the high bits of RDI are all zero. See: Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register? Prefer using different source
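The reason there is no 32-to-64-bit MOVZX is that x86-64 already zero-extends every write to a 32-bit register into the full 64-bit register, whereas 8- and 16-bit writes leave the upper bits unchanged. The small Python model below (hypothetical helper names, register values as unsigned 64-bit integers) contrasts a 32-bit write, a 16-bit write, and MOVSXD, the sign-extending 32-to-64 move.

```python
MASK64 = (1 << 64) - 1


def write32(old_rax: int, value: int) -> int:
    """mov eax, ... : a 32-bit write zero-extends; the old upper 32 bits are discarded."""
    return value & 0xFFFFFFFF


def write16(old_rax: int, value: int) -> int:
    """mov ax, ... : a 16-bit write merges, keeping bits 63..16 of the old value."""
    return (old_rax & ~0xFFFF & MASK64) | (value & 0xFFFF)


def movsxd(src: int) -> int:
    """movsxd rax, edi : sign-extend the low 32 bits of the source to 64 bits."""
    low = src & 0xFFFFFFFF
    return (low - (1 << 32)) & MASK64 if low & 0x80000000 else low


rdi = 0xDEADBEEF_80000001
print(hex(write32(0, rdi)))                   # 0x80000001 (what mov eax, edi leaves in RAX)
print(hex(write16(0x1111222233334444, rdi)))  # 0x1111222233330001
print(hex(movsxd(rdi)))                       # 0xffffffff80000001
```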

How do I enable SSE for my freestanding bootable code?

吃可爱长大的小学妹 submitted on 2019-11-27 09:31:28
(This question was originally about the CVTSI2SD instruction and the fact that I thought it didn't work on the Pentium M CPU, but in fact it's because I'm using a custom OS and I need to manually enable SSE.) I have a Pentium M CPU and a custom OS that so far used no SSE instructions, but I now need to use them. Trying to execute any SSE instruction results in interrupt 6, illegal opcode (which in Linux would cause a SIGILL, but this isn't Linux), also referred to in the Intel Architectures Software Developer's Manual (which I refer to from now on as the IASDM) as #UD - Invalid Opcode
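On x86, SSE instructions raise #UD until the operating system indicates that it supports them: CR0.EM (bit 2) must be clear, CR0.MP (bit 1) should be set, and CR4.OSFXSR (bit 9) must be set; CR4.OSXMMEXCPT (bit 10) additionally lets unmasked SIMD floating-point exceptions be delivered as #XM. Writing control registers requires ring-0 assembly, and a commonly used sequence is shown in the comment. The Python below only models the bit arithmetic (the helper name is hypothetical) so it can be checked without a bare-metal environment; a real kernel should also confirm SSE/SSE2 support first via CPUID leaf 1 (EDX bits 25 and 26).

```python
# Control-register bits involved in enabling SSE (see the IASDM, Vol. 3).
CR0_MP = 1 << 1           # Monitor coprocessor
CR0_EM = 1 << 2           # x87 emulation; if set, SSE instructions raise #UD
CR4_OSFXSR = 1 << 9       # OS supports FXSAVE/FXRSTOR (enables SSE instructions)
CR4_OSXMMEXCPT = 1 << 10  # OS handles SIMD floating-point exceptions (#XM)

# The real thing, in ring-0 assembly (NASM syntax):
#   mov eax, cr0
#   and ax, 0xFFFB      ; clear CR0.EM
#   or  ax, 0x2         ; set CR0.MP
#   mov cr0, eax
#   mov eax, cr4
#   or  ax, 3 << 9      ; set CR4.OSFXSR and CR4.OSXMMEXCPT
#   mov cr4, eax


def enable_sse_bits(cr0: int, cr4: int) -> tuple[int, int]:
    """Return new (cr0, cr4) values with SSE enabled; all other bits are preserved."""
    cr0 = (cr0 & ~CR0_EM) | CR0_MP
    cr4 = cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT
    return cr0, cr4


new_cr0, new_cr4 = enable_sse_bits(0x15, 0x0)  # example CR0 with PE, EM and ET set
print(hex(new_cr0), hex(new_cr4))  # 0x13 0x600
```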

Are the ARM instructions SWI and SVC exactly the same thing?

♀尐吖头ヾ submitted on 2019-11-27 06:38:06
Question: ARM assembly has SWI and SVC instructions for entering 'supervisor mode'. What confuses me is why there are two of them. Here it is said that SVC was formerly SWI. Does that mean they basically just changed the mnemonic? Are they the same thing? Can I use them interchangeably? Did one of them exist before a certain architecture version, and the other after? Answer 1: Yes, SWI and SVC are the same thing; it is just a name change. Previously, the SVC instruction was called SWI, Software Interrupt. The opcode for SVC
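Because SVC is only a rename of SWI, an assembler produces identical machine code for both mnemonics. In the 32-bit ARM (A32) encoding the instruction is cond | 1111 | imm24, so an unconditional svc #0 (or swi #0) assembles to 0xEF000000; in 16-bit Thumb it is 0xDF00 | imm8. The helpers below are a minimal sketch of those two encodings (the function names are hypothetical). The processor itself does not act on the immediate; an OS exception handler may read it back from the instruction if it chooses to.

```python
COND_AL = 0xE  # "always" condition code


def encode_svc_arm(imm24: int, cond: int = COND_AL) -> int:
    """A32 SVC (formerly SWI): cond(4 bits) | 1111 | imm24."""
    if not 0 <= imm24 < (1 << 24):
        raise ValueError("imm24 out of range")
    return (cond << 28) | (0b1111 << 24) | imm24


def encode_svc_thumb(imm8: int) -> int:
    """16-bit Thumb SVC (formerly SWI): 1101 1111 | imm8."""
    if not 0 <= imm8 < (1 << 8):
        raise ValueError("imm8 out of range")
    return 0xDF00 | imm8


print(hex(encode_svc_arm(0)))       # 0xef000000 (svc #0 and swi #0 are the same word)
print(hex(encode_svc_thumb(0x12)))  # 0xdf12
```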