instruction-set | 易学教程

How does one do integer (signed or unsigned) division on ARM?


Question: I'm working on Cortex-A8 and Cortex-A9 in particular. I know that some architectures don't come with integer division, but what is the best way to do it other than convert to float, divide, convert to integer? Or is that indeed the best solution? Cheers! =)

Answer 1: The compiler normally includes a divide routine in its library, gcclib for example. I have extracted them from gcc and use them directly: https://github.com/dwelch67/stm32vld/ then stm32f4d/adventure/gcclib. Going to float and back is probably
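To make the idea of a library divide routine concrete, here is a minimal sketch of software integer division by shift-and-subtract. It is not the routine shipped with gcc; the names `udiv32` and `sdiv32` are hypothetical, and a real library version would be considerably more optimized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the compiler library's routine.
 * Unsigned 32-bit division by shift-and-subtract (restoring division).
 * The divisor d must be nonzero. */
uint32_t udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint32_t r = 0;
    for (int i = 31; i >= 0; i--) {
        /* Bring down the next bit of the dividend. */
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {
            r -= d;
            q |= (1u << i);
        }
    }
    if (rem != NULL)
        *rem = r;
    return q;
}

/* Signed division built on the unsigned routine, truncating toward zero
 * as C does. Magnitudes are taken in unsigned arithmetic so INT32_MIN
 * does not overflow; the final cast assumes the usual two's complement
 * wraparound when converting back to int32_t. */
int32_t sdiv32(int32_t n, int32_t d)
{
    uint32_t un = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;
    uint32_t ud = d < 0 ? 0u - (uint32_t)d : (uint32_t)d;
    uint32_t q = udiv32(un, ud, NULL);
    int negative = (n < 0) != (d < 0);
    return (int32_t)(negative ? 0u - q : q);
}
```

For example, `udiv32(100, 7, &r)` yields quotient 14 with remainder 2, and `sdiv32(-7, 2)` yields -3.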

How to check if a CPU supports the SSE3 instruction set?


Question: Is the following code valid to check if a CPU supports the SSE3 instruction set? Using the IsProcessorFeaturePresent() function apparently does not work on Windows XP (see http://msdn.microsoft.com/en-us/library/ms724482(v=vs.85).aspx).

    bool CheckSSE3()
    {
        int CPUInfo[4] = {-1};

        //-- Get number of valid info ids
        __cpuid(CPUInfo, 0);
        int nIds = CPUInfo[0];

        //-- Get info for id "1"
        if (nIds >= 1)
        {
            __cpuid(CPUInfo, 1);
            bool bSSE3NewInstructions = (CPUInfo[2] & 0x1) || false;
            return
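The check in the question comes down to testing bit 0 of ECX after CPUID leaf 1, which is where SSE3 is reported. The sketch below separates that bit test from the CPUID call. It assumes GCC or Clang and their `<cpuid.h>` helper `__get_cpuid` rather than the MSVC `__cpuid` intrinsic used in the question; the function names `ecx_has_sse3` and `cpu_has_sse3` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

/* SSE3 is reported in bit 0 of ECX for CPUID leaf 1. */
bool ecx_has_sse3(uint32_t ecx)
{
    return (ecx & 1u) != 0;
}

/* Assumption: GCC/Clang on x86. On any other target this sketch
 * simply reports false. */
bool cpu_has_sse3(void)
{
#ifdef HAVE_GNU_CPUID
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return ecx_has_sse3(ecx);
#else
    return false;
#endif
}
```

Keeping the bit test in its own function makes it checkable without depending on the machine the code happens to run on.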

What is the difference between MOV and LEA?


Question: I would like to know what the difference between these instructions is: MOV AX, [TABLE-ADDR] and LEA AX, [TABLE-ADDR]

Answer 1: LEA means Load Effective Address. MOV means Load Value. In short, LEA loads a pointer to the item you're addressing, whereas MOV loads the actual value at that address. The purpose of LEA is to allow one to perform a non-trivial address calculation and store the result (for later usage):

    LEA ax, [BP+SI+5] ; Compute address of value
    MOV ax, [BP+SI+5] ; Load value at that
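A rough C analogy may help: LEA corresponds to computing a pointer, and MOV with a memory operand corresponds to dereferencing it. The sketch below is only an analogy, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Analogue of LEA: compute base + index + displacement.
 * No memory is read. */
const uint8_t *effective_address(const uint8_t *base, size_t index, size_t disp)
{
    return base + index + disp;
}

/* Analogue of MOV with a memory operand: compute the same address,
 * then load the byte stored there. */
uint8_t load_value(const uint8_t *base, size_t index, size_t disp)
{
    return *effective_address(base, index, disp);
}
```

With a table holding 10, 11, ..., 17, `effective_address(table, 1, 5)` is the address of element 6, while `load_value(table, 1, 5)` is the value 16 stored there.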

Direct Arithmetic Operations on Small-sized Numbers in RISC Architectures


Question: Are there any RISC architectures that allow arithmetic operations to be applied individually to bytes, half-words, and other data cells whose size is less than the size of the CPU's general-purpose registers? In Intel x86 (IA-32) and x86-64 (known as EM64T or AMD64) processors, not only is the whole register available, but its smaller parts are operable as well. The Intel ISA allows all the arithmetic operations to be performed on the whole register, its half, its quarter, and a byte (to be more precise,

Why does the FMA _mm256_fmadd_pd() intrinsic have 3 asm mnemonics, “vfmadd132pd”, “231” and “213”?


Question: Could someone explain to me why there are 3 variants of the fused multiply-accumulate instruction (vfmadd132pd, vfmadd231pd and vfmadd213pd), while there is only one C intrinsic, _mm256_fmadd_pd? To make things simple, what is the difference between (in AT&T syntax):

    vfmadd132pd %ymm0, %ymm1, %ymm2
    vfmadd231pd %ymm0, %ymm1, %ymm2
    vfmadd213pd %ymm0, %ymm1, %ymm2

I did not get any idea from Intel's intrinsics guide. I ask because I see all of them in the assembler output of a chunk of C code I
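As a hedged illustration of what the three digits encode: in Intel's operand numbering, where operand 1 is the destination, the digits name which operands are multiplied and which one is added. The scalar model below uses hypothetical function names and plain multiply-add, so unlike the real instructions it does not guarantee a single rounding step. Note also that AT&T syntax, as used in the question, lists the operands in reverse order.

```c
/* Scalar model of the operand roles only. Each function returns the
 * value written back to op1, the destination. */

/* 132: op1 = op1 * op3 + op2 */
double fmadd132(double op1, double op2, double op3)
{
    return op1 * op3 + op2;
}

/* 213: op1 = op2 * op1 + op3 */
double fmadd213(double op1, double op2, double op3)
{
    return op2 * op1 + op3;
}

/* 231: op1 = op2 * op3 + op1 */
double fmadd231(double op1, double op2, double op3)
{
    return op2 * op3 + op1;
}
```

Because the destination overwrites one of the three inputs, the variants let the compiler pick which input is destroyed; a C intrinsic returns a fresh value and has no such choice to express, which is consistent with there being a single `_mm256_fmadd_pd`. With op1 = 2, op2 = 3, op3 = 5, the three forms give 13, 11, and 17 respectively.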

What EXACTLY is the difference between intel's and amd's ISA, if any?


Question: I know people have asked similar questions before; however, there is so much conflicting information that I really want to try to clear it up once and for all. I will attempt to do so by clearly distinguishing between instruction set architecture (ISA) and actual hardware implementation. First, my attempted clarifications: 1.) Currently there are intel64 and amd64 CPUs out there (among others, but these are the focus). 2.) Given that an ISA is the binary representation of 1 or more

JVM instruction set CPU cycles & byte size


Question: The Java Virtual Machine Instruction Set page provides information about mnemonics such as aaload, aastore, etc. However, it mentions neither the CPU cycles these mnemonics take nor their byte size or word size. Any idea where this information could be found?

Answer 1: The spec is what a JVM needs to implement, not how it does so. Different platforms and different JVMs from vendors such as IBM and Sun will use different implementations so you

What is the difference between system calls and instruction set


Question: I'm confused about whether system calls and the instruction set are synonymous. Do instructions like MOV, LOAD, CALL, IN, OUT, ADD, SUB, etc. fall in the category of system calls, alongside system calls like open(), close(), read(), and write()? If not, what is the relationship between them? Can someone please explain and clear up the confusion?

Answer 1: Several books are needed to explain the difference. I recommend notably Operating Systems: Three Easy Pieces and some book on computer architecture, or

Why does ARM distinguish between SDIV and UDIV but not with ADD, SUB and MUL?


Question: As stated in the title, why does the ARM instruction set distinguish between signed and unsigned only on division? SDIV and UDIV are available, but that's not the case with ADD, SUB and MUL.

Answer 1: Addition and subtraction of signed and unsigned numbers of the same size produce exactly the same bit patterns in two's complement math (which ARM uses), so there is no need for separate instructions. For example, if we take byte-sized values:

    0xFC + 4
    signed:   -4 + 4 = 0
    unsigned: 252 + 4 = 256 = 0x100 =
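The byte-sized example can be checked directly. The sketch below uses hypothetical helper names and assumes the usual two's complement conversion between `uint8_t` and `int8_t`. It shows that addition and the low byte of a product give the same bit pattern under either interpretation, while division does not.

```c
#include <stdint.h>

/* Addition: one bit pattern serves both signed and unsigned readings. */
uint8_t add_bits(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Low byte of the product, computed from the unsigned interpretation. */
uint8_t mul_lo_unsigned(uint8_t a, uint8_t b)
{
    return (uint8_t)(a * b);
}

/* Low byte of the product, computed from the signed interpretation. */
uint8_t mul_lo_signed(uint8_t a, uint8_t b)
{
    return (uint8_t)((int8_t)a * (int8_t)b);
}

/* Division depends on the interpretation. The divisor must be nonzero. */
uint8_t udiv_bits(uint8_t a, uint8_t b)
{
    return (uint8_t)(a / b);
}

uint8_t sdiv_bits(uint8_t a, uint8_t b)
{
    return (uint8_t)((int8_t)a / (int8_t)b);
}
```

For 0xFC and 4, `add_bits` gives 0x00 either way, but `udiv_bits` gives 0x3F (252 / 4 = 63) while `sdiv_bits` gives 0xFF (-4 / 4 = -1).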

Why are there two ways to multiply arbitrary signed numbers in MIPS?


Question: If you need to multiply two arbitrary signed numbers in MIPS, is there a reason to prefer:

    mul $t0 $s0 $s1

or this:

    mult $s0 $s1
    mflo $t0

I'm finding inconsistent answers online with regard to what each one means. At first glance I would expect the former to be a pseudo-instruction for the latter. (And there's even a web page that claims that.) But looking at the machine code, it appears that mult is a valid R-type instruction (opcode 0) whereas mul has a nonzero opcode (0x1c) and so shouldn
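A small model may clarify what the two sequences compute, assuming the MIPS32 `mul` instruction that the question's opcode observation points to. `mult` produces the full 64-bit signed product split across the HI and LO registers, while `mul` writes only the low 32 bits to its destination. The type and function names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the HI/LO register pair. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} hilo_t;

/* Model of `mult rs, rt`: 64-bit signed product split across HI and LO. */
hilo_t model_mult(int32_t rs, int32_t rt)
{
    uint64_t p = (uint64_t)((int64_t)rs * (int64_t)rt);
    hilo_t r;
    r.hi = (uint32_t)(p >> 32);
    r.lo = (uint32_t)p;
    return r;
}

/* Model of `mul rd, rs, rt`: only the low 32 bits of the product. */
uint32_t model_mul(int32_t rs, int32_t rt)
{
    return (uint32_t)((int64_t)rs * (int64_t)rt);
}
```

In this model, `model_mul(a, b)` always equals `model_mult(a, b).lo`, so the two sequences in the question leave the same value in `$t0`; the difference is that `mult` also makes the high half available.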