assembly

Fastest Offset Read for a Small Array

ぃ、小莉子 submitted on 2020-08-03 05:48:43
Question: For speed, I would like to read one of 8 registers, selected by the value in a 9th register. The fastest way I see to do this is to use 3 conditional jumps (checking 3 bits in the 9th register). This should have shorter latency than the standard way of doing this with an offset memory read, but it still requires at least 6 clock cycles (at least one test plus one conditional jmp per bit check). Is there any commercial CPU (preferably x86/x64) with an intrinsic to do this "offset register
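The two approaches the question compares can be sketched in C. This is only an illustration: C cannot name machine registers, so the array `regs` stands in for the eight registers, and the function names `select_by_load` and `select_by_branches` are hypothetical. A compiler is free to emit different instructions than the source suggests.

```c
#include <stdint.h>

/* Table lookup: the "offset memory read" the question calls the standard way. */
uint64_t select_by_load(const uint64_t regs[8], unsigned idx) {
    return regs[idx & 7];               /* keep only the low 3 bits of the index */
}

/* Branch tree: test the three index bits one at a time. */
uint64_t select_by_branches(const uint64_t regs[8], unsigned idx) {
    if (idx & 4) {
        if (idx & 2) return (idx & 1) ? regs[7] : regs[6];
        return (idx & 1) ? regs[5] : regs[4];
    }
    if (idx & 2) return (idx & 1) ? regs[3] : regs[2];
    return (idx & 1) ? regs[1] : regs[0];
}
```

Both functions return the same element for every index from 0 to 7; they differ only in whether the selection is done by an indexed load or by three bit tests.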

What is instruction fusion in contemporary x86 processors?

孤街浪徒 submitted on 2020-08-02 08:40:06
Question: What I understand is that there are two types of instruction fusion: micro-operation fusion and macro-operation fusion. Micro-operations are those operations that can be executed in 1 clock cycle. If several micro-operations are fused, we obtain an "instruction". If several instructions are fused, we obtain a macro-operation. If several macro-operations are fused, we obtain macro-operation fusion. Am I correct? Answer 1: No, fusion is totally separate from how one complex instruction (like cpuid or lock

can't compare user input with number, nasm elf64

笑着哭i submitted on 2020-07-31 04:30:03
Question: I swear I've read more than 20 pages today, from NASM's manual to universities' guides to Wikipedia to everything in between, but I just can't wrap my head around this. I wrote a single program to compare the user input with either a 0 or a 1 and then act based on that (I should probably use an array once I get the hang of them in Assembly), but this will do for now. The problem is that my checks never work; they always go straight to the err label. I looked at x86 NASM Assembly - Problems with Input
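The excerpt is cut off before the program itself, so the actual cause is not shown here. One common reason for checks like these to always fall through, assuming the input is read as text, is that the typed character `1` arrives as the byte 0x31 rather than the number 1. A minimal C sketch of the difference, with the hypothetical names `parse_choice` and `parse_choice_buggy`:

```c
/* Correct check: compare against the character codes '0' and '1'. */
int parse_choice(const char *input) {
    char c = input[0];
    if (c == '0') return 0;   /* '0' is the byte 0x30, not the number 0 */
    if (c == '1') return 1;   /* '1' is the byte 0x31, not the number 1 */
    return -1;                /* anything else is an error */
}

/* Mistaken check: compares the raw byte with the numbers 0 and 1. */
int parse_choice_buggy(const char *input) {
    char c = input[0];
    if (c == 0) return 0;
    if (c == 1) return 1;
    return -1;                /* a typed "0" or "1" always ends up here */
}
```

In assembly the equivalent fix would be to compare the input byte with `'0'` and `'1'`, or to subtract `'0'` before comparing with 0 and 1.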

Mixing C and Assembly. `Hello World` on 64-bit Linux

﹥>﹥吖頭↗ submitted on 2020-07-31 04:16:54
Question: Based on this tutorial, I am trying to write Hello World to the console on 64-bit Linux. Compilation raises no errors, but I get no text on the console either. I don't know what is wrong.

write.s:

    .data
    SYSREAD = 0
    SYSWRITE = 1
    SYSEXIT = 60
    STDOUT = 1
    STDIN = 0
    EXIT_SUCCESS = 0
    message: .ascii "Hello, world!\n"
    message_len = .-message
    .text
    .globl _write
    _write:
        pushq %rbp
        movq %rsp, %rbp
        movq $SYSWRITE, %rax
        movq $STDOUT, %rdi
        movq $message, %rsi
        movq $message_len, %rdx
        syscall
        popq %rbp
        ret
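As a reference point to check the assembly against, the same system call can be made from C through the POSIX `write` function. This is a sketch only, not the tutorial's code: the function name `write_message` and the constant `MESSAGE_LEN` are hypothetical, and it assumes a Linux or other POSIX system.

```c
#include <unistd.h>

/* Same text as the .ascii directive in write.s. */
static const char message[] = "Hello, world!\n";

/* Length without the terminating NUL, matching message_len = .-message. */
enum { MESSAGE_LEN = sizeof message - 1 };

/* Performs write(STDOUT, message, message_len) and returns the byte count,
   or -1 on error. */
long write_message(void) {
    return (long)write(STDOUT_FILENO, message, MESSAGE_LEN);
}
```

If this version prints the text and the assembly version does not, the difference is more likely in how the assembly routine is declared and called from C than in the system call arguments themselves.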

Alternative to mul/mult for multiplication in assembly (MIPS)?

不羁岁月 submitted on 2020-07-30 07:41:39
Question: I'm implementing a simple single-cycle MIPS processor for a class, and the only operations we implemented are lw, sw, j, addi, or, and, add, sub, beq, slt, jr, andi, jal, bne and sll. I have to write a MIPS file testing a factorial function. Obviously, I can't use instructions that haven't been implemented, but since factorial means result = n * factorial(n-1), I need a way to multiply two values. Is there a way to do that with the instructions mentioned earlier? EDIT: I got it
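Multiplication can be built from the listed instructions alone. The C sketch below shows two possible ways, restricted to operations that map onto add, addi, and, sll, beq and bne; it is not the asker's eventual solution, which the excerpt does not show. The function names are hypothetical. Because srl is not in the list, the shift-and-add version moves a mask left across the multiplier instead of shifting the multiplier right.

```c
#include <stdint.h>

/* Repeated addition: needs only add, addi and bne. Runs b iterations. */
uint32_t mul_repeated_add(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    while (b != 0) {              /* bne  b, $zero, loop */
        result += a;              /* add  result, result, a */
        b -= 1;                   /* addi b, b, -1 */
    }
    return result;
}

/* Shift-and-add: always 32 iterations, one per bit of b. */
uint32_t mul_shift_add(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    uint32_t mask = 1;
    while (mask != 0) {           /* mask becomes 0 after 32 left shifts */
        if ((b & mask) != 0) {    /* and + beq: is this bit of b set? */
            result += a;          /* add */
        }
        a <<= 1;                  /* sll a, a, 1 */
        mask <<= 1;               /* sll mask, mask, 1 */
    }
    return result;
}

/* result = n * factorial(n-1), using the multiply above. */
uint32_t factorial(uint32_t n) {
    if (n == 0) return 1;
    return mul_shift_add(n, factorial(n - 1));
}
```

Repeated addition is the shorter program but takes as many iterations as the value of the multiplier, while shift-and-add takes a fixed 32 iterations. Both wrap modulo 2^32 on overflow.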

What is the point of SSE2 instructions such as orpd?

橙三吉。 submitted on 2020-07-30 06:04:50
Question: The orpd instruction is a "bitwise logical OR of packed double precision floating point values". Doesn't this do exactly the same thing as por ("bitwise logical OR")? If so, what's the point of having it? Answer 1: Remember that SSE1 orps came first. (Well, actually MMX por mm, mm/mem came even before SSE1.) Having the same opcode with a new prefix be the SSE2 orpd instruction makes sense for hardware decoder logic, I guess, just like movapd vs. movaps. Several instructions like this are redundant
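The claim that orpd and por compute the same bits can be checked with the SSE2 intrinsics `_mm_or_pd` and `_mm_or_si128`. This sketch assumes an x86 target with SSE2 and the `<emmintrin.h>` header; the function name `or_results_match` is hypothetical. The compiler may choose a different OR instruction than the intrinsic name suggests, precisely because the results are identical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>

/* Returns 1 if the floating-point OR and the integer OR give
   bit-identical 128-bit results for the same inputs. */
int or_results_match(double a0, double a1, double b0, double b1) {
    __m128d a = _mm_set_pd(a1, a0);
    __m128d b = _mm_set_pd(b1, b0);

    /* Intrinsic for orpd. */
    __m128d via_orpd = _mm_or_pd(a, b);

    /* Intrinsic for por, applied to the same bits viewed as integers. */
    __m128i via_por = _mm_or_si128(_mm_castpd_si128(a),
                                   _mm_castpd_si128(b));

    unsigned char x[16], y[16];
    _mm_storeu_si128((__m128i *)x, _mm_castpd_si128(via_orpd));
    _mm_storeu_si128((__m128i *)y, via_por);
    return memcmp(x, y, 16) == 0;
}
```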

Is there any situation where using MOVDQU and MOVUPD is better than MOVUPS?

眉间皱痕 submitted on 2020-07-29 12:08:44
Question: I was trying to understand the different MOV instructions for SSE on Intel x86-64. According to this, you should use the aligned instructions (MOVAPS, MOVAPD and MOVDQA) when moving data between 2 registers, using the correct one for the type you're operating with, and use MOVUPS/MOVAPS when moving a register to memory and vice versa, since type does not impact performance when moving to/from memory. So is there ever any reason to use MOVDQU and MOVUPD? Is the explanation I got on the link wrong?
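That the three unaligned moves transfer the same 16 bytes can be illustrated with the intrinsics `_mm_loadu_ps`, `_mm_loadu_pd` and `_mm_loadu_si128`, which correspond to MOVUPS, MOVUPD and MOVDQU. This sketch assumes an x86 target with SSE2; the function name `unaligned_loads_match` is hypothetical, and the compiler is free to pick any of the three instructions for each load. It says nothing about performance, only about the data moved.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE1 ones */
#include <string.h>

/* Loads the 16 bytes at p three ways, stores each back, and returns 1
   if every copy is byte-identical to the source. p need not be aligned. */
int unaligned_loads_match(const unsigned char *p) {
    __m128  as_ps = _mm_loadu_ps((const float *)p);       /* movups */
    __m128d as_pd = _mm_loadu_pd((const double *)p);      /* movupd */
    __m128i as_dq = _mm_loadu_si128((const __m128i *)p);  /* movdqu */

    unsigned char a[16], b[16], c[16];
    _mm_storeu_ps((float *)a, as_ps);
    _mm_storeu_pd((double *)b, as_pd);
    _mm_storeu_si128((__m128i *)c, as_dq);

    return memcmp(a, p, 16) == 0 &&
           memcmp(b, p, 16) == 0 &&
           memcmp(c, p, 16) == 0;
}
```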