x86-64

When you have an AMD CPU, can you speed up code that uses the Intel-MKL?

僤鯓⒐⒋嵵緔 posted on 2020-08-08 05:16:31
Question: I have an AMD CPU and I'm trying to run some code that uses Intel-MKL. The code is significantly slower than I expected. When you have an AMD CPU, can you speed up code that uses Intel-MKL? How?

Answer 1: Yes, you can speed up your code. Set the environment variable MKL_DEBUG_CPU_TYPE=5, then run your code. FYI, this slowdown affects anything that uses the Intel-MKL library and runs on an AMD CPU (i.e. it affects all operating systems, all programming languages, and all programs (older versions

X86: What does `movsxd rdx,edx` instruction mean?

烂漫一生 posted on 2020-08-07 09:23:48
Question: I have been playing with Intel MPX and found that it adds certain instructions that I could not understand. For example (in Intel syntax): movsxd rdx,edx. I found this question, which talks about a similar instruction, MOVSX. From that question, my interpretation of this instruction is that it takes a double byte (that's why there is a d in movsxd), copies it into the rdx register (into the two least significant bytes), and fills the rest with the sign of that double byte. Is my interpretation correct (I

Why does adding an xorps instruction make this function using cvtsi2ss and addss ~5x faster?

99封情书 posted on 2020-08-05 04:47:31
Question: I was messing around with optimizing a function using Google Benchmark, and ran into a situation where my code was unexpectedly slowing down in certain situations. I started experimenting with it, looking at the compiled assembly, and eventually came up with a minimal test case that exhibits the issue. Here's the assembly I came up with that exhibits this slowdown:

    .text
    test:
        #xorps %xmm0, %xmm0
        cvtsi2ss %edi, %xmm0
        addss %xmm0, %xmm0
        addss %xmm0, %xmm0
        addss %xmm0, %xmm0
        addss %xmm0, %xmm0

Mixing C and Assembly. `Hello World` on 64-bit Linux

﹥>﹥吖頭↗ posted on 2020-07-31 04:16:54
Question: Based on this tutorial, I am trying to write Hello World to the console on 64-bit Linux. Compilation raises no errors, but I get no text on the console either. I don't know what is wrong. write.s:

    .data
    SYSREAD = 0
    SYSWRITE = 1
    SYSEXIT = 60
    STDOUT = 1
    STDIN = 0
    EXIT_SUCCESS = 0
    message:
        .ascii "Hello, world!\n"
    message_len = .-message
    .text
    .globl _write
    _write:
        pushq %rbp
        movq %rsp, %rbp
        movq $SYSWRITE, %rax
        movq $STDOUT, %rdi
        movq $message, %rsi
        movq $message_len, %rdx
        syscall
        popq %rbp
        ret

Is there any situation where using MOVDQU and MOVUPD is better than MOVUPS?

眉间皱痕 posted on 2020-07-29 12:08:44
Question: I was trying to understand the different MOV instructions for SSE on Intel x86-64. According to this, you should use aligned instructions (MOVAPS, MOVAPD and MOVDQA) when moving data between two registers, using the correct one for the type you're operating with, and use MOVUPS/MOVAPS when moving a register to memory and vice versa, since type does not impact performance when moving to/from memory. So is there any reason to use MOVDQU and MOVUPD ever? Is the explanation I got on the link wrong?