x86-64 | 易学教程

When you have an AMD CPU, can you speed up code that uses the Intel-MKL?

Question: I have an AMD CPU and I'm trying to run some code that uses Intel-MKL. The code is significantly slower than I expected. When you have an AMD CPU, can you speed up code that uses the Intel-MKL? How?

Answer 1: Yes, you can speed up your code. Set the environment variable MKL_DEBUG_CPU_TYPE=5, then run your code. FYI, this slowdown affects anything that uses the Intel-MKL library and runs on an AMD CPU (i.e. it affects all operating systems, all programming languages, and all programs (older versions…
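The variable has to be in the environment before the MKL-using program starts. A minimal sketch in C of a launcher that does this is shown below; the function names are hypothetical, and only `setenv`, `execvp`, and the variable name from the answer are taken as given. The answer excerpt is cut off at "(older versions", and the variable is widely reported to be ignored by newer MKL releases, so treat this as applying only to versions that still honor it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Put the variable named in the answer into this process's environment.
 * Returns 0 on success, -1 on failure (same convention as setenv). */
int enable_mkl_amd_workaround(void)
{
    return setenv("MKL_DEBUG_CPU_TYPE", "5", 1 /* overwrite existing value */);
}

/* Hypothetical launcher helper: set the variable, then replace this
 * process with the target program so that it inherits the environment.
 * argv must be NULL-terminated; this only returns (with -1) on failure. */
int run_with_mkl_workaround(char *const argv[])
{
    if (enable_mkl_amd_workaround() != 0) {
        perror("setenv");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");
    return -1;
}
```

Calling `run_with_mkl_workaround` with the target program's argument vector has the same effect as exporting the variable in the shell before starting that program.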

X86: What does `movsxd rdx,edx` instruction mean?

Question: I have been playing with Intel MPX and found that it adds certain instructions that I could not understand. For example (in Intel syntax): `movsxd rdx,edx`. I found this, which talks about a similar instruction, MOVSX. From that question, my interpretation of this instruction is that it takes a double byte (that's why there is a d in `movsxd`), copies it into the rdx register (in the two least significant bytes), and fills the rest with the sign of that double byte. Is my interpretation correct (I…
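For reference, in Intel's naming the `d` stands for doubleword (32 bits), so `movsxd rdx,edx` sign-extends the 32-bit value in `edx` into the full 64-bit `rdx`. The following is a small C model of that behavior, not code from the question; `movsxd_model` is a hypothetical name, and the arithmetic is written so that it avoids implementation-defined integer conversions.

```c
#include <stdint.h>

/* Model of `movsxd r64, r/m32`: sign-extend a 32-bit value to 64 bits.
 * The low 32 bits are kept; bits 32..63 become copies of bit 31. */
int64_t movsxd_model(uint32_t src32)
{
    int64_t value = (int64_t)src32;      /* zero-extended: 0 .. 2^32 - 1 */
    if (src32 & UINT32_C(0x80000000))    /* bit 31 set: value is negative */
        value -= INT64_C(1) << 32;       /* subtract 2^32 to sign-extend */
    return value;
}
```

For example, the input `0xFFFFFFFF` (which is -1 as a 32-bit signed value) yields -1 as a 64-bit value, while `0x7FFFFFFF` is returned unchanged.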

Why does adding an xorps instruction make this function using cvtsi2ss and addss ~5x faster?

Question: I was messing around with optimizing a function using Google Benchmark, and ran into a situation where my code was unexpectedly slowing down in certain situations. I started experimenting with it, looking at the compiled assembly, and eventually came up with a minimal test case that exhibits the issue. Here's the assembly I came up with that exhibits this slowdown:

    .text
    test:
        #xorps    %xmm0, %xmm0
        cvtsi2ss  %edi, %xmm0
        addss     %xmm0, %xmm0
        addss     %xmm0, %xmm0
        addss     %xmm0, %xmm0
        addss     %xmm0, %xmm0
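The excerpt does not include the answer. As background, `cvtsi2ss` writes only the low 32-bit lane of its destination and leaves the rest of the register as it was, so the old register contents are an input to the instruction, whereas `xorps` of a register with itself yields zero regardless of the old contents. The sketch below models just those semantics in portable C; the type and function names are hypothetical, and it does not measure or reproduce the slowdown.

```c
#include <stdint.h>

/* Simplified model of a 128-bit XMM register as four float lanes. */
typedef struct { float lane[4]; } xmm_model;

/* Model of `cvtsi2ss %r32, %xmm`: lane 0 receives the converted integer,
 * lanes 1..3 keep their previous contents. The result therefore depends
 * on the old value of the destination register. */
xmm_model cvtsi2ss_model(xmm_model old_dst, int32_t src)
{
    old_dst.lane[0] = (float)src;
    return old_dst;
}

/* Model of `xorps %xmm, %xmm` with identical operands: the result is
 * all zeros and does not depend on the register's previous contents. */
xmm_model xorps_self_model(void)
{
    xmm_model zero = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    return zero;
}
```

In this model, feeding `xorps_self_model()` into `cvtsi2ss_model` gives a result that no longer depends on whatever was in the register before, which is the property the commented-out `xorps` line in the question provides.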

Mixing C and Assembly. `Hello World` on 64-bit Linux

Question: Based on this tutorial, I am trying to write Hello World to the console on 64-bit Linux. Compilation raises no errors, but I get no text on the console either. I don't know what is wrong.

write.s:

    .data
    SYSREAD = 0
    SYSWRITE = 1
    SYSEXIT = 60
    STDOUT = 1
    STDIN = 0
    EXIT_SUCCESS = 0
    message: .ascii "Hello, world!\n"
    message_len = .-message

    .text
    .globl _write
    _write:
        pushq %rbp
        movq %rsp, %rbp
        movq $SYSWRITE, %rax
        movq $STDOUT, %rdi
        movq $message, %rsi
        movq $message_len, %rdx
        syscall
        popq %rbp
        ret
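For comparison, the register setup in the assembly (`%rax` = 1 for the write syscall, `%rdi` = file descriptor, `%rsi` = buffer, `%rdx` = length) corresponds to a plain `write(fd, buf, count)` call in C. The sketch below shows that equivalent; `write_hello` is a hypothetical name, and this does not diagnose the problem in the question, it only shows what the syscall sequence is meant to do.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Same bytes as the `message` label in write.s (no terminating NUL is sent). */
static const char message[] = "Hello, world!\n";

/* Write the message to the given file descriptor.
 * Returns the number of bytes written, or -1 on error, as write() does. */
ssize_t write_hello(int fd)
{
    return write(fd, message, sizeof message - 1);
}
```

Calling `write_hello(STDOUT_FILENO)` writes to descriptor 1, matching the `STDOUT = 1` constant in the assembly.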

Is there any situation where using MOVDQU and MOVUPD is better than MOVUPS?

Question: I was trying to understand the different MOV instructions for SSE on Intel x86-64. According to this, you should use the aligned instructions (MOVAPS, MOVAPD and MOVDQA) when moving data between two registers, using the correct one for the type you're operating on, and use MOVUPS/MOVAPS when moving a register to memory and vice versa, since type does not impact performance when moving to or from memory. So is there ever any reason to use MOVDQU and MOVUPD? Is the explanation I got from the link wrong?
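One concrete difference between the three instructions is code size in the legacy (non-VEX) SSE encoding: MOVUPS has no mandatory prefix, while MOVUPD and MOVDQU each carry one. The sketch below lists the machine-code bytes for loading `xmm0` from `[rdi]` with each instruction; the array and function names are hypothetical, and this illustrates encoding length only, not whether any CPU treats the data types differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Legacy SSE encodings of "load xmm0 from [rdi]" in 64-bit mode.
 * The final byte 0x07 is the ModRM byte: mod=00, reg=xmm0, rm=rdi. */
static const uint8_t enc_movups[] = { 0x0F, 0x10, 0x07 };        /* no prefix */
static const uint8_t enc_movupd[] = { 0x66, 0x0F, 0x10, 0x07 };  /* 66 prefix */
static const uint8_t enc_movdqu[] = { 0xF3, 0x0F, 0x6F, 0x07 };  /* F3 prefix */

/* Encoded length in bytes of each variant. */
size_t movups_len(void) { return sizeof enc_movups; }
size_t movupd_len(void) { return sizeof enc_movupd; }
size_t movdqu_len(void) { return sizeof enc_movdqu; }
```

In this encoding the MOVUPS form is one byte shorter than the other two, which is one reason it is often picked when the data type is not believed to matter.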