x86-64

Why does initialization of local static objects use hidden guard flags?

有些话、适合烂在心里 submitted on 2019-12-04 03:47:48
Question: Local static objects in C++ are initialized once, the first time they are needed (which is relevant if the initialization has a side effect):

    void once() {
        static bool b = [] {
            std::cout << "hello" << std::endl;
            return true;
        }();
    }

once will print "hello" the first time it is called, but not if it is called again. I've put a few variations of this pattern into Compiler Explorer and noticed that all of the big-name implementations (GCC, Clang, ICC, VS) essentially do the same thing: a hidden…
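(To make the pattern concrete, here is a hedged sketch of what the generated code is morally equivalent to under the Itanium C++ ABI; __cxa_guard_acquire and __cxa_guard_release are the real ABI entry points, but the fast-path load and the guard layout are simplified here:)

    #include <cstdint>
    #include <iostream>

    extern "C" int  __cxa_guard_acquire(uint64_t*);
    extern "C" void __cxa_guard_release(uint64_t*);

    static uint64_t guard_b;  // hidden guard; its first byte is the "done" flag
    static bool b;

    void once_lowered() {
        if (*reinterpret_cast<char*>(&guard_b) == 0) {  // fast path: not yet initialized?
            if (__cxa_guard_acquire(&guard_b)) {        // slow path: won the init race
                b = [] { std::cout << "hello" << std::endl; return true; }();
                __cxa_guard_release(&guard_b);          // set the flag, wake waiters
            }
        }
    }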

x86 instruction encoding: how to choose an opcode

孤人 submitted on 2019-12-04 03:30:15
Question: When encoding the instruction cmpw $-5, %ax for x86-64, the Intel instruction set reference manual gives me two opcodes to choose from:

    3D iw    CMP AX, imm16    I   Valid  Valid  Compare imm16 with AX.
    83 /7 ib CMP r/m16, imm8  MI  Valid  Valid  Compare imm8 with r/m16.

So there are two possible encodings:

    66 3d fb ff    ; this one for opcode 3D
    66 83 f8 fb    ; this one for opcode 83

Then which one is better? I tried some online disassemblers below:

    https://defuse.ca/online-x86-assembler.htm#disassembly2
    https://onlinedisassembler…
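(For experimenting with the two forms, here is a hedged NASM sketch; NASM prefers the shorter imm8 form by default, and its "strict" keyword can force the other encoding. The exact bytes in the comments are assumptions you can verify with ndisasm:)

    bits 64
    cmp ax, -5               ; NASM default: 66 83 F8 FB (opcode 83 /7, sign-extended imm8)
    cmp ax, strict word -5   ; forced form:  66 3D FB FF (opcode 3D, imm16)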

How to detect architecture in NASM at compile time to have one source code for both x64 and x86?

徘徊边缘 submitted on 2019-12-04 03:30:13
Question: I am looking for some preprocessor functionality in NASM that would allow having one source code for both x86 and x64 architectures. I mean something in the vein of %ifdef some_constant, like the C preprocessor uses when it wants to detect, say, whether it's compiled on Windows or Linux. Edit: I know about NASM's flags. I use them. I just want to have the very same source code and expect the preprocessor to handle it correctly based on those flags. I'd use %ifdef ... %else for stack operations and so on, having…
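(A hedged sketch of one way to do this: NASM predefines the __BITS__ macro to the current BITS setting, so a single source file can branch on it. The PTRSIZE and PUSHREGS names below are made up for illustration:)

    %if __BITS__ == 64
        %define PTRSIZE 8
        %macro PUSHREGS 0
            push rbx
            push rbp
        %endmacro
    %else
        %define PTRSIZE 4
        %macro PUSHREGS 0
            push ebx
            push ebp
        %endmacro
    %endif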

How to specify register constraints on the Intel x86_64 register r8 to r15 in GCC inline assembly?

狂风中的少年 submitted on 2019-12-04 02:59:51
Question: Here's the list of register loading codes:

    a    eax
    b    ebx
    c    ecx
    d    edx
    S    esi
    D    edi
    I    constant value (0 to 31)
    q,r  dynamically allocated register (see below)
    g    eax, ebx, ecx, edx or variable in memory
    A    eax and edx combined into a 64-bit integer (use long longs)

But these are the register constraints for Intel i386. My question is where I can find the register constraints of the Intel x86_64 system, like:

    ?    %r10
    ?    %r8
    ?    %rdx

and so on.

Answer 1: The machine-specific constraints have a section in the gcc manual…
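(There are no single-letter constraints for r8 to r15. A hedged sketch of the usual workaround, a local register variable tied to a plain "r" constraint; the function and variable names are made up:)

    #include <stdint.h>

    uint64_t force_r10(uint64_t x) {
        // GCC's explicit register variables pin the value to %r10 for the asm
        register uint64_t v asm("r10") = x;
        __asm__ volatile("add $1, %0" : "+r"(v));
        return v;
    }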

Performance of “conditional call” on amd64

故事扮演 submitted on 2019-12-04 02:59:18
Question: When considering a conditional function call in a critical section of code, I found that both gcc and clang will branch around the call. For example, for the following (admittedly trivial) code:

    int32_t __attribute__((noinline)) negate(int32_t num) {
        return -num;
    }

    int32_t f(int32_t num) {
        int32_t x = num < 0 ? negate(num) : num;
        return 2*x + 1;
    }

Both GCC and clang compile to essentially the following:

    .global _f
    _f:
        cmp edi, 0
        jg after_call
        call _negate
    after_call:
        lea rax, [rax*2+1]
        ret
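(For a callee this trivial, the conditional call can also be replaced outright by a branchless select; a hedged sketch with cmov, shown only as a point of comparison, not as what the question's compilers emit:)

    ; branchless abs: x = (num < 0) ? -num : num
    _f_branchless:
        mov    eax, edi        ; eax = num
        neg    eax             ; eax = -num; flags reflect the result
        cmovle eax, edi        ; -num <= 0 means num >= 0: keep the original
        lea    rax, [rax*2+1]  ; 2*x + 1
        ret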

Assembly and multicore CPUs

僤鯓⒐⒋嵵緔 submitted on 2019-12-04 02:41:59
What x86-64 instructions are used to enable/disable other cores/processors, and how does one start executing code on them? Is there documentation somewhere on how this is done by the operating system?

Pretty painful to get an x86 up and going... it is not so much in the cores as in the APIC system. You need to look into the docs for your chipset; they tend to be pretty well hidden, unfortunately. You will definitely have to be at the kernel level. Looking at Linux sounds like a good idea.

Assuming you're talking about implementing a kernel.... My understanding is that it's largely based on this document…
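(To make the APIC point concrete, here is a hedged C sketch of the classic INIT-SIPI-SIPI wake-up sequence from Intel's MP initialization protocol; the MMIO addresses are the architectural xAPIC defaults, but delay_us and the trampoline address are assumptions:)

    #include <stdint.h>

    #define APIC_BASE   0xFEE00000u          /* default xAPIC MMIO base */
    #define APIC_ICR_LO (APIC_BASE + 0x300)  /* interrupt command register, low half */
    #define APIC_ICR_HI (APIC_BASE + 0x310)  /* destination field */

    extern void delay_us(unsigned us);       /* hypothetical busy-wait helper */

    static void send_ipi(uint8_t apic_id, uint32_t cmd) {
        *(volatile uint32_t*)APIC_ICR_HI = (uint32_t)apic_id << 24;
        *(volatile uint32_t*)APIC_ICR_LO = cmd;  /* the write to LO fires the IPI */
    }

    void start_ap(uint8_t apic_id, uint32_t trampoline_phys /* < 1 MiB, page-aligned */) {
        send_ipi(apic_id, 0x00004500);                           /* INIT, assert */
        delay_us(10000);
        send_ipi(apic_id, 0x00004600 | (trampoline_phys >> 12)); /* SIPI #1 */
        delay_us(200);
        send_ipi(apic_id, 0x00004600 | (trampoline_phys >> 12)); /* SIPI #2 */
    }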

How to convert 32-bit compiled binary to 64-bit [closed]

风格不统一 submitted on 2019-12-04 02:36:02
Background: We have acquired a software product that builds to a 32-bit Windows application in Visual Studio. We wish to port this application to 64-bit. A mission-critical component of this code is a black-box static library (.a file) originally built using gfortran by a third party. The original developer has since passed away, and the Fortran source we were able to get was incomplete and not the version this library was built from (and it contains critical bugs not present in the compiled library). They did not use a VCS. Problem: I would like to create a 64-bit static library whose code is…

Speed up x64 assembler ADD loop

独自空忆成欢 submitted on 2019-12-04 02:30:04
I'm working on arithmetic for multiplication of very long integers (some 100,000 decimal digits). As part of my library I need to add two long numbers. Profiling shows that my code spends up to 25% of its time in the add() and sub() routines, so it's important that they are as fast as possible. But I don't see much potential yet. Maybe you can give me some help, advice, insight or ideas. I'll test them and get back to you. So far my add routine does some setup and then uses an 8-times unrolled loop:

    mov rax, QWORD PTR [rdx+r11*8-64]
    mov r10, QWORD PTR [r8+r11*8-64]
    adc rax, r10
    mov QWORD PTR [rcx+r11*8…
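(For context, a minimal non-unrolled sketch of such a carry-chain loop, using the same register roles the fragment above suggests; the key constraint is that the loop bookkeeping must not clobber CF between adc iterations, which is why it uses lea and dec rather than add and cmp:)

    ; assumed roles: rcx = dst, rdx = src1, r8 = src2, r9 = limb count
        xor  r11, r11              ; index = 0, also clears CF for the first adc
    add_loop:
        mov  rax, QWORD PTR [rdx+r11*8]
        adc  rax, QWORD PTR [r8+r11*8]
        mov  QWORD PTR [rcx+r11*8], rax
        lea  r11, [r11+1]          ; lea leaves all flags untouched
        dec  r9                    ; dec touches ZF/SF/OF but preserves CF
        jnz  add_loop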

How to check for GPU on CentOS Linux

£可爱£侵袭症+ submitted on 2019-12-04 01:20:27
It is suggested that on Linux the GPU can be found with the command lspci | grep VGA. That works fine on Ubuntu, but when I try the same on CentOS it says the lspci command is not found. How can I check for the GPU card on CentOS? Note that I'm not the administrator of the machine and I only use it remotely from the command line. I intend to use the GPU as a GPGPU on that machine, but first I need to check whether it even has one.

Have you tried launching /sbin/lspci or /usr/sbin/lspci?

This assumes you have the proprietary drivers installed, but issue the following command... nvidia-smi. The output should…
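(Putting the two answers together as a hedged shell sketch; the broader class filter is an assumption, since discrete GPUs may show up as "3D controller" rather than "VGA":)

    # lspci usually lives in /sbin or /usr/sbin, outside a non-root user's PATH
    /sbin/lspci | grep -i -E 'vga|3d|display'

    # if the NVIDIA proprietary driver is installed, this works as well
    nvidia-smi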

Why is the performance of a running program getting better over time?

给你一囗甜甜゛ submitted on 2019-12-04 01:06:56
Consider the following code:

    #include <iostream>
    #include <chrono>

    using Time = std::chrono::high_resolution_clock;
    using us = std::chrono::microseconds;

    int main() {
        volatile int i, k;
        const int n = 1000000;
        for (k = 0; k < 200; ++k) {
            auto begin = Time::now();
            for (i = 0; i < n; ++i); // <--
            auto end = Time::now();
            auto dur = std::chrono::duration_cast<us>(end - begin).count();
            std::cout << dur << std::endl;
        }
        return 0;
    }

I am repeatedly measuring the execution time of the inner for loop. The results are shown in the following plot (y: duration, x: repetition).

What is causing the decreasing…
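(One hedged way to separate warm-up effects, such as frequency ramp-up and cache or branch-predictor state, from steady-state numbers is to run the loop a few times unmeasured first; the warm-up count of 20 below is an arbitrary assumption:)

    #include <chrono>
    #include <iostream>

    int main() {
        using Time = std::chrono::high_resolution_clock;
        using us = std::chrono::microseconds;
        volatile int i;
        const int n = 1000000;
        for (int w = 0; w < 20; ++w)      // warm-up passes, not timed
            for (i = 0; i < n; ++i);
        auto begin = Time::now();
        for (i = 0; i < n; ++i);          // timed pass
        auto end = Time::now();
        std::cout << std::chrono::duration_cast<us>(end - begin).count()
                  << " us" << std::endl;
        return 0;
    }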