x86-64 | 易学教程

x86-64: canonical addresses and actual available range

Question: Intel and AMD documentation says that in 64-bit mode only 48 bits are actually available for virtual addresses, and bits 48 to 63 must replicate bit 47 (sign extension). As far as I know, all current CPUs are implemented this way, but nothing (in theory) forbids extending the available space in future implementations (and this would not break binary compatibility). Is there a standard way to programmatically determine the number of meaningful bits? (i.e. some specific CPUID, as happens
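
The sign-extension rule described in the question can be sketched in a few lines of C++. This is only an illustration: the helper names `is_canonical` and `sign_extend` are hypothetical, and the address width is passed in as a parameter (defaulting to the 48 bits mentioned above) rather than queried from the CPU.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `addr` is canonical for an implementation
// with `va_bits` meaningful virtual-address bits (48 in the question).
inline bool is_canonical(std::uint64_t addr, unsigned va_bits = 48) {
    assert(va_bits >= 1 && va_bits <= 64);
    // Bits 63 down to (va_bits - 1) must be all zeros or all ones.
    const std::uint64_t upper = addr >> (va_bits - 1);
    const std::uint64_t all_ones = ~std::uint64_t{0} >> (va_bits - 1);
    return upper == 0 || upper == all_ones;
}

// Hypothetical helper: keep the low `va_bits` bits of `addr` and replicate
// the top one of them into all higher bits.
inline std::uint64_t sign_extend(std::uint64_t addr, unsigned va_bits = 48) {
    assert(va_bits >= 1 && va_bits <= 64);
    const std::uint64_t sign = std::uint64_t{1} << (va_bits - 1);
    const std::uint64_t low =
        (va_bits == 64) ? addr : addr & ((std::uint64_t{1} << va_bits) - 1);
    // Classic branch-free sign extension: flip the sign bit, then subtract it.
    return (low ^ sign) - sign;
}
```

With the default width, `0x00007FFFFFFFFFFF` and `0xFFFF800000000000` are accepted, while `0x0000800000000000` is rejected because bit 47 is set but bits 48 to 63 are clear.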

Layout of structs in Linux/x86-64 syscalls for assembly programmers?

Question: A number of Linux/x86-64 syscalls accept pointers to structs as arguments. For example, the second parameter of stat(2) is struct stat* ...

    struct stat {
        dev_t   st_dev;   /* ID of device containing file */
        ino_t   st_ino;   /* inode number */
        mode_t  st_mode;  /* protection */
        nlink_t st_nlink; /* number of hard links */
        uid_t   st_uid;   /* user ID of owner */
        gid_t   st_gid;   /* group ID of owner */
        dev_t   st_rdev;  /* device ID (if special file) */
        off_t   st_size;  /* total size, in bytes */
        blksize_t st
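
One way to explore such a layout is to let the compiler report field offsets. The sketch below is an assumption-laden illustration, not an answer from the original thread: `FieldInfo`, `STAT_FIELD`, and `stat_layout` are hypothetical names, and the offsets describe the C library's `struct stat` on the build machine, which may not match what a raw syscall fills in on every platform.

```cpp
#include <array>
#include <cstddef>
#include <sys/stat.h>

// Hypothetical record describing one member of struct stat.
struct FieldInfo {
    const char* name;
    std::size_t offset;  // byte offset from the start of the struct
    std::size_t size;    // size of the member in bytes
};

// Hypothetical convenience macro; sizeof does not evaluate its operand,
// so the null pointer is never dereferenced.
#define STAT_FIELD(f) \
    FieldInfo{#f, offsetof(struct stat, f), sizeof(((struct stat*)nullptr)->f)}

// Offsets and sizes of a few members as seen by this compiler and libc.
inline std::array<FieldInfo, 4> stat_layout() {
    return {{STAT_FIELD(st_dev), STAT_FIELD(st_ino),
             STAT_FIELD(st_mode), STAT_FIELD(st_size)}};
}
```

Printing the returned `name`, `offset`, and `size` values gives numbers that can be compared against the displacements used in hand-written assembly.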

How to check for GPU on CentOS Linux

Question: It is suggested that on Linux, a GPU can be found with the command lspci | grep VGA. It works fine on Ubuntu, but when I try the same on CentOS, it says the lspci command is not found. How can I check for the GPU card on CentOS? Note that I'm not the administrator of the machine and I only use it remotely from the command line. I intend to use the GPU as a GPGPU on that machine, but first I need to check if it even has one.

Answer 1: Have you tried to launch /sbin/lspci or /usr/sbin/lspci?

Answer 2: This

Why is the performance of a running program getting better over time?

Question: Consider the following code:

    #include <iostream>
    #include <chrono>

    using Time = std::chrono::high_resolution_clock;
    using us = std::chrono::microseconds;

    int main() {
        volatile int i, k;
        const int n = 1000000;
        for(k = 0; k < 200; ++k) {
            auto begin = Time::now();
            for (i = 0; i < n; ++i); // <--
            auto end = Time::now();
            auto dur = std::chrono::duration_cast<us>(end - begin).count();
            std::cout << dur << std::endl;
        }
        return 0;
    }

I am repeatedly measuring the execution time of the inner for loop.

Is there an 8-bit atomic CAS (cmpxchg) intrinsic for X64 in Visual C++?

Question: The following code is possible in 32-bit Visual Studio C++. Is there a 64-bit equivalent using intrinsics, since inline ASM isn't supported in the 64-bit version of Visual Studio C++?

    FORCEINLINE bool bAtomicCAS8(volatile UINT8 *dest, UINT8 oldval, UINT8 newval)
    {
        bool result=false;
        __asm
        {
            mov al,oldval
            mov edx,dest
            mov cl,newval
            lock cmpxchg byte ptr [edx],cl
            setz result
        }
        return(result);
    }

The following intrinsics compile under Visual Studio C++ _InterlockedCompareExchange16
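
For comparison, a portable alternative to the inline assembly above is the standard library's atomic type. This swaps in `std::atomic<std::uint8_t>` instead of a Visual C++ intrinsic, so it is a sketch of the same compare-and-swap semantics and not the intrinsic the question asks about; the function name `atomic_cas8` is hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically replace *dest with newval if it currently
// equals oldval. Returns true when the swap happened.
inline bool atomic_cas8(std::atomic<std::uint8_t>& dest,
                        std::uint8_t oldval, std::uint8_t newval) {
    // compare_exchange_strong writes the observed value back into its first
    // argument on failure; oldval is a by-value copy, so the caller's
    // variable is not modified.
    return dest.compare_exchange_strong(oldval, newval);
}
```

Unlike the original, this takes a `std::atomic` reference, not a raw `volatile UINT8*`, so existing call sites would need their storage declared accordingly.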

Cost of a page fault trap

Question: I have an application which periodically (every 1 or 2 seconds) takes checkpoints by forking itself. Each checkpoint is a fork of the original process which stays idle until it is asked to start when some error occurs in the original process. My question is how costly the copy-on-write mechanism of fork is. What is the cost of the page fault trap that occurs whenever the original process writes to a memory page (for the first time after taking a checkpoint, that is), as copy-on-write

The difference between cmpl and cmp

Question: I am trying to understand assembly to be able to solve a puzzle. However, I encountered the following instructions:

    0x0000000000401136 <+44>: cmpl $0x7,0x14(%rsp)
    0x000000000040113b <+49>: ja 0x401230 <phase_3+294>

What I think it is doing is: The value of 0x14(%rsp) is -7380. According to my understanding, cmpl compares unsigned. Also, the jump is performed. So can it be that (unsigned)-7380 > 7, (unsigned)7380 > 7 --> jump? I actually don't want it to jump. But is this the correct explanation or
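
The arithmetic in the question can be checked with a small C++ sketch. It rests on two facts about the instructions: the `l` suffix only selects a 32-bit operand size, and it is the `ja` condition that interprets the comparison as unsigned. The helper names `ja_taken` and `jg_taken` are hypothetical.

```cpp
#include <cstdint>

// Mirrors `cmpl $0x7, mem` followed by `ja`: the branch is taken when the
// 32-bit memory value, read as unsigned, is above 7.
inline bool ja_taken(std::int32_t value) {
    return static_cast<std::uint32_t>(value) > 7u;
}

// For contrast, the signed condition `jg` after the same compare.
inline bool jg_taken(std::int32_t value) {
    return value > 7;
}
```

Reinterpreted as unsigned, -7380 becomes 4294959916 (2^32 - 7380), which is far above 7, so `ja_taken(-7380)` is true while `jg_taken(-7380)` is false.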

What is the difference between retq and ret?

Question: Let's consider the following program, which computes an unsigned square of the argument:

    .global foo
    .text
    foo:
        mov %rdi, %rax
        mul %rdi
        ret

This is assembled correctly by as, but disassembles to:

    0000000000000000 <foo>:
       0: 48 89 f8    mov %rdi,%rax
       3: 48 f7 e7    mul %rdi
       6: c3          retq

Is there any difference between ret and retq?

Answer 1: In long (64-bit) mode, you return (ret) by popping a quadword address from the stack to %rip. In 32-bit mode, you return (ret) by popping a dword address from the

Evaluating SMI (System Management Interrupt) latency on Linux-CentOS/Intel machine

Question: I am interested in evaluating the behavior (latency, frequency) of SMI handling on a Linux machine running CentOS and used for a (very) soft real-time application. What tools are recommended (hwlatdetect for CentOS?), and what is the best way to go about this? If no good tools are available for CentOS, am I correct to assume that installing a different OS on the same machine should yield the same results, since the underlying hardware and BIOS are the same? Is there any source for

How fast is an atomic/interlocked variable compared to a lock, with or without contention? [duplicate]

Question: This question already has answers here: Overhead of using locks instead of atomic intrinsics (4 answers). Closed 8 months ago.

And how much faster or slower is it compared to an uncontested atomic variable operation (such as on a C++ std::atomic<T>)? Also, how much slower are contested atomic variables relative to an uncontested lock? The architecture I'm working on is x86-64.

Answer 1: There's a project on GitHub with the purpose of measuring this on different platforms. Unfortunately, after my
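
The uncontested half of this comparison can be measured with a rough single-threaded sketch like the one below. It is not the GitHub project mentioned in the answer; `Timing`, `time_atomic`, and `time_mutex` are hypothetical names, and the contested cases would need several threads, which this sketch deliberately leaves out.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical result type: final counter value and elapsed wall time.
struct Timing {
    std::uint64_t count;
    std::chrono::nanoseconds elapsed;
};

// Increment an uncontested std::atomic counter `iterations` times.
inline Timing time_atomic(std::uint64_t iterations) {
    std::atomic<std::uint64_t> counter{0};
    const auto begin = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
    }
    const auto end = std::chrono::steady_clock::now();
    return {counter.load(),
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin)};
}

// Increment a plain counter under an uncontested std::mutex.
inline Timing time_mutex(std::uint64_t iterations) {
    std::mutex m;
    std::uint64_t counter = 0;
    const auto begin = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(m);
        ++counter;
    }
    const auto end = std::chrono::steady_clock::now();
    return {counter,
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin)};
}
```

Dividing `elapsed` by `count` for each function gives a per-operation cost on the machine at hand; the numbers vary with hardware, compiler, and optimization level, so they are only meaningful relative to each other.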