x86-64

x86-64: canonical addresses and actual available range

霸气de小男生 submitted on 2019-12-21 17:04:03
Question: Intel and AMD documentation says that in 64-bit mode only 48 bits are actually available for virtual addresses, and that bits 48 to 63 must replicate bit 47 (sign extension). As far as I know, all current CPUs are implemented this way, but nothing (in theory) forbids extending the available space in future implementations (and this would not break binary compatibility). Is there a standard way to programmatically determine the number of meaningful bits? (i.e. some specific CPUID, as happens
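
The sign-extension rule described in the question can be sketched as a small check. This is a minimal illustration, not part of the original post: the function name `is_canonical` and the `va_bits` parameter are hypothetical, and the default of 48 simply mirrors the width the question mentions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if `addr` is canonical for a CPU that
// implements `va_bits` virtual-address bits (48 in the question's example).
// Canonical means bit (va_bits - 1) and every bit above it are identical,
// i.e. the upper bits are a sign extension of the top implemented bit.
bool is_canonical(std::uint64_t addr, unsigned va_bits = 48) {
    assert(va_bits >= 1 && va_bits <= 64);
    // Keep only bit (va_bits - 1) and everything above it.
    const std::uint64_t upper = addr >> (va_bits - 1);
    // The same field with every bit set.
    const std::uint64_t all_ones = ~std::uint64_t{0} >> (va_bits - 1);
    return upper == 0 || upper == all_ones;
}
```

With the default width, `is_canonical(0x00007FFFFFFFFFFF)` and `is_canonical(0xFFFF800000000000)` hold, while `0x0000800000000000` is rejected because bits 48 to 63 do not replicate bit 47. Passing a different `va_bits` models an implementation with a wider address space.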

Layout of structs in Linux/x86-64 syscalls for assembly programmers?

放肆的年华 submitted on 2019-12-21 09:18:45
Question: A number of Linux/x86-64 syscalls accept pointers to structs as arguments. For example, the second parameter of stat(2) is struct stat* ...

    struct stat {
        dev_t st_dev; /* ID of device containing file */
        ino_t st_ino; /* inode number */
        mode_t st_mode; /* protection */
        nlink_t st_nlink; /* number of hard links */
        uid_t st_uid; /* user ID of owner */
        gid_t st_gid; /* group ID of owner */
        dev_t st_rdev; /* device ID (if special file) */
        off_t st_size; /* total size, in bytes */
        blksize_t st
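
One way to recover the byte offsets an assembly programmer needs is to let the compiler report them. The sketch below is an illustration under stated assumptions: `print_stat_layout` and the `SHOW_FIELD` macro are hypothetical names, and the output describes the C library's `struct stat` on the machine where it is compiled, which is assumed (not guaranteed here) to match what the kernel writes for the stat syscall on that platform.

```cpp
#include <cstddef>
#include <cstdio>
#include <sys/stat.h>

// Hypothetical helper macro: prints a field's name, its byte offset inside
// struct stat, and its size, as seen by the compiler on this platform.
#define SHOW_FIELD(field)                                      \
    std::printf("%-10s offset %3zu  size %zu\n", #field,       \
                offsetof(struct stat, field),                  \
                sizeof(static_cast<struct stat*>(nullptr)->field))

// Prints the total size of struct stat followed by the layout of the
// fields listed in the question.
void print_stat_layout() {
    std::printf("sizeof(struct stat) = %zu\n", sizeof(struct stat));
    SHOW_FIELD(st_dev);
    SHOW_FIELD(st_ino);
    SHOW_FIELD(st_mode);
    SHOW_FIELD(st_nlink);
    SHOW_FIELD(st_uid);
    SHOW_FIELD(st_gid);
    SHOW_FIELD(st_rdev);
    SHOW_FIELD(st_size);
    SHOW_FIELD(st_blksize);
}
```

Calling `print_stat_layout()` from a small program gives a table that can be copied into assembly as constants. Note that the order of fields in memory need not match the order shown in the man page, which is why printing the offsets is safer than counting by hand.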

How to check for GPU on CentOS Linux

陌路散爱 submitted on 2019-12-21 07:55:16
Question: It is suggested that on Linux a GPU can be found with the command lspci | grep VGA. It works fine on Ubuntu, but when I try the same on CentOS, it says the lspci command is not found. How can I check for a GPU card on CentOS? Note that I'm not the administrator of the machine and I only use it remotely from the command line. I intend to use the GPU as a GPGPU on that machine, but first I need to check whether it even has one.
Answer 1: Have you tried to launch /sbin/lspci or /usr/sbin/lspci?
Answer 2: This

Why is the performance of a running program getting better over time?

放肆的年华 submitted on 2019-12-21 07:28:14
Question: Consider the following code:

    #include <iostream>
    #include <chrono>

    using Time = std::chrono::high_resolution_clock;
    using us = std::chrono::microseconds;

    int main() {
        volatile int i, k;
        const int n = 1000000;
        for (k = 0; k < 200; ++k) {
            auto begin = Time::now();
            for (i = 0; i < n; ++i); // <--
            auto end = Time::now();
            auto dur = std::chrono::duration_cast<us>(end - begin).count();
            std::cout << dur << std::endl;
        }
        return 0;
    }

I am repeatedly measuring the execution time of the inner for loop.

Is there an 8-bit atomic CAS (cmpxchg) intrinsic for X64 in Visual C++?

Deadly submitted on 2019-12-21 04:49:28
Question: The following code is possible in 32-bit Visual Studio C++. Is there a 64-bit equivalent using intrinsics, since inline ASM isn't supported in the 64-bit version of Visual Studio C++?

    FORCEINLINE bool bAtomicCAS8(volatile UINT8 *dest, UINT8 oldval, UINT8 newval)
    {
        bool result=false;
        __asm
        {
            mov al,oldval
            mov edx,dest
            mov cl,newval
            lock cmpxchg byte ptr [edx],cl
            setz result
        }
        return(result);
    }

The following intrinsics compile under Visual Studio C++ _InterlockedCompareExchange16
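
For comparison, here is a portable sketch of the same 8-bit compare-and-swap using the C++11 standard library rather than a Visual C++ intrinsic. It is a swapped-in technique, not the answer from the original thread: the name `atomic_cas8` is hypothetical, and it operates on a `std::atomic<std::uint8_t>` instead of the raw `volatile UINT8*` in the question. MSVC's `<intrin.h>` is also documented to provide an `_InterlockedCompareExchange8` intrinsic; whether a given toolset ships it should be checked before relying on it.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical portable counterpart of bAtomicCAS8: atomically replaces the
// value in `dest` with `newval` if it currently equals `oldval`.
// Returns true when the swap happened.
bool atomic_cas8(std::atomic<std::uint8_t>& dest,
                 std::uint8_t oldval,
                 std::uint8_t newval) {
    // On failure compare_exchange_strong writes the observed value into
    // `oldval`; it is a by-value parameter here, so the caller is unaffected.
    return dest.compare_exchange_strong(oldval, newval);
}
```

On x86-64, compilers typically lower this call to a `lock cmpxchg` on a byte operand, which is the instruction the inline assembly in the question issues by hand.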

Cost of a page fault trap

风流意气都作罢 submitted on 2019-12-21 04:12:35
Question: I have an application which periodically (every 1 or 2 seconds) takes checkpoints by forking itself. A checkpoint is thus a fork of the original process that stays idle until it is asked to start when some error occurs in the original process. My question is: how costly is the copy-on-write mechanism of fork? How much is the cost of the page fault trap that occurs whenever the original process writes to a memory page (for the first time after taking a checkpoint, that is), as copy-on-write
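
A rough way to put a number on this is to measure it directly. The sketch below is an assumption-laden microbenchmark, not a result from the original thread: `cow_first_write_ns` is a hypothetical name, it assumes a POSIX system with `fork` and anonymous `mmap`, and the figure it returns includes the page copy as well as the trap itself. Features such as transparent huge pages can change the granularity of the copy and therefore the result.

```cpp
#include <chrono>
#include <cstddef>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical microbenchmark: returns the average time in nanoseconds of
// the first write to each of `pages` pages after a fork, or -1.0 on error.
double cow_first_write_ns(std::size_t pages) {
    const long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || pages == 0) return -1.0;
    const std::size_t page = static_cast<std::size_t>(page_size);
    const std::size_t len = pages * page;

    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1.0;
    volatile char* buf = static_cast<volatile char*>(mem);

    // Touch every page so it is resident before the checkpoint is taken.
    for (std::size_t i = 0; i < pages; ++i) buf[i * page] = 1;

    const pid_t child = fork();
    if (child < 0) {
        munmap(mem, len);
        return -1.0;
    }
    if (child == 0) {
        // The "checkpoint": stay idle so the pages remain shared.
        for (;;) pause();
    }

    // Each write below hits a write-protected shared page, so the kernel
    // takes a fault and copies the page for the parent.
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < pages; ++i) buf[i * page] = 2;
    const auto stop = std::chrono::steady_clock::now();

    kill(child, SIGKILL);
    waitpid(child, nullptr, 0);
    munmap(mem, len);

    const double ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return ns / static_cast<double>(pages);
}
```

Running the same write loop a second time without a fork in between gives a baseline for ordinary writes; the difference between the two is an estimate of the per-page copy-on-write overhead for the workload in question.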

The difference between cmpl and cmp

时光毁灭记忆、已成空白 submitted on 2019-12-21 03:53:21
Question: I am trying to understand assembly to be able to solve a puzzle. However, I encountered the following instructions:

    0x0000000000401136 <+44>: cmpl $0x7,0x14(%rsp)
    0x000000000040113b <+49>: ja 0x401230 <phase_3+294>

What I think it's doing is: the value of 0x14(%rsp) is -7380. According to my understanding, cmpl compares unsigned. Also, the jump is performed. So can it be that (unsigned)-7380 > 7 (unsigned)7380 > 7 --> jump. I actually don't want it to jump. But is this the correct explanation or
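
The behaviour in the question can be modelled in a few lines. In AT&T syntax the `l` suffix on `cmpl` only selects a 32-bit operand size; the comparison itself is a subtraction that sets flags, and it is the following `ja` ("jump if above") that interprets those flags as an unsigned comparison. The function name `ja_taken` below is hypothetical and the sketch only mirrors that condition.

```cpp
#include <cstdint>

// Hypothetical model of `cmpl $imm, mem` followed by `ja target`:
// the jump is taken when the 32-bit value in memory, reinterpreted as
// unsigned, is strictly greater than the immediate.
bool ja_taken(std::int32_t mem_value, std::uint32_t imm) {
    return static_cast<std::uint32_t>(mem_value) > imm;
}
```

For the value in the question, `static_cast<std::uint32_t>(-7380)` is 4294959916, so `ja_taken(-7380, 7)` is true and the jump is performed. Only stored values from 0 to 7 make `ja_taken(value, 7)` false.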

What is the difference between retq and ret?

房东的猫 submitted on 2019-12-21 03:26:17
Question: Let's consider the following program, which computes an unsigned square of the argument:

    .global foo
    .text
    foo:
        mov %rdi, %rax
        mul %rdi
        ret

This is properly assembled by as, but disassembles to

    0000000000000000 <foo>:
       0: 48 89 f8    mov %rdi,%rax
       3: 48 f7 e7    mul %rdi
       6: c3          retq

Is there any difference between ret and retq?
Answer 1: In long (64-bit) mode, you return (ret) by popping a quadword address from the stack to %rip. In 32-bit mode, you return (ret) by popping a dword address from the

Evaluating SMI (System Management Interrupt) latency on Linux-CentOS/Intel machine

家住魔仙堡 submitted on 2019-12-20 14:09:36
Question: I am interested in evaluating the behavior (latency, frequency) of SMI handling on a Linux machine running CentOS and used for a (very) soft real-time application. What tools are recommended (hwlatdetect for CentOS?), and what is the best course of action to go about this? If no good tools are available for CentOS, am I correct to assume that installing a different OS on the same machine should yield the same results, since the underlying hardware/BIOS are the same? Is there any source for

How fast is an atomic/interlocked variable compared to a lock, with or without contention? [duplicate]

旧城冷巷雨未停 submitted on 2019-12-20 12:36:24
Question: This question already has answers here: Overhead of using locks instead of atomic intrinsics (4 answers). Closed 8 months ago. And how much faster or slower is it compared to an uncontested atomic variable operation (such as on C++'s std::atomic<T>)? Also, how much slower are contested atomic variables relative to the uncontested lock? The architecture I'm working on is x86-64.
Answer 1: There’s a project on GitHub with the purpose of measuring this on different platforms. Unfortunately, after my
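
The uncontended half of this comparison is easy to measure locally. The following is a minimal single-threaded sketch, not the GitHub project the answer refers to: `Timing`, `time_atomic`, and `time_mutex` are hypothetical names, and the numbers depend heavily on the CPU, compiler, and optimization level. Measuring the contended case would additionally require several threads hammering the same variable, which is not shown here.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>

// Result of one run: the final counter value (a sanity check that all
// increments happened) and the average cost of one increment.
struct Timing {
    std::uint64_t final_value;
    double ns_per_op;
};

// Times `iters` uncontended increments of a std::atomic counter.
Timing time_atomic(std::uint64_t iters) {
    std::atomic<std::uint64_t> counter{0};
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iters; ++i) {
        counter.fetch_add(1, std::memory_order_seq_cst);
    }
    const auto stop = std::chrono::steady_clock::now();
    const double ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return {counter.load(), iters ? ns / static_cast<double>(iters) : 0.0};
}

// Times `iters` uncontended increments of a counter guarded by a std::mutex.
Timing time_mutex(std::uint64_t iters) {
    std::mutex m;
    // volatile keeps each increment inside its own critical section.
    volatile std::uint64_t counter = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iters; ++i) {
        std::lock_guard<std::mutex> guard(m);
        counter = counter + 1;
    }
    const auto stop = std::chrono::steady_clock::now();
    const double ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return {counter, iters ? ns / static_cast<double>(iters) : 0.0};
}
```

Comparing `time_atomic(n).ns_per_op` with `time_mutex(n).ns_per_op` for a large `n` gives a first impression of the uncontended gap on a particular machine; it should be read as an estimate rather than a definitive figure.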