x86-64

Working inline assembly in C for bit parity?

纵饮孤独, submitted on 2019-12-18 06:51:04
Question: I'm trying to compute the bit parity of a large number of uint64 values. By bit parity I mean a function that accepts a uint64 and outputs 0 if the number of set bits is even, and 1 otherwise. Currently I'm using the following function (by @Troyseph, found here): uint parity64(uint64 n){ n ^= n >> 1; n ^= n >> 2; n = (n & 0x1111111111111111) * 0x1111111111111111; return (n >> 60) & 1; } The same SO page has the following assembly routine (by @papadp): .code ; bool CheckParity(size_t Result)
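The folding function quoted in the question can be checked against a bit-by-bit count. The sketch below restates it with standard `<stdint.h>` types (the original's `uint` and `uint64` typedefs are assumed to mean `unsigned` and `uint64_t`); `parity64_naive` is a hypothetical reference added only for comparison.

```c
#include <stdint.h>

/* Folding version from the question: after the two shifts, the low bit of
   each 4-bit nibble holds that nibble's parity. */
unsigned parity64(uint64_t n) {
    n ^= n >> 1;
    n ^= n >> 2;
    /* Keep one parity bit per nibble; the multiply adds all 16 of them
       into the top nibble, whose lowest bit (bit 60) is the overall parity. */
    n = (n & 0x1111111111111111ULL) * 0x1111111111111111ULL;
    return (unsigned)((n >> 60) & 1);
}

/* Hypothetical reference, for checking only: clear one set bit per pass
   and flip the parity each time. */
unsigned parity64_naive(uint64_t n) {
    unsigned p = 0;
    while (n) {
        p ^= 1u;
        n &= n - 1;
    }
    return p;
}
```

Both functions should agree on every input; the folding version avoids the data-dependent loop, which may matter when processing many values.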

How to move two 32 bit registers in to one 64 bit?

佐手、, submitted on 2019-12-18 05:54:43
Question: Let's say that I want to combine two 32-bit registers into RAX, with EAX as the low 32 bits and EDX as the high 32 bits. I have found one way: shl rdx, 32; or rax, rdx. This method works only if we are sure that bits 32 to 63 of RAX are 0. If we are not sure about that, then we must first clear the high 32 bits, like: mov eax, eax (this instruction should clear the high 32 bits of RAX). Is this the shortest way? Is there a single x86-64 instruction that does this operation? Answer 1: Perhaps
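For comparison, the same combination can be written in C without inline assembly. This is a minimal sketch, and `combine_halves` is a name made up for illustration. The conversion of `lo` from `uint32_t` to `uint64_t` zero-extends, which plays the role of `mov eax, eax`, and the shift and OR correspond to the `shl`/`or` pair in the question.

```c
#include <stdint.h>

/* Build a 64-bit value from two 32-bit halves.
   hi is widened before the shift so no bits are lost;
   lo is zero-extended by the implicit conversion to uint64_t. */
uint64_t combine_halves(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;
}
```

For example, `combine_halves(0x12345678u, 0x9abcdef0u)` yields `0x123456789abcdef0`.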

C++ on x86-64: when are structs/classes passed and returned in registers?

孤街醉人, submitted on 2019-12-18 05:49:06
Question: Assuming the x86-64 ABI on Linux, under what conditions in C++ are structs passed to functions in registers vs. on the stack? Under what conditions are they returned in registers? And does the answer change for classes? If it helps simplify the answer, you can assume a single argument/return value and no floating point values. Answer 1: The ABI specification is defined here. A newer version is available here. I assume the reader is accustomed to the terminology of the document and that they can

Does an x86_64 CPU use the same cache lines to communicate between 2 processes via shared memory?

做~自己de王妃, submitted on 2019-12-18 05:20:57
Question: As is known, all levels of cache (L1/L2/L3) on modern x86_64 are virtually indexed, physically tagged. And all cores communicate via the last-level cache (L3) by using a cache coherence protocol, MOESI/MESIF, over QPI/HyperTransport. For example, a Sandy Bridge family CPU has a 4- to 16-way L3 cache and a 4 KB page size, which allows data to be exchanged between concurrent processes executing on different cores via shared memory. This is possible because cache L3 can't contain the same physical

Why is RCX not used for passing parameters to system calls, being replaced with R10? [duplicate]

隐身守侯, submitted on 2019-12-18 05:07:28
Question: This question already has answers here: Linux x64: why does r10 come before r8 and r9 in syscalls? (2 answers) Closed 8 months ago. According to the System V x86-64 ABI, function calls in applications use the following sequence of registers to pass integer arguments: rdi, rsi, rdx, rcx, r8, r9. But system call arguments (other than the syscall number) are passed in a different sequence of registers: rdi, rsi, rdx, r10, r8, r9. Why does the kernel use r10 instead of rcx for the fourth argument? Is it

Why does printf print random value with float and integer format specifier

杀马特。学长 韩版系。学妹, submitted on 2019-12-18 04:21:51
Question: I wrote a simple program on a 64-bit machine: int main() { printf("%d", 2.443); } So, this is how the compiler will behave. It will identify the second argument as a double, hence it will push 8 bytes on the stack or possibly just use registers across calls to access the variables. %d expects a 4-byte integer value, hence it prints some garbage value. What is interesting is that the value printed changes every time I execute this program. So what is happening? I expected it to print the same
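Passing a `double` where `%d` expects an `int` is undefined behavior in C, so no particular output can be relied on. Under the x86-64 System V calling convention the `double` typically travels in a vector register while `%d` reads an integer register, which is one plausible reason the output varies between runs. The sketch below shows the two well-defined alternatives; `format_double` is a hypothetical helper that writes into buffers so the results are easy to inspect.

```c
#include <stdio.h>

/* Formats 2.443 twice, each time with a conversion that matches the
   argument type. Both buffers must hold at least `size` bytes. */
void format_double(char *as_float, char *as_int, size_t size) {
    double value = 2.443;
    snprintf(as_float, size, "%f", value);     /* double matches %f */
    snprintf(as_int, size, "%d", (int)value);  /* explicit conversion matches %d */
}
```

Compiling with warnings enabled (for example `-Wall` on GCC or Clang) usually flags the mismatched call in the question.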

Why does printf print random value with float and integer format specifier

二次信任, submitted on 2019-12-18 04:21:44
Question: I wrote a simple program on a 64-bit machine: int main() { printf("%d", 2.443); } So, this is how the compiler will behave. It will identify the second argument as a double, hence it will push 8 bytes on the stack or possibly just use registers across calls to access the variables. %d expects a 4-byte integer value, hence it prints some garbage value. What is interesting is that the value printed changes every time I execute this program. So what is happening? I expected it to print the same

How to get argument values using inline assembly in C without glibc?

故事扮演, submitted on 2019-12-17 21:11:39
Question: How can I get the argument values using inline assembly in C without glibc? I need this code for Linux on the x86_64 and i386 architectures. If you know how to do it on Mac OS X or Windows, please post that too and offer guidance. void exit(int code) { /* This function is not important! */ /* ... */ } void _start() { /* How to get the argument values using inline assembly in C without glibc? */ /* argc */ /* argv */ exit(0); } New Update https://gist.github.com/apsun/deccca33244471c1849d29cc6bb5c78e and #define ReadRdi(To) asm("movq %%rdi,%0" : "=r"(To));
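On Linux, the kernel leaves the arguments on the initial stack rather than in registers: the word at the entry stack pointer is `argc`, followed by the `argv` pointers, a NULL terminator, and then the environment pointers. The sketch below covers only the decoding step, under that assumed layout. `parse_initial_stack` and `struct start_args` are hypothetical names, and capturing the stack pointer at `_start` (which needs a small assembly stub) is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for the decoded entry stack. */
struct start_args {
    int argc;
    char **argv;
    char **envp;
};

/* Decode the pointer-sized words at the initial stack pointer.
   Assumed layout: sp[0] = argc, sp[1..argc] = argv entries,
   sp[argc + 1] = NULL, followed by the environment pointers. */
struct start_args parse_initial_stack(char **sp) {
    struct start_args args;
    args.argc = (int)(intptr_t)sp[0];       /* first word is a count, not a pointer */
    args.argv = sp + 1;
    args.envp = args.argv + args.argc + 1;  /* skip argv's NULL terminator */
    return args;
}
```

Because every slot is pointer-sized, the same decoding applies to both i386 and x86_64; only the way the stack pointer is captured differs.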

VS: unexpected optimization behavior with _BitScanReverse64 intrinsic

半世苍凉, submitted on 2019-12-17 20:25:24
Question: The following code works fine in debug mode, since _BitScanReverse64 is defined to return 0 if no bit is set. Citing MSDN: (The return value is) "Nonzero if Index was set, or 0 if no set bits were found." If I compile this code in release mode it still works, but if I enable compiler optimizations, such as /O1 or /O2, the index is not zero and the assert() fails. #include <iostream> #include <cassert> using namespace std; int main() { unsigned long index = 0; _BitScanReverse64(&index, 0x0ull);
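The MSDN sentence quoted above describes the return value only; it does not say what `index` holds when no bit is set, so code that reads `index` after a zero return is relying on something the documentation does not promise. The sketch below is a portable stand-in, not the MSVC intrinsic. `bit_scan_reverse64` and `highest_bit_or_minus_one` are hypothetical names, and leaving `*index` untouched on a zero mask is just one possible behavior.

```c
#include <stdint.h>

/* Portable stand-in for the intrinsic's contract: returns 0 and leaves
   *index untouched when mask is 0; otherwise stores the position of the
   highest set bit in *index and returns 1. */
unsigned char bit_scan_reverse64(unsigned long *index, uint64_t mask) {
    unsigned long pos = 0;
    if (mask == 0)
        return 0;
    while ((mask >>= 1) != 0)
        pos++;
    *index = pos;
    return 1;
}

/* Caller pattern: use index only when the return value is nonzero. */
int highest_bit_or_minus_one(uint64_t mask) {
    unsigned long index;
    if (!bit_scan_reverse64(&index, mask))
        return -1;
    return (int)index;
}
```

Applying the same pattern to the real intrinsic means asserting on the return value of `_BitScanReverse64` rather than on `index`.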

Opposite of cache prefetch hint

人盡茶涼, submitted on 2019-12-17 20:10:30
Question: Is there a hint I can put in my code indicating that a line should be removed from cache? As opposed to a prefetch hint, which would indicate I will soon need a line. In my case, I know when I won't need a line for a while, so I want to be able to get rid of it to free up space for lines I do need. Answer 1: clflush, clflushopt: Invalidates from every level of the cache hierarchy in the cache coherence domain the cache line that contains the linear address specified with the memory operand. If that
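From C, `clflush` is reachable through the `_mm_clflush` intrinsic in `<emmintrin.h>`. The sketch below walks a buffer one cache line at a time; `flush_range`, `FLUSH_LINE` and the 64-byte `ASSUMED_LINE_SIZE` are illustrative assumptions, and the macro falls back to a no-op on targets without SSE2 so the file still compiles there. Since `clflush` writes modified data back to memory before invalidating the line, the buffer contents are preserved.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>              /* _mm_clflush */
#define FLUSH_LINE(p) _mm_clflush(p)
#else
#define FLUSH_LINE(p) ((void)(p))   /* no-op fallback on other targets */
#endif

#define ASSUMED_LINE_SIZE 64u       /* assumption; the real size is CPU-specific */

/* Flush every cache line overlapping [buf, buf + len).
   Returns the number of lines visited. */
size_t flush_range(const void *buf, size_t len) {
    uintptr_t addr = (uintptr_t)buf & ~(uintptr_t)(ASSUMED_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)buf + len;
    size_t lines = 0;
    if (len == 0)
        return 0;
    for (; addr < end; addr += ASSUMED_LINE_SIZE) {
        FLUSH_LINE((const void *)addr);
        lines++;
    }
    return lines;
}
```

Flushing is more than a hint: the line is evicted from every cache level, so the next access pays a full memory round trip. Whether that helps depends on the workload and is worth measuring.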