x86-64

Atomically clearing lowest non-zero bit of an unsigned integer

半腔热情 submitted on 2019-11-29 10:38:23
Question: I'm looking for the best way to clear the lowest non-zero bit of an unsigned atomic like std::atomic_uint64_t in a thread-safe fashion without using an extra mutex or the like. In addition, I also need to know which bit got cleared. Example: if the current value stored is 0b0110, I want to know that the lowest non-zero bit is bit 1 (0-indexed) and set the variable to 0b0100. The best version I came up with is this: #include <atomic> #include <cstdint> inline uint64_t with_lowest_non_zero_cleared(std::uint64_t v){ return (v - 1) & v; } inline uint64_t only_keep_lowest_non_zero
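
One common way to approach the question above is a compare-and-swap loop. The following is only a minimal sketch under that assumption, not the asker's (truncated) version; the names clear_lowest_set_bit and lowest_set_bit_index are hypothetical, and the memory orderings are one plausible choice.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: index of the lowest set bit. Precondition: v != 0.
inline int lowest_set_bit_index(std::uint64_t v) {
    int index = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        ++index;
    }
    return index;
}

// Hypothetical helper: atomically clears the lowest set bit of `a`.
// Returns the 0-based index of the cleared bit, or -1 if the value was zero.
inline int clear_lowest_set_bit(std::atomic<std::uint64_t>& a) {
    std::uint64_t old_value = a.load(std::memory_order_relaxed);
    while (old_value != 0) {
        const std::uint64_t desired = old_value & (old_value - 1);
        // On failure, compare_exchange_weak reloads old_value, so the loop
        // retries with the value another thread just wrote.
        if (a.compare_exchange_weak(old_value, desired,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
            return lowest_set_bit_index(old_value);
        }
    }
    return -1;
}
```

With a stored value of 0b0110, a call should report bit 1 and leave 0b0100 behind, matching the example in the question.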

Why do we need stack allocation when we have a red zone?

限于喜欢 submitted on 2019-11-29 10:37:04
I have the following doubts. As we know, the System V x86-64 ABI gives us a fixed-size area (128 bytes) below the stack pointer, the so-called red zone. As a result, we don't need to use, for example, sub rsp, 12 . We can just do mov [rsp-12], X and that's all. But I cannot grasp the idea behind that. Why does it matter? Is it necessary to sub rsp, 12 without a red zone? After all, the stack size is limited from the beginning, so why is sub rsp, 12 important? I know that it lets us track the top of the stack, but let's ignore that for the moment. I know that some instructions use the rsp value (like ret) but don

How to check programmatically whether a managed assembly is x86, x64 or AnyCPU?

流过昼夜 submitted on 2019-11-29 10:24:28
I need to determine programmatically whether an assembly is x86, x64 or AnyCPU. There is an almost identical question, but the solution that it provides Assembly assembly = Assembly.LoadFrom(fileName); PortableExecutableKinds peKind; ImageFileMachine imageFileMachine; assembly.ManifestModule.GetPEKind(out peKind, out imageFileMachine); fails when trying to load a 64-bit assembly from a 32-bit process (and vice versa). Is there a foolproof way of programmatically finding out the compilation type of an assembly? EDIT: Based on @BenVoigt's suggestion, I created a small command line utility that

False Sharing and Atomic Variables

风格不统一 submitted on 2019-11-29 10:16:44
When different variables are inside the same cache line, you can experience false sharing, which means that even if two different threads (running on different cores) are accessing two different variables, if those two variables reside in the same cache line, you will take a performance hit, as cache coherence traffic will be triggered each time. Now say those variables are atomic variables (by atomic I mean variables which introduce a memory fence, such as the atomic<T> of C++). Will false sharing matter there, or does it not matter whether atomic variables are in the same cache line or not, as supposedly
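
To make the layout question concrete, here is a minimal sketch of one way to keep two atomic counters on separate cache lines. The 64-byte line size is an assumption (typical for current x86-64 parts, not guaranteed), and the names kCacheLine, PaddedCounter and Counters are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; 64 bytes is typical on x86-64 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Each counter is aligned (and therefore padded) to a full cache line,
// so two adjacent counters never share a line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Hypothetical pair of counters, e.g. one per worker thread.
struct Counters {
    PaddedCounter a;
    PaddedCounter b;
};

static_assert(sizeof(PaddedCounter) == kCacheLine,
              "padding should round the counter up to one cache line");
```

Without the alignas, both atomics would sit 8 bytes apart and could share one line, so writes from two cores would contend for that line even though the variables are logically independent.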

C++ on x86-64: when are structs/classes passed and returned in registers?

倖福魔咒の submitted on 2019-11-29 10:08:33
Assuming the x86-64 ABI on Linux, under what conditions in C++ are structs passed to functions in registers vs. on the stack? Under what conditions are they returned in registers? And does the answer change for classes? If it helps simplify the answer, you can assume a single argument/return value and no floating point values. The ABI specification is defined here. A newer version is available here. I assume the reader is accustomed to the terminology of the document and that they can classify the primitive types. If the object size is larger than two eight-bytes, it is passed in memory:
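
The size rule quoted above can be turned into a rough compile-time check. This is only an approximation under stated assumptions: it ignores the ABI's per-eightbyte classification and treats "trivially copyable and trivially destructible" as a stand-in for the C++ ABI's notion of a type that is trivial for the purposes of calls. The struct names and may_use_registers are hypothetical.

```cpp
#include <cstdint>
#include <type_traits>

struct Small { std::int64_t a; std::int64_t b; };          // 16 bytes
struct Large { std::int64_t a; std::int64_t b; std::int64_t c; };  // 24 bytes
struct NonTrivial {
    std::int64_t a;
    ~NonTrivial() {}  // user-provided destructor makes the type non-trivial
};

// Rough approximation: a type can be a candidate for register passing only
// if it fits in two eightbytes and is trivial to copy and destroy.
template <class T>
constexpr bool may_use_registers() {
    return sizeof(T) <= 16 &&
           std::is_trivially_copyable<T>::value &&
           std::is_trivially_destructible<T>::value;
}
```

Under these assumptions, Small qualifies, Large fails the size rule, and NonTrivial fails the triviality rule. Checking the compiler's generated assembly remains the reliable way to confirm how a specific type is passed.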

cmpxchg example for 64 bit integer

a 夏天 submitted on 2019-11-29 10:03:50
Question: I am using cmpxchg (compare-and-exchange) on the i686 architecture for a 32-bit compare-and-swap as follows. (Editor's note: the original 32-bit example was buggy, but the question isn't about it. I believe this version is safe, and as a bonus compiles correctly for x86-64 as well. Also note that inline asm isn't needed or recommended for this; __atomic_compare_exchange_n or the older __sync_bool_compare_and_swap work for int32_t or int64_t on i486 and x86-64. But this question is about doing it
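
For reference, the builtin mentioned in the editor's note can be used roughly as follows. This is a minimal sketch that assumes GCC or Clang; the wrapper name cas64 is hypothetical, and sequentially consistent ordering is chosen only for simplicity.

```cpp
#include <cstdint>

// Hypothetical wrapper around the GCC/Clang builtin.
// Stores `desired` into *ptr only if *ptr currently equals `expected`.
// Returns true if the exchange happened.
inline bool cas64(std::int64_t* ptr, std::int64_t expected, std::int64_t desired) {
    return __atomic_compare_exchange_n(ptr, &expected, desired,
                                       /*weak=*/false,
                                       __ATOMIC_SEQ_CST,   // ordering on success
                                       __ATOMIC_SEQ_CST);  // ordering on failure
}
```

On failure the builtin writes the current value of *ptr into its local copy of expected; this wrapper discards it, so a caller that needs a retry loop would want to pass expected by pointer instead.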

Why is this inline assembly not working with a separate asm volatile statement for each instruction?

半城伤御伤魂 submitted on 2019-11-29 09:48:04
For the following code: long buf[64]; register long rrax asm ("rax"); register long rrbx asm ("rbx"); register long rrsi asm ("rsi"); rrax = 0x34; rrbx = 0x39; __asm__ __volatile__ ("movq $buf,%rsi"); __asm__ __volatile__ ("movq %rax, 0(%rsi);"); __asm__ __volatile__ ("movq %rbx, 8(%rsi);"); printf( "buf[0] = %lx, buf[1] = %lx!\n", buf[0], buf[1] ); I get the following output: buf[0] = 0, buf[1] = 346161cbc0! while it should have been: buf[0] = 34, buf[1] = 39! Any ideas why it is not working properly, and how to solve it? You clobber memory but don't tell GCC about it, so GCC can cache

Assembly registers in 64-bit architecture

你。 submitted on 2019-11-29 09:40:33
Question: Following the answer about assembly registers' sizes: First, what sizes are eax, ax, ah and their counterparts in the 64-bit architecture? How do I access a single byte of a register, and how do I access all eight bytes of a 64-bit register? I'd love attention for both x86-64 (x64) and Itanium processors. Second, what is the correct way to use the four registers for holding the first four parameters in function calls in the new calling convention? Answer 1: With the old names, all registers remain the

Binary Bomb Phase 5

不想你离开。 submitted on 2019-11-29 08:57:55
I have been working on a Binary Bomb for school, and I am absolutely lost in Phase 5. The object of the assignment is to disassemble the code and find a string, which I have found to be "flyers", and reverse engineer it to have the same numerical value as "flyers" does. However, I have spent the last 3-4 hours trying to find out how to do this. You don't have to give answers, but PLEASE help me understand what I need to do. Here is the disassembled code using gdb: Dump of assembler code for function phase_5: 0x08048d88 <+0>: push %ebx 0x08048d89 <+1>: sub $0x28,%esp 0x08048d8c <+4>: mov 0x30(%esp

Does x86_64 CPU use the same cache lines for communicate between 2 processes via shared memory?

不羁岁月 submitted on 2019-11-29 08:46:05
As is known, all levels of cache (L1/L2/L3) on modern x86_64 are virtually indexed, physically tagged. And all cores communicate via the last-level cache (L3) using the MOESI/MESIF cache coherence protocol over QPI/HyperTransport. For example, a Sandybridge-family CPU has a 4- to 16-way L3 cache and a page size of 4KB, which allows data to be exchanged via shared memory between concurrent processes executing on different cores. This is possible because the L3 cache can't contain the same physical memory area as a page of process 1 and as a page of process 2 at the same time. Does this mean that