assembly | 易学教程

Does cmpxchg write destination cache line on failure? If not, is it better than xchg for spinlock?

Question: For the purposes of this question, I assume a simple spinlock that does not fall back to waiting in the OS. I see that a simple spinlock is often implemented using lock xchg or lock bts instead of lock cmpxchg. But doesn't cmpxchg avoid writing the value if the expectation does not match? So aren't failed attempts cheaper with cmpxchg? Or does cmpxchg write data and invalidate the cache line of other cores even on failure? This question is similar to What specifically marks an x86 cache line as dirty - any write,
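
The two acquisition styles the question compares can be sketched with C11 atomics. This is only a minimal illustration, not code from the question: the type `demo_spinlock` and the function names are hypothetical, and it assumes a compiler that lowers `atomic_exchange` to `xchg` and `atomic_compare_exchange_strong` to `lock cmpxchg` on x86, as mainstream compilers typically do. It shows the semantic difference only (an unconditional store versus a conditional one) and says nothing about what happens to the cache line on a failed attempt.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int state;
} demo_spinlock;

/* xchg-style attempt: always stores 1, then checks what was there before. */
static bool try_lock_xchg(demo_spinlock *l) {
    return atomic_exchange_explicit(&l->state, 1, memory_order_acquire) == 0;
}

/* cmpxchg-style attempt: stores 1 only if the current value is 0. */
static bool try_lock_cmpxchg(demo_spinlock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, 1,
        memory_order_acquire,   /* ordering on success */
        memory_order_relaxed);  /* ordering on failure */
}

/* Release the lock with a plain release store. */
static void demo_unlock(demo_spinlock *l) {
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

A spinning caller would wrap either `try_lock_*` function in a loop; both return `true` exactly when the lock was free and is now held by the caller.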

Are Intel TSX prefixes executed (safely) on AMD as NOP?

Question: I have MASM synchronization code for an application that runs on both Intel and AMD x86 machines. I'd like to enhance it using the Intel TSX prefixes, specifically XACQUIRE and XRELEASE. If I modify my code correctly for Intel, what will happen when I attempt to run it on AMD machines? Intel says that these were designed to be backwards compatible, presumably meaning they do nothing on Intel CPUs without TSX. I know that AMD has not implemented TSX. But are these prefixes safe to run on AMD

X86: What does `movsxd rdx,edx` instruction mean?

Question: I have been playing with Intel MPX and found that it adds certain instructions that I could not understand. For example (in Intel syntax): movsxd rdx,edx. I found this, which talks about a similar instruction, MOVSX. From that question, my interpretation of this instruction is that it takes a double byte (that's why there is a d in movsxd), copies it into the rdx register (into the two least significant bytes), and fills the rest with the sign of that double byte. Is my interpretation correct (I
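
For reference, the Intel manual describes MOVSXD with a 64-bit destination as moving a doubleword (32 bits, rather than two bytes) to a quadword with sign extension. A small C model of that behavior, using the hypothetical helper name `movsxd_model`, might look like this. It is a sketch of the semantics only and assumes a two's-complement target where converting an out-of-range `uint32_t` to `int32_t` wraps, as it does on mainstream x86 compilers.

```c
#include <stdint.h>

/* Hypothetical model of `movsxd r64, r32`:
 * reinterpret the low 32 bits as signed, then widen to 64 bits,
 * which copies bit 31 into bits 32..63 of the result. */
static int64_t movsxd_model(uint32_t low32) {
    return (int64_t)(int32_t)low32;
}
```

Under that model, `movsxd_model(0xFFFFFFFFu)` yields -1 (all 64 bits set), while `movsxd_model(0x7FFFFFFFu)` yields 2147483647 with the upper 32 bits clear.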

How to write a custom bootloader for mac systems?

Question: I wrote a little bootloader in assembly that uses BIOS interrupts, and it works great on my PC. My question is: is there any possibility of making it work on Mac / Apple systems? I know that Apple doesn't use BIOS in that sense and that they lock a lot of things down. However, it is possible to use a live Ubuntu stick on a Mac, so might it be possible to run an assembly program from startup on a Mac? If yes, do you know any starting points or references to what has been done before? Thanks a

Will C++ first gets converted to assembly [duplicate]

Question: This question already has answers here: Does the C++ code compile to assembly codes? (3 answers); Does a compiler always produce an assembly code? (4 answers). Closed 7 years ago. I am confused. I am a C++ developer and have heard many times that my source code is first converted to assembly, and that the assembly is then converted to machine code. But in a video tutorial on assembly language, the instructor clearly said that C/C++ code gets converted directly to machine code. (Of course there

Do FP and integer division compete for the same throughput resources on x86 CPUs?

Question: We know that Intel CPUs do integer division and FP div / sqrt on a not-fully-pipelined divide execution unit on port 0. We know this from IACA output, other published material, and experimental testing (e.g. https://agner.org/optimize/). But are there independent dividers for FP and integer (competing only for dispatch via port 0), or does interleaving two div-throughput-bound workloads make their costs add nearly linearly if one is integer and the other is FP? This is complicated by Intel CPUs