assembly | 易学教程

When to use a certain calling convention

Question: Are there any guidelines in x86-64 for when a function should follow the System V ABI calling convention and when it doesn't matter? This is in response to an answer here that mentions using other calling conventions to simplify an internal/local function:

    # gcc 32-bit regparm calling convention
    is_even:            # input in EAX, bool return value in AL
        not   %eax      # 2 bytes
        and   $1, %al   # 2 bytes
        ret

    # custom calling convention:
    is_even:            # input in RDI
                        # returns in ZF. ZF=1 means even
        test  $1, %dil  # 4 bytes
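A plain C function cannot be declared to return its result in ZF, but GNU C inline asm can hand ZF to the compiler through a flag output operand (`=@ccz`, supported on x86 by GCC 6 and later and by newer Clang). That is roughly the compiler-visible analogue of the "returns in ZF" convention above. A minimal sketch, assuming x86-64 GNU C; `is_even` is a hypothetical inline helper, and the `#else` branch is plain C for other targets:

```c
#include <stdbool.h>

/* Hypothetical helper: true when x is even.
 * With flag outputs, the asm statement's result *is* ZF, so the compiler
 * can branch on it or setcc from it directly, like the custom convention. */
static inline bool is_even(unsigned x)
{
#if defined(__x86_64__) && defined(__GCC_ASM_FLAG_OUTPUTS__)
    bool zf;
    /* testb $1, <low byte of x>: ZF=1 when bit 0 is clear, i.e. x is even. */
    __asm__ ("testb $1, %b[x]"
             : "=@ccz"(zf)
             : [x] "r"(x));
    return zf;
#else
    /* Portable fallback for compilers/targets without flag outputs. */
    return (x & 1u) == 0;
#endif
}
```

Because the helper is inlined, no call boundary exists at all; the System V rules only constrain functions that are actually called through the ABI.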

Is TLB inclusive?

Question: Is the TLB hierarchy inclusive on modern x86 CPUs (e.g. Skylake, or maybe other Lakes)? For example, prefetchtn brings data into cache level n + 1, as well as a corresponding TLB entry into the DTLB. Will that entry be contained in the STLB as well?

Answer 1: AFAIK, on Intel SnB-family the 2nd-level TLB is a victim cache for the first-level iTLB and dTLB. (I can't find a source for this and IDK where I read it originally, so take this with a grain of salt. I had originally thought this was a well-known fact, but it might

Instructions to copy the low byte from an int to a char: Simpler to just do a byte load?

Question: I was reading a textbook that has an exercise asking for x86-64 assembly code equivalent to this C code:

    // Assume that the values of sp and dp are stored in registers %rdi and %rsi
    int *sp;
    char *dp;
    *dp = (char) *sp;

and the answer is:

    // first approach
    movl (%rdi), %eax   // Read 4 bytes
    movb %al, (%rsi)    // Store low-order byte

I understand it, but I'm wondering why we can't do something simpler in the first place, such as:

    // second approach
    movb (%rdi), %al    // Read only one byte rather than all four bytes
    movb
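The byte-load idea relies on x86-64 being little-endian: the low-order byte of an `int` is stored at its lowest address, so a one-byte load from `(%rdi)` reads the same bits that the `(char)` conversion keeps. A small C sketch with hypothetical helper names that makes both readings explicit; it uses `unsigned char` to avoid the implementation-defined conversion of out-of-range values to a signed `char`:

```c
#include <stdint.h>
#include <string.h>

/* What `*dp = (char) *sp;` keeps: the low-order 8 bits of the value. */
static unsigned char low_byte_by_value(const int *sp)
{
    return (unsigned char)*sp;
}

/* What a one-byte load such as `movb (%rdi), %al` reads:
 * the byte stored at the lowest address of the int. */
static unsigned char low_byte_by_address(const int *sp)
{
    unsigned char first;
    memcpy(&first, sp, 1);
    return first;
}

/* The two agree only when the low-order byte sits at the lowest address,
 * i.e. on a little-endian machine such as x86-64. */
static int is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a big-endian target the same byte load would read the most significant byte instead, which is why the textbook's load-then-store-low-byte form is the endian-neutral reading of the C code.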

Why is the variable name “name” not allowed in assembly 8086?

Question: When I try to declare a variable named "name", it doesn't work. The assembler gives me the error "there are errors." with the following explanation:

    (22) wrong parameters: MOV BL, name
    (22) probably no zero prefix for hex; or no 'h' suffix; or wrong addressing; or undefined var: name

Here is my code:

    ; multi-segment executable file template.

    data segment
        ; add your data here!
        pkey db "press any key...$"
        name db "myname"
    ends

    stack segment
        dw   128  dup(0)
    ends

    code segment
    start:
    ; set segment

For temporary registers in the asm statement, should I use clobber or dummy output?

Question: As the title says, when I modify some registers inside an asm statement for temporary use, which option is better: a clobber or a dummy output? For example, I implemented two versions of the exchange function in the link and found that both versions generate the same number of instructions. Which version should I use? Should I use the one with the dummy output to let the compiler choose the register that may optimize the entire function as much
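For comparison, here is a sketch of both styles in GNU C extended asm, assuming x86-64 and a hypothetical `swap` of two 64-bit values (the linked exchange functions are not reproduced here). The clobber version hard-codes RAX and RDX; the dummy-output version declares its temporaries as early-clobber outputs (`=&r`). The `&` matters because each temporary is written before the asm has finished reading its memory operands, so it must not share a register with their addresses.

```c
#include <stdint.h>

/* Version 1: hard-coded scratch registers declared as clobbers. */
static inline void swap_clobber(int64_t *a, int64_t *b)
{
#if defined(__x86_64__)
    __asm__ ("movq %0, %%rax\n\t"
             "movq %1, %%rdx\n\t"
             "movq %%rdx, %0\n\t"
             "movq %%rax, %1"
             : "+m"(*a), "+m"(*b)
             :
             : "rax", "rdx");
#else
    int64_t t = *a; *a = *b; *b = t;   /* plain C on other targets */
#endif
}

/* Version 2: scratch registers as dummy early-clobber outputs,
 * so the register allocator picks whichever registers are free. */
static inline void swap_dummy(int64_t *a, int64_t *b)
{
#if defined(__x86_64__)
    int64_t t1, t2;
    __asm__ ("movq %[ma], %[t1]\n\t"
             "movq %[mb], %[t2]\n\t"
             "movq %[t2], %[ma]\n\t"
             "movq %[t1], %[mb]"
             : [ma] "+m"(*a), [mb] "+m"(*b),
               [t1] "=&r"(t1), [t2] "=&r"(t2));
#else
    int64_t t = *a; *a = *b; *b = t;   /* plain C on other targets */
#endif
}
```

Both produce the same instruction count here, but the clobber version forces the compiler to keep live values out of RAX and RDX around the statement, while the dummy outputs leave that choice to the register allocator, which can matter once the asm is inlined into a larger function.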

How do imul and idiv really work on the 8086?

Question: I am trying to figure out how the imul and idiv instructions of the 8086 microprocessor work. I know this:

1. mul and div are multiplication and division for unsigned numbers.
2. imul and idiv are also multiplication and division, but for signed numbers.

I searched all over the web, and what I wrote above is the only information I've found, just written in different ways.

I have this:

    mov AX, 0FFCEh
    idiv AH

Because AH is a byte, AL = AX / AH (the quotient) and AH = remainder. After the
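The signed/unsigned difference is easiest to see by modeling the 8-bit forms in C. In the question's example, 0FFCEh is -50 as a signed word and AH = 0FFh is -1, so `idiv AH` divides -50 by -1, giving AL = 50 (32h) and AH = 0. A minimal sketch with hypothetical helper names; it assumes the overflow rule "quotient must fit in a signed byte (-128..127)", and C99's `/` and `%` already truncate toward zero with the remainder taking the dividend's sign, as idiv does:

```c
#include <stdbool.h>
#include <stdint.h>

/* 8-bit `imul src`: AX = AL * src, both treated as signed bytes. */
static int16_t imul8(int8_t al, int8_t src)
{
    return (int16_t)(al * src);   /* any int8 * int8 product fits in int16 */
}

/* 8-bit `mul src` for contrast: same bits, unsigned interpretation. */
static uint16_t mul8(uint8_t al, uint8_t src)
{
    return (uint16_t)(al * src);
}

/* 8-bit `idiv src`: AL = AX / src (quotient), AH = AX % src (remainder).
 * Returns false where the CPU would raise a divide error instead. */
static bool idiv8(int16_t ax, int8_t src, int8_t *al, int8_t *ah)
{
    if (src == 0)
        return false;                  /* division by zero */
    int q = ax / src;                  /* truncates toward zero */
    int r = ax % src;                  /* same sign as the dividend */
    if (q < INT8_MIN || q > INT8_MAX)
        return false;                  /* quotient does not fit in AL */
    *al = (int8_t)q;
    *ah = (int8_t)r;
    return true;
}
```

For example, the byte 0FFh times 2 is 0FFFEh (-2) with imul but 01FEh (510) with mul, even though the input bits are identical.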

How to simulate pcmpgtq on sse2?

Question: PCMPGTQ was introduced in SSE4.2; it provides a signed greater-than comparison of 64-bit integers that yields a mask. How does one support this functionality on instruction sets predating SSE4.2?

Update: The same question applies to ARMv7 with Neon, which also lacks a 64-bit comparison. The sister question is here: What is the most efficient way to support CMGT with 64bit signed comparisons on ARMv7a with Neon?

Answer 1:

    __m128i pcmpgtq_sse2 (__m128i a, __m128i b) {
        __m128i r =
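One common SSE2-only approach (an independent sketch, not a reconstruction of the truncated answer) splits each 64-bit lane into 32-bit halves: a > b when the high halves compare greater as signed, or when they are equal and the low halves compare greater as unsigned. SSE2 only has a signed `_mm_cmpgt_epi32`, so the unsigned compare is done by XOR-ing both operands with 0x80000000 first. The function names are hypothetical; the intrinsics are standard SSE2 ones from `<emmintrin.h>`, with a scalar fallback when SSE2 is unavailable:

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* Per 64-bit lane:  a > b  <=>  hi(a) > hi(b) (signed)
 *                          || (hi(a) == hi(b) && lo(a) > lo(b) (unsigned)) */
static __m128i cmpgt_epi64_sse2(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi32(INT32_MIN);
    __m128i gt_signed = _mm_cmpgt_epi32(a, b);   /* meaningful for high dwords */
    __m128i eq        = _mm_cmpeq_epi32(a, b);   /* meaningful for high dwords */
    /* Flipping the sign bit turns the signed compare into an unsigned one. */
    __m128i gt_unsigned = _mm_cmpgt_epi32(_mm_xor_si128(a, bias),
                                          _mm_xor_si128(b, bias)); /* low dwords */
    /* Broadcast each lane's high-dword and low-dword results to both dwords. */
    __m128i hi_gt = _mm_shuffle_epi32(gt_signed,   _MM_SHUFFLE(3, 3, 1, 1));
    __m128i hi_eq = _mm_shuffle_epi32(eq,          _MM_SHUFFLE(3, 3, 1, 1));
    __m128i lo_gt = _mm_shuffle_epi32(gt_unsigned, _MM_SHUFFLE(2, 2, 0, 0));
    return _mm_or_si128(hi_gt, _mm_and_si128(hi_eq, lo_gt));
}
#endif

/* Wrapper: out[i] = (a[i] > b[i]) ? -1 : 0, like PCMPGTQ's mask. */
static void cmpgt_i64x2(const int64_t a[2], const int64_t b[2], int64_t out[2])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, cmpgt_epi64_sse2(va, vb));
#else
    for (int i = 0; i < 2; i++)
        out[i] = (a[i] > b[i]) ? -1 : 0;
#endif
}
```

The same high/low decomposition carries over to ARMv7 Neon, which has 32-bit signed and unsigned compares but no 64-bit one.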
