instruction-set | 易学教程

What is -(-128) for signed single byte char in C?

Question: My little program:

    #include <stdio.h>

    int main() {
        signed char c = -128;
        c = -c;
        printf("%d", c);
        return 0;
    }

prints -128. Is the unary minus (-) operator portable across CPUs?

Answer 1: The operand of the unary minus first undergoes the standard integer promotions, so it is of type int, which can represent the value -128. The result of the operation is the value 128, also of type int. The conversion from int to signed char, being a narrowing of signed types, is implementation-defined. (Your implementation seems to …

assembly cltq and movslq difference

Question: Chapter 3 of Computer Systems: A Programmer's Perspective (2nd Edition) mentions that cltq is equivalent to movslq %eax, %rax. Why did they create a new instruction (cltq) instead of just using movslq %eax, %rax? Isn't that redundant?

Answer 1: TL;DR: use cltq when possible, because it's one byte shorter than the exactly equivalent movslq %eax, %rax. That's a very minor advantage (so don't sacrifice anything else to make this happen), but choose eax if you're going to want to sign-extend it a lot. This is mostly relevant for compiler writers (compiling signed-integer loop counters indexing arrays);

movq and 64 bit numbers

Question: When I write to a register, everything is fine:

    movq $0xffffffffffffffff, %rax

But I get "Error: operand size mismatch" when I write to a memory location:

    movq $0xffffffffffffffff, -8(%rbp)

Why is that? I see in compiled C code that in asm these numbers are split in two and two movl instructions show up. Maybe you can tell me where movq and the other instructions are documented.

Answer 1:
> Why is that?
Because MOV r64, imm64 is a valid x86 instruction, but MOV r/m64, imm64 is not (there's no encoding for it).
> I see in compiled C code that in asm these numbers are split in two and two movl instructions show

Dummy operations handling of Intel processor

Question: Admittedly, I have a bit of a silly question. Basically, I am wondering whether Intel processors provide special mechanisms to efficiently execute a series of dummy (i.e., NOP) instructions. For instance, I could imagine some kind of prefetch mechanism that identifies NOPs, discards them and tries to fetch useful instructions instead. Or are these NOPs dispatched to the execution units as normal instructions, meaning that I can roughly process 5 NOPs each cycle

Standard C++11 code equivalent to the PEXT Haswell instruction (and likely to be optimized by compilers)

The Haswell architecture comes with several new instructions. One of them is PEXT (parallel bits extract): it takes a value r2 and a mask r3 and puts the bits of r2 selected by r3 into r1, packed contiguously into its low-order bits. My question is the following: what would be the equivalent code, as an optimized templated function in pure standard C++11, that compilers would be likely to optimize to this instruction in the future? Here is some code from Matthew Fioravante's stdcxx-bitops GitHub repo that was floated to the std-proposals mailing list as a preliminary

x86 CMP Instruction Difference

Question: What is the (non-trivial) difference between the following two x86 instructions?

    39 /r   CMP r/m32, r32   Compare r32 with r/m32
    3B /r   CMP r32, r/m32   Compare r/m32 with r32

Background: I'm building a Java assembler, which will be used by my compiler's intermediate language to produce Windows-32 executables. Currently I have the following code:

    final ModelBase mb = new ModelBase(); // create new memory model
    mb.addCode(new Compare(Register.ECX, Register.EAX)); // add code
    mb.addCode(new Compare(Register.EAX, Register.ECX)); // add code
    final FileOutputStream fos = new FileOutputStream(new File(

How does mtune actually work?

Question: There's this related question: "GCC: how is march different from mtune?" However, the existing answers don't go much further than the GCC manual itself. At most, we get: "If you use -mtune, then the compiler will generate code that works on any of them, but will favour instruction sequences that run fastest on the specific CPU you indicated." and "The -mtune=Y option tunes the generated code to run faster on Y than on other CPUs it might run on." But exactly how does GCC favor one specific architecture when building, while still being capable of running the build on other (usually older)
