instruction-set

What is -(-128) for signed single byte char in C?

十年热恋 submitted on 2019-12-01 17:10:07
Question: My little program:
    #include <stdio.h>
    int main() {
        signed char c = -128;
        c = -c;
        printf("%d", c);
        return 0;
    }
prints -128. Is the minus (-) operator portable across CPUs?
Answer 1: The operand of the unary minus first undergoes the standard (integer) promotions, so it is of type int, which can represent the value -128. The result of the operation is the value 128, also of type int. The conversion from int to signed char, being a narrowing of signed types, is implementation-defined. (Your implementation seems to

assembly cltq and movslq difference

ⅰ亾dé卋堺 submitted on 2019-12-01 16:45:50
Question: Chapter 3 of Computer Systems: A Programmer's Perspective (2nd Edition) mentions that cltq is equivalent to movslq %eax, %rax. Why did they create a new instruction (cltq) instead of just using movslq %eax, %rax? Isn't that redundant?
Answer 1: TL;DR: use cltq when possible, because it's one byte shorter than the exactly-equivalent movslq %eax, %rax. That's a very minor advantage (so don't sacrifice anything else to make this happen), but choose eax if you're going to want to sign-extend it a lot. This is mostly relevant for compiler-writers (compiling signed-integer loop counters indexing arrays);

assembly cltq and movslq difference

十年热恋 submitted on 2019-12-01 15:31:10
Question: Chapter 3 of Computer Systems: A Programmer's Perspective (2nd Edition) mentions that cltq is equivalent to movslq %eax, %rax. Why did they create a new instruction (cltq) instead of just using movslq %eax, %rax? Isn't that redundant?
Answer 1: TL;DR: use cltq when possible, because it's one byte shorter than the exactly-equivalent movslq %eax, %rax. That's a very minor advantage (so don't sacrifice anything else to make this happen), but choose eax if you're going to want to sign-extend it a

movq and 64 bit numbers

╄→尐↘猪︶ㄣ submitted on 2019-12-01 14:26:29
Question: When I write to a register, everything is fine:
    movq $0xffffffffffffffff, %rax
But I get "Error: operand size mismatch" when I write to a memory location:
    movq $0xffffffffffffffff, -8(%rbp)
Why is that? I see in compiled C code that in asm these numbers are split in two and two movl instructions show up. Maybe you can tell me where the movq and other instructions are documented.
Answer 1: "Why is that?" Because MOV r64, imm64 is a valid x86 instruction, but MOV r/m64, imm64 is not (there's no encoding for it). "I see in compiled C code that in asm these numbers are split in two and two movl instructions show

movq and 64 bit numbers

孤者浪人 submitted on 2019-12-01 13:04:21
Question: When I write to a register, everything is fine:
    movq $0xffffffffffffffff, %rax
But I get "Error: operand size mismatch" when I write to a memory location:
    movq $0xffffffffffffffff, -8(%rbp)
Why is that? I see in compiled C code that in asm these numbers are split in two and two movl instructions show up. Maybe you can tell me where the movq and other instructions are documented.
Answer 1: "Why is that?" Because MOV r64, imm64 is a valid x86 instruction, but MOV r/m64, imm64 is not (there's no encoding

Dummy operations handling of Intel processor

自古美人都是妖i submitted on 2019-12-01 09:19:16
Question: Admittedly, this is a bit of a silly question. Basically, I am wondering whether Intel processors provide any special mechanisms to efficiently execute a series of dummy instructions, i.e., NOPs. For instance, I could imagine some kind of prefetch mechanism that identifies NOPs, discards them, and tries to fetch useful instructions instead. Or are these NOPs dispatched to the execution units as normal instructions, meaning that I can roughly process 5 NOPs each cycle

Standard C++11 code equivalent to the PEXT Haswell instruction (and likely to be optimized by the compiler)

混江龙づ霸主 submitted on 2019-12-01 04:02:22
The Haswell architecture comes with several new instructions. One of them is PEXT (parallel bits extract), whose functionality is explained by this image (source here): it takes a value r2 and a mask r3 and puts the extracted bits of r2 into r1. My question is the following: what would be the equivalent code of an optimized templated function in pure standard C++11 that compilers would be likely to turn into this instruction in the future? Here is some code from Matthew Fioravante's stdcxx-bitops GitHub repo that was floated to the std-proposals mailing list as a preliminary

x86 CMP Instruction Difference

社会主义新天地 submitted on 2019-12-01 03:22:06
Question
What is the (non-trivial) difference between the following two x86 instructions?
    Opcode  Instruction       Description
    39 /r   CMP r/m32, r32    Compare r32 with r/m32
    3B /r   CMP r32, r/m32    Compare r/m32 with r32
Background
I'm building a Java assembler, which will be used by my compiler's intermediate language to produce Windows-32 executables. Currently I have the following code:
    final ModelBase mb = new ModelBase(); // create new memory model
    mb.addCode(new Compare(Register.ECX, Register.EAX)); // add code
    mb.addCode(new Compare(Register.EAX, Register.ECX)); // add code
    final FileOutputStream fos = new FileOutputStream(new File(

How does mtune actually work?

孤街浪徒 submitted on 2019-12-01 03:21:10
There's this related question: GCC: how is march different from mtune? However, the existing answers don't go much further than the GCC manual itself. At most, we get: "If you use -mtune, then the compiler will generate code that works on any of them, but will favour instruction sequences that run fastest on the specific CPU you indicated." and "The -mtune=Y option tunes the generated code to run faster on Y than on other CPUs it might run on." But how exactly does GCC favor one specific architecture when building, while still being capable of running the build on other (usually older)

How does mtune actually work?

五迷三道 submitted on 2019-11-30 23:50:30
Question: There's this related question: GCC: how is march different from mtune? However, the existing answers don't go much further than the GCC manual itself. At most, we get: "If you use -mtune, then the compiler will generate code that works on any of them, but will favour instruction sequences that run fastest on the specific CPU you indicated." and "The -mtune=Y option tunes the generated code to run faster on Y than on other CPUs it might run on." But how exactly does GCC favor one specific