instruction-set

Detecting SIMD instruction sets to be used with C++ Macros in Visual Studio 2015

帅比萌擦擦* submitted on 2019-12-23 17:00:41
Question: So, here is what I am trying to accomplish. In my C++ project, which has to be compiled with Microsoft Visual Studio 2015 or later, some code needs different versions depending on the newest SIMD instruction set available in the user's CPU, among: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512. Since what I am looking for at this point is compile-time CPU dispatching, my first guess was that it could be easily accomplished using compiler macros. However, …
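A minimal sketch of the macro-based approach, assuming the documented predefined macros: GCC and Clang define `__SSE2__` through `__AVX512F__` according to `-m`/`-march` flags, while MSVC defines `__AVX__` under `/arch:AVX`, `__AVX2__` under `/arch:AVX2`, and `_M_IX86_FP` (1 or 2) under `/arch:SSE`/`/arch:SSE2` on 32-bit x86; x64 builds always include SSE2. To my knowledge `__AVX512F__` only appears with `/arch:AVX512`, which is newer than VS2015, and VS2015 has no switch (hence no macro) for SSE3 through SSE4.2, so on MSVC this can only tell apart SSE, SSE2, AVX and AVX2. `SimdLevel` and `compiledSimdLevel` are hypothetical names. Note that the result reflects the flags the program was built with, not the CPU it later runs on.

```cpp
// Hypothetical compile-time SIMD level, derived only from predefined macros.
enum class SimdLevel { None, SSE, SSE2, SSE3, SSSE3, SSE41, SSE42, AVX, AVX2, AVX512F };

// Each preprocessor branch leaves exactly one return statement,
// so this is a valid C++11 constexpr function.
constexpr SimdLevel compiledSimdLevel() {
#if defined(__AVX512F__)
    return SimdLevel::AVX512F;
#elif defined(__AVX2__)
    return SimdLevel::AVX2;
#elif defined(__AVX__)
    return SimdLevel::AVX;
#elif defined(__SSE4_2__)            // GCC/Clang only; MSVC has no equivalent
    return SimdLevel::SSE42;
#elif defined(__SSE4_1__)
    return SimdLevel::SSE41;
#elif defined(__SSSE3__)
    return SimdLevel::SSSE3;
#elif defined(__SSE3__)
    return SimdLevel::SSE3;
#elif defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64) || \
      (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return SimdLevel::SSE2;          // x64 always has SSE2
#elif defined(__SSE__) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
    return SimdLevel::SSE;
#else
    return SimdLevel::None;          // non-x86 target or no SIMD flags
#endif
}
```

Because intrinsics for a newer instruction set may not compile under GCC without the matching flags, code that actually uses them is usually guarded by the same `#if defined(__AVX2__)` style tests rather than by an ordinary `if` on the returned value.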

Compiler macro to detect BMI2 instruction set

牧云@^-^@ submitted on 2019-12-23 13:13:12
Question: I have searched the web for a proper solution without much success, so I hope one of you knows something about it: is there any way to detect the "Intel Bit Manipulation Instruction Set 2" (BMI2) at compile time? I want to make some code conditional on its availability. Answer 1: With GCC you can check for the __BMI2__ macro. This macro is defined if the target supports BMI2 (e.g. -mbmi2, -march=haswell). This is the macro that the intrinsics headers (x86intrin.h, …
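A sketch of how the macro is typically used, assuming GCC or Clang: when `__BMI2__` is defined (for example with `-mbmi2`), the code calls the `_pdep_u32` intrinsic from `<immintrin.h>`; otherwise it falls back to a portable loop with the same parallel-bit-deposit semantics. The function name `depositBits` is hypothetical, and MSVC may not define `__BMI2__` at all, so check your compiler's documentation before relying on the macro there.

```cpp
#include <cstdint>
#if defined(__BMI2__)
#include <immintrin.h>  // _pdep_u32
#endif

// Hypothetical helper: deposit the low bits of `src` into the positions
// of the set bits of `mask`, lowest mask bit first (what PDEP does).
inline std::uint32_t depositBits(std::uint32_t src, std::uint32_t mask) {
#if defined(__BMI2__)
    return _pdep_u32(src, mask);                 // single BMI2 instruction
#else
    std::uint32_t result = 0;
    for (std::uint32_t bit = 1; mask != 0; bit <<= 1) {
        if (src & bit) result |= mask & (0u - mask);  // lowest set bit of mask
        mask &= mask - 1;                             // clear that bit
    }
    return result;
#endif
}
```

For example, depositing `0b101` into mask `0b11010` places source bits 0, 1 and 2 at positions 1, 3 and 4, giving `0b10010`.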

x86 function call types

你说的曾经没有我的故事 submitted on 2019-12-23 03:58:27
Question: I'm new to x86. My question is about function calls. As far as I know there are three function call types: short call (0xe8), far call (0x9a) and near call (0x??). Some call the short call a relative call (ip += arg / cs = inv) and the far call an absolute call (ip = arg / cs = arg), but what about the near call (ip = ? / cs = ?)? Some say that calling a function far (9a) is almost certainly wrong on 32-bit systems. Why? Doesn't x86 mean a 32-bit system? Is the far call's argument a flat address (the one we use …
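For reference, Intel's manuals call E8 a "near, relative" call: in 32- and 64-bit code it is E8 followed by a signed 32-bit displacement that the CPU adds to the address of the instruction after the call, so it changes IP/EIP/RIP and never touches CS. A small sketch of computing that encoding; `encodeCallRel32` is a hypothetical helper, and the caller is assumed to ensure the target lies within the signed 32-bit range.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: encode "call target" as E8 rel32 for a CALL that
// starts at callAddress. The displacement is measured from the end of the
// 5-byte instruction (callAddress + 5). Range is NOT checked here.
std::array<std::uint8_t, 5> encodeCallRel32(std::uint64_t callAddress,
                                            std::uint64_t target) {
    const std::int64_t disp = static_cast<std::int64_t>(target) -
                              static_cast<std::int64_t>(callAddress + 5);
    const std::uint32_t d = static_cast<std::uint32_t>(disp);  // two's complement bits
    // The displacement is stored little-endian, lowest byte first.
    return {{0xE8,
             static_cast<std::uint8_t>(d & 0xFF),
             static_cast<std::uint8_t>((d >> 8) & 0xFF),
             static_cast<std::uint8_t>((d >> 16) & 0xFF),
             static_cast<std::uint8_t>((d >> 24) & 0xFF)}};
}
```

For example, a call at 0x401000 to 0x402000 has displacement 0x402000 - 0x401005 = 0xFFB and encodes as E8 FB 0F 00 00.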

What instruction set does the Nvidia GeForce 6xx Series use?

安稳与你 submitted on 2019-12-22 06:26:28
Question: Do the GeForce 6xx Series GPUs use RISC, CISC or VLIW style instructions? In one source (http://www.motherboardpoint.com/risc-cisc-t241234.html) someone said "GPUs are probably closer to VLIW than to RISC or CISC". Another source (http://en.wikipedia.org/wiki/Very_long_instruction_word#implementations) says "both Nvidia and AMD have since moved to RISC architectures in order to improve performance on non-graphics workload". Answer 1: AFAIK, Nvidia does not publicly document its …

Assembler mov issue

冷暖自知 submitted on 2019-12-20 05:17:13
Question: I have the following code:

    mov ax,@data
    mov ds,ax

Why can't I just write this?

    mov ds,@data

Full source:

    .MODEL small
    .STACK 100h
    .DATA
    HelloMessage DB 'Hello, world',13,10,'$'
    .CODE
    .startup
    mov ax,@data
    mov ds,ax
    mov ah,9
    mov dx,OFFSET HelloMessage
    int 21h
    mov ah,4ch
    int 21h
    END

Thank you!

Answer 1: You can't, because the instruction set doesn't contain an instruction to do that. It is just one of the many idiosyncrasies of the x86. These kinds of restrictions are fairly normal for assembly …
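The encoding shows why: the only MOV that writes a segment register is opcode 8E /r (MOV Sreg, r/m16), where the ModRM reg field selects the segment register and the r/m field selects a general register or memory operand. There is no form taking an immediate, and `@data` is an immediate segment value, so it has to pass through a register such as AX first. A sketch of the register-to-register case; the enum names and `encodeMovSreg` are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Segment-register numbers as used in the ModRM reg field.
enum class SegReg : std::uint8_t { ES = 0, CS = 1, SS = 2, DS = 3, FS = 4, GS = 5 };
// 16-bit general registers as used in the ModRM r/m field when mod = 11.
enum class Reg16 : std::uint8_t { AX = 0, CX = 1, DX = 2, BX = 3, SP = 4, BP = 5, SI = 6, DI = 7 };

// Hypothetical helper: encode "mov seg, reg16" as 8E /r with a register
// source (mod = 11). Opcode 8E only accepts register or memory sources.
constexpr std::array<std::uint8_t, 2> encodeMovSreg(SegReg seg, Reg16 src) {
    return {{0x8E,
             static_cast<std::uint8_t>(0xC0 | (static_cast<unsigned>(seg) << 3) |
                                       static_cast<unsigned>(src))}};
}
```

With this, `mov ds, ax` comes out as 8E D8, the second instruction of the two-instruction sequence in the question.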

Instruction Lengths

五迷三道 submitted on 2019-12-18 08:30:40
Question: I was looking at the different instructions in assembly and I am confused about how the lengths of different operands and opcodes are decided. Is it something you ought to know from experience, or is there a way to find out which operand/operator combination takes up how many bytes? For example:

    push %ebp        ; takes up one byte
    mov %esp, %ebp   ; takes up two bytes

So the question is: upon seeing a given instruction, how can I deduce how many bytes its opcode will require?

Answer 1: There's no hard and …
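In general an instruction's length is the sum of its prefixes, opcode bytes, an optional ModRM byte, an optional SIB byte, an optional displacement and an optional immediate. Which of these exist depends on the opcode, so the opcode tables in the Intel or AMD manuals can't be avoided, but how the ModRM byte decides whether a SIB byte and displacement follow is purely mechanical. A sketch for 32-bit addressing only (no 0x67 prefix, no REX); `modrmBytes32` is a hypothetical helper. For the question's examples: `push %ebp` is the single byte 55 (register encoded in the opcode, no ModRM), and `mov %esp, %ebp` is 89 E5, an opcode plus a register-form ModRM.

```cpp
#include <cstdint>

// Hypothetical helper: bytes taken by ModRM + optional SIB + displacement,
// for 32-bit addressing. `sib` is only examined when a SIB byte is present.
constexpr int modrmBytes32(std::uint8_t modrm, std::uint8_t sib) {
    const int mod = modrm >> 6;
    const int rm = modrm & 7;
    if (mod == 3) return 1;                    // register operand: ModRM only
    int len = 1;                               // the ModRM byte itself
    const bool hasSib = (rm == 4);             // r/m = 100 means a SIB byte follows
    if (hasSib) len += 1;
    if (mod == 1) len += 1;                    // disp8
    else if (mod == 2) len += 4;               // disp32
    else if (rm == 5) len += 4;                // mod 00, r/m 101: [disp32]
    else if (hasSib && (sib & 7) == 5) len += 4;  // mod 00, SIB base 101: disp32
    return len;
}
```

So `8B 45 08` (`mov 8(%ebp), %eax`) is one opcode byte plus `modrmBytes32(0x45, 0)` = 2 bytes, three bytes in total.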

x64 instruction encoding and the ModRM byte

若如初见. submitted on 2019-12-17 16:48:11
Question: The encoding of

    call qword ptr [rax]
    call qword ptr [rcx]

is

    FF 10
    FF 11

I can see where the last digit (0/1) comes from (the register number), but I'm trying to figure out where the second-to-last digit (1) comes from. According to the AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions, page 56, "/digit - Indicates that the ModRM byte specifies only one register or memory (r/m) operand. The digit is specified by the ModRM reg field and is used as an instruction …
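The ModRM byte is laid out as mod (bits 7-6), reg (bits 5-3) and r/m (bits 2-0). For `FF /2` (near indirect CALL), reg holds the opcode extension 2, so with mod = 00 and r/m = 000 (RAX) the byte is 00 010 000 = 0x10, and r/m = 001 (RCX) gives 0x11. Because reg starts at bit 3, the digit 2 shows up as 0x10 rather than 0x02. A sketch of packing and unpacking the fields; `makeModRM` and the accessor names are hypothetical.

```cpp
#include <cstdint>

// Pack the three ModRM fields: mod (2 bits), reg (3 bits), r/m (3 bits).
constexpr std::uint8_t makeModRM(unsigned mod, unsigned reg, unsigned rm) {
    return static_cast<std::uint8_t>(((mod & 3u) << 6) | ((reg & 7u) << 3) | (rm & 7u));
}

// Extract the fields back out of a ModRM byte.
constexpr unsigned modrmMod(std::uint8_t b) { return b >> 6; }
constexpr unsigned modrmReg(std::uint8_t b) { return (b >> 3) & 7u; }
constexpr unsigned modrmRm(std::uint8_t b)  { return b & 7u; }
```

r/m values 100 and 101 with mod = 00 are special (a SIB byte follows, and in 64-bit mode RIP-relative addressing, respectively), so `[rsp]` and `[rbp]` do not follow this simple pattern.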

What's the point of LEA EAX, [EAX]?

半腔热情 submitted on 2019-12-17 15:34:09
Question:

    LEA EAX, [EAX]

I encountered this instruction in a binary compiled with the Microsoft C compiler. It clearly can't change the value of EAX. So why is it there?

Answer 1: It is a NOP. The following are typically used as NOPs. They all do the same thing, but they result in machine code of different lengths. Depending on the alignment requirement, one of them is chosen:

    xchg eax, eax           = 90
    mov eax, eax            = 89 C0
    lea eax, [eax + 0x00]   = 8D 40 00

Answer 2: From this article: This trick is used by MSVC++ …
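A sketch of how a toolchain might pick among the three filler instructions listed above to pad 32-bit code up to an alignment boundary, greedily using the longest one first so that fewer instructions execute if the padding is reached. `makePadding` is hypothetical; real toolchains use their own tables and, on current CPUs, usually prefer the multi-byte `0F 1F` NOP forms. These sequences are only no-ops in 32-bit code: in 64-bit mode, `mov eax, eax` zero-extends into RAX.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: build `count` bytes of 32-bit x86 padding from the
// three filler instructions quoted above, longest first.
std::vector<std::uint8_t> makePadding(std::size_t count) {
    std::vector<std::uint8_t> out;
    out.reserve(count);
    while (count >= 3) {
        out.insert(out.end(), {0x8D, 0x40, 0x00});  // lea eax, [eax + 0x00]
        count -= 3;
    }
    if (count == 2) {
        out.insert(out.end(), {0x89, 0xC0});        // mov eax, eax
    } else if (count == 1) {
        out.push_back(0x90);                        // xchg eax, eax (NOP)
    }
    return out;
}
```

For example, 5 bytes of padding become `8D 40 00 89 C0`: one 3-byte and one 2-byte filler.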

How to control which core a process runs on?

可紊 submitted on 2019-12-17 15:15:37
Question: I can understand how one can write a program that uses multiple processes or threads: fork() a new process and use IPC, or create multiple threads and use those sorts of communication mechanisms. I also understand context switching. That is, with only one CPU, the operating system schedules time for each process (and there are tons of scheduling algorithms out there), and thereby we achieve running multiple processes simultaneously. And now that we have multi-core processors (or multi …
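On Linux, the usual way to choose which cores a thread may run on is its CPU affinity mask, set with `sched_setaffinity` (the `taskset` command does the same from the shell); on Windows the counterparts are `SetProcessAffinityMask` and `SetThreadAffinityMask`. A Linux-only sketch that pins the calling thread to the lowest-numbered CPU it is currently allowed to use; `pinToFirstAllowedCpu` is a hypothetical name. The scheduler still decides when the thread runs, but only on that core.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // CPU_SET and friends are GNU extensions in <sched.h>
#endif
#include <sched.h>

// Hypothetical helper (Linux only): restrict the calling thread to the
// lowest-numbered CPU in its current affinity mask.
// Returns that CPU index, or -1 on error.
int pinToFirstAllowedCpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &allowed)) continue;
        cpu_set_t target;
        CPU_ZERO(&target);
        CPU_SET(cpu, &target);
        if (sched_setaffinity(0, sizeof(target), &target) != 0) return -1;
        return cpu;
    }
    return -1;  // empty mask, should not happen
}
```

Picking the first allowed CPU rather than hard-coding CPU 0 keeps the sketch working inside containers or cgroups that exclude some cores. From the shell, `taskset -c 0 ./program` starts a program restricted to CPU 0.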