x86 | 易学教程

Getting GCC/Clang to use CMOV

Read more about Getting GCC/Clang to use CMOV

Question: I have a simple tagged union of values. The values can be either int64_ts or doubles. I am performing addition on these unions, with the caveat that if both arguments represent int64_t values, the result should also be an int64_t value. Here is the code: #include<stdint.h> union Value { int64_t a; double b; }; enum Type { DOUBLE, LONG }; // Value + type. struct TaggedValue { Type type; Value value; }; void add(const TaggedValue& arg1, const TaggedValue& arg2, TaggedValue* out) {
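The semantics described in the question (an int64_t result only when both inputs are int64_t, otherwise a double) can be written in a select-friendly style. The following is a minimal sketch, not the asker's code: the helpers `load_bits`, `bits_to_double` and `double_to_bits` are hypothetical, the payload is assumed to be exactly 64 bits wide, and the integer sum wraps on overflow. Both candidate results are computed unconditionally and one is chosen with a ternary, which gives a compiler the opportunity to emit a conditional move. Whether GCC or Clang actually does so depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

union Value { int64_t a; double b; };
enum Type { DOUBLE, LONG };
struct TaggedValue { Type type; Value value; };

static_assert(sizeof(Value) == sizeof(uint64_t), "sketch assumes a 64-bit payload");

// Hypothetical helpers (not from the question): move raw 64-bit payloads
// around with memcpy so that no inactive union member is ever read.
static uint64_t load_bits(const Value& v) {
    uint64_t u;
    std::memcpy(&u, &v, sizeof u);
    return u;
}
static double bits_to_double(uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}
static uint64_t double_to_bits(double d) {
    uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

void add(const TaggedValue& arg1, const TaggedValue& arg2, TaggedValue* out) {
    const uint64_t u1 = load_bits(arg1.value);
    const uint64_t u2 = load_bits(arg2.value);
    const bool long1 = arg1.type == LONG;
    const bool long2 = arg2.type == LONG;
    const bool both_long = long1 & long2;

    // Integer candidate: unsigned addition wraps instead of overflowing.
    const uint64_t isum = u1 + u2;

    // Double candidate: each operand is either converted from int64_t
    // or reinterpreted as the double it already holds.
    const double d1 = long1 ? static_cast<double>(static_cast<int64_t>(u1))
                            : bits_to_double(u1);
    const double d2 = long2 ? static_cast<double>(static_cast<int64_t>(u2))
                            : bits_to_double(u2);
    const uint64_t dsum = double_to_bits(d1 + d2);

    // Select one of the two precomputed results.
    const uint64_t result = both_long ? isum : dsum;
    out->type = both_long ? LONG : DOUBLE;
    std::memcpy(&out->value, &result, sizeof result);
}
```

Calling `add` with two LONG values 2 and 3 yields a LONG 5. Mixing a LONG 2 with a DOUBLE 0.5 yields a DOUBLE 2.5.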

How to extract 8 integers from a 256 vector using intel intrinsics?

Read more about How to extract 8 integers from a 256 vector using intel intrinsics?

Question: I'm trying to improve the performance of my code by using 256-bit vectors (Intel intrinsics, AVX). I have a 4th-generation i7 (Haswell architecture) processor supporting SSE1 to SSE4.2 and the AVX/AVX2 extensions. This is the code snippet that I'm trying to improve: /* code snippet */ kfac1 = kfac + factor; /* 7 cycles for 7 additions */ kfac2 = kfac1 + factor; kfac3 = kfac2 + factor; kfac4 = kfac3 + factor; kfac5 = kfac4 + factor; kfac6 = kfac5 + factor; kfac7 = kfac6 + factor; k1fac1 = k1fac +
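The chained additions in the snippet form a serial dependency: each value waits for the previous one. Since the n-th value is just kfac + n * factor, every lane can be computed independently, which is the form a vectorized version would use. The sketch below illustrates this in plain scalar C++. It assumes the values are 32-bit integers that do not overflow (the excerpt does not show the declarations), and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Serial form, as in the excerpt: lane i depends on lane i - 1.
std::array<std::int32_t, 8> lanes_chained(std::int32_t kfac, std::int32_t factor) {
    std::array<std::int32_t, 8> out{};
    out[0] = kfac;
    for (std::size_t i = 1; i < out.size(); ++i) {
        out[i] = out[i - 1] + factor;
    }
    return out;
}

// Independent form: lane i depends only on the inputs, so all eight
// lanes could be computed at once by a single vector multiply-and-add.
std::array<std::int32_t, 8> lanes_independent(std::int32_t kfac, std::int32_t factor) {
    std::array<std::int32_t, 8> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = kfac + static_cast<std::int32_t>(i) * factor;
    }
    return out;
}
```

Both functions return the same eight values. For kfac = 10 and factor = 3, the last lane is 31.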

Chosing suffix (l-b-w) for mov instruction

Read more about Chosing suffix (l-b-w) for mov instruction

Question: I am new to assembly and am reading Computer Systems: A Programmer's Perspective. I don't understand how to choose the suffix for the mov instruction. I know each register and its bit count. The suffix is determined by the bit count (32-bit: l, 16-bit: w, 8-bit: b). A few examples do not seem to follow that rule. For example, %esp is a 32-bit register, but in step 4 the suffix b is used instead of l. Please explain how the suffix is chosen. Questions: Answer: l-w-b-b-l-w-l Source: Computer Systems: A Programmer's
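The width rule quoted in the question applies to the operand that holds the data being moved. A register that appears inside a memory operand, such as (%esp), only supplies an address, so its 32-bit width says nothing about the suffix. A minimal sketch of that lookup in C++, using a hand-picked register table and hypothetical helper names (none of this comes from the book):

```cpp
#include <cassert>
#include <map>
#include <string>

// Width in bits of a few IA-32 registers (an illustrative subset only).
static const std::map<std::string, int> kRegisterBits = {
    {"%eax", 32}, {"%esp", 32}, {"%edi", 32},
    {"%ax", 16},  {"%dx", 16},
    {"%al", 8},   {"%bl", 8},   {"%dh", 8},
};

// Map an operand width to the AT&T suffix; '?' marks an unsupported width.
char suffix_for_bits(int bits) {
    switch (bits) {
        case 8:  return 'b';
        case 16: return 'w';
        case 32: return 'l';
        default: return '?';
    }
}

// Suffix implied by the register that holds the data (not the address).
char suffix_for_data_register(const std::string& reg) {
    const auto it = kRegisterBits.find(reg);
    return it == kRegisterBits.end() ? '?' : suffix_for_bits(it->second);
}
```

For an instruction whose data register is %dh and whose other operand is a memory location addressed through %esp, the lookup on %dh gives b, even though %esp itself is 32 bits wide.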

How to execute x86 commands from data buffer?

Read more about How to execute x86 commands from data buffer?

Question: My question is aimed mostly at professionals and is about using C++ in a "strange" way. In C++ there isn't really a big difference between pointers to variables and pointers to functions. We can do something useless like this: char* buff = new char[32]; void (*func)() = (void (*)())buff; But we have almost created a function that never existed, right? What if we go further and fill buff with x86 commands stored in a file? The OS will never know that a function was created. #include <iostream> using namespace

Why is using structure Vector3I instead of three ints much slower in C#?

Read more about Why is using structure Vector3I instead of three ints much slower in C#?

Question: I'm processing lots of data in a 3D grid, so I wanted to implement a simple iterator instead of three nested loops. However, I ran into a performance problem. First, I implemented a simple loop using only int x, y and z variables. Then I implemented a Vector3I structure and used that instead, and the calculation time doubled. Now I'm struggling with the question: why is that? What did I do wrong? Example for reproduction: using BenchmarkDotNet.Attributes; using BenchmarkDotNet.Running; using

Stack alignment on x86

Read more about Stack alignment on x86

Question: I had a mysterious bus error on an x86 (32-bit) platform when running code compiled with gcc-4.8.1 with -march=pentium4. I traced the problem to an SSE instruction: movdqa %xmm5,0x50(%esp) with esp = 0xbfffedac. movdqa requires the address to be 16-byte aligned, which is not the case here, hence the bus error. The problem does not occur when compiling with -march=native (this is a Core i3 processor). As far as I know, the only stack alignment guaranteed on Linux/x86 is 4-byte.
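The misalignment can be checked by hand from the numbers in the question: the effective address is esp + 0x50 = 0xbfffedac + 0x50 = 0xbfffedfc, whose low four bits are 0xc, so it is 12 bytes past a 16-byte boundary while still being 4-byte aligned. A small C++ sketch of the same arithmetic, with a hypothetical `is_aligned` helper that assumes the alignment is a power of two:

```cpp
#include <cassert>
#include <cstdint>

// True if addr is a multiple of alignment (alignment must be a power of two).
constexpr bool is_aligned(std::uint32_t addr, std::uint32_t alignment) {
    return (addr & (alignment - 1u)) == 0u;
}

// Values taken from the question: movdqa %xmm5,0x50(%esp), esp = 0xbfffedac.
constexpr std::uint32_t kEsp = 0xbfffedacu;
constexpr std::uint32_t kTarget = kEsp + 0x50u;

static_assert(kTarget == 0xbfffedfcu, "effective address of the store");
static_assert(!is_aligned(kTarget, 16u), "not 16-byte aligned, so movdqa faults");
static_assert(is_aligned(kTarget, 4u), "still 4-byte aligned");
```

The checks run at compile time, so the file compiling at all confirms the arithmetic.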