x86

Getting GCC/Clang to use CMOV

本小妞迷上赌 submitted on 2021-02-19 04:38:05
Question: I have a simple tagged union of values. Each value is either an int64_t or a double. I am performing addition on these unions, with the caveat that if both arguments hold int64_t values, the result should also be an int64_t value. Here is the code: #include<stdint.h> union Value { int64_t a; double b; }; enum Type { DOUBLE, LONG }; // Value + type. struct TaggedValue { Type type; Value value; }; void add(const TaggedValue& arg1, const TaggedValue& arg2, TaggedValue* out) {
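
The question's add function is cut off above, so the following is only a sketch of one way to phrase the stated rule (int64_t result only when both inputs are int64_t) so that a compiler can choose conditional moves. It computes both candidate results unconditionally and selects one with a ternary. The helpers raw_bits and to_double are hypothetical names that do not appear in the question, and integer overflow is assumed to wrap.

```cpp
#include <cstdint>
#include <cstring>

union Value { std::int64_t a; double b; };
enum Type { DOUBLE, LONG };
struct TaggedValue { Type type; Value value; };

static_assert(sizeof(Value) == sizeof(std::uint64_t), "payload assumed to be 64 bits");

// Hypothetical helper: copy the 64-bit payload out without caring which member is active.
static inline std::uint64_t raw_bits(const TaggedValue& v) {
    std::uint64_t bits;
    std::memcpy(&bits, &v.value, sizeof bits);
    return bits;
}

// Hypothetical helper: interpret the payload as a double, converting if it holds an integer.
static inline double to_double(const TaggedValue& v) {
    const std::uint64_t bits = raw_bits(v);
    double as_is;
    std::memcpy(&as_is, &bits, sizeof as_is);
    const double converted = static_cast<double>(static_cast<std::int64_t>(bits));
    return v.type == LONG ? converted : as_is;
}

void add(const TaggedValue& arg1, const TaggedValue& arg2, TaggedValue* out) {
    // Bitwise & instead of && so that no short-circuit branch is required.
    const bool both_long = (arg1.type == LONG) & (arg2.type == LONG);

    // Both candidate results are computed; only the selection depends on the tags.
    const std::uint64_t int_sum = raw_bits(arg1) + raw_bits(arg2);  // unsigned, so it wraps
    const double dbl_sum = to_double(arg1) + to_double(arg2);
    std::uint64_t dbl_bits;
    std::memcpy(&dbl_bits, &dbl_sum, sizeof dbl_bits);

    const std::uint64_t result = both_long ? int_sum : dbl_bits;
    std::memcpy(&out->value, &result, sizeof result);
    out->type = both_long ? LONG : DOUBLE;
}
```

Whether GCC or Clang actually emits cmov for the two ternaries depends on the compiler version, the optimization level, and its cost model, so the generated assembly still has to be inspected.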

How to extract 8 integers from a 256 vector using intel intrinsics?

雨燕双飞 submitted on 2021-02-19 02:08:35
Question: I'm trying to improve the performance of my code by using 256-bit vectors (Intel intrinsics, AVX). I have a 4th-generation Core i7 (Haswell architecture) processor that supports SSE1 to SSE4.2 and the AVX/AVX2 extensions. This is the code snippet that I'm trying to improve: /* code snippet */ kfac1 = kfac + factor; /* 7 cycles for 7 additions */ kfac2 = kfac1 + factor; kfac3 = kfac2 + factor; kfac4 = kfac3 + factor; kfac5 = kfac4 + factor; kfac6 = kfac5 + factor; kfac7 = kfac6 + factor; k1fac1 = k1fac +
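
As a rough illustration of the title question, the sketch below builds the eight values kfac, kfac + factor, ..., kfac + 7 * factor in one 256-bit register and extracts them by storing the register to a plain array. It assumes the variables are 32-bit integers, which the truncated snippet does not state. The function names are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions rather than portable C++.

```cpp
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_HAVE_X86 1
#endif

// Scalar reference: lane i holds kfac + i * factor (assumes no overflow).
void fill_lanes_scalar(std::int32_t kfac, std::int32_t factor, std::int32_t out[8]) {
    for (int i = 0; i < 8; ++i) out[i] = kfac + i * factor;
}

#ifdef SKETCH_HAVE_X86
// AVX2 version; the attribute lets it compile even without -mavx2 on the command line.
__attribute__((target("avx2")))
void fill_lanes_avx2(std::int32_t kfac, std::int32_t factor, std::int32_t out[8]) {
    const __m256i lane_index = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    const __m256i steps = _mm256_mullo_epi32(_mm256_set1_epi32(factor), lane_index);
    const __m256i values = _mm256_add_epi32(_mm256_set1_epi32(kfac), steps);
    // Extracting all eight integers at once: an unaligned store to memory.
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), values);
}
#endif

// Dispatch at run time so the sketch also works on CPUs without AVX2.
void fill_lanes(std::int32_t kfac, std::int32_t factor, std::int32_t out[8]) {
#ifdef SKETCH_HAVE_X86
    if (__builtin_cpu_supports("avx2")) {
        fill_lanes_avx2(kfac, factor, out);
        return;
    }
#endif
    fill_lanes_scalar(kfac, factor, out);
}
```

Storing to an array is usually the simplest route when all eight lanes are needed. For a single lane, _mm256_extract_epi32 with a compile-time constant index is an alternative.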

Choosing suffix (l-b-w) for mov instruction

痞子三分冷 submitted on 2021-02-18 23:03:23
Question: I am new to assembly. I am reading Computer Systems: A Programmer's Perspective. I don't understand how to choose the suffix for the mov instruction. I know each register and its bit width. The suffix is determined by the operand size (32-bit l, 16-bit w, 8-bit b). A few examples do not seem to follow that rule. For example, %esp is a 32-bit register, but in step 4 the suffix b is used instead of l. Please explain how the suffix is chosen. questions : answer : l-w-b-b-l-w-l Source: Computer Systems: A Programmer's
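
The rule the question is circling around can be sketched as a small lookup: the suffix follows the width of the data operand, while a register written inside parentheses only supplies an address and does not affect the suffix. The functions below are hypothetical helpers covering only the classic IA-32 general-purpose registers, and the operand forms in the comments are illustrative, not taken from the truncated exercise.

```cpp
#include <string>

// Width in bits of an IA-32 register name in AT&T syntax, or 0 if it is not recognized.
int register_bits(const std::string& reg) {
    static const char* const regs32[] = {"%eax", "%ebx", "%ecx", "%edx",
                                         "%esi", "%edi", "%esp", "%ebp"};
    static const char* const regs16[] = {"%ax", "%bx", "%cx", "%dx",
                                         "%si", "%di", "%sp", "%bp"};
    static const char* const regs8[] = {"%al", "%bl", "%cl", "%dl",
                                        "%ah", "%bh", "%ch", "%dh"};
    for (const char* r : regs32) if (reg == r) return 32;
    for (const char* r : regs16) if (reg == r) return 16;
    for (const char* r : regs8) if (reg == r) return 8;
    return 0;
}

// Suffix for a mov whose data operand is the given register.
// Example: for a hypothetical "mov (%esp,%edx,4), %dh", pass "%dh": %esp and %edx
// only form the address, and the byte register %dh decides the suffix.
char mov_suffix(const std::string& data_register) {
    switch (register_bits(data_register)) {
        case 32: return 'l';
        case 16: return 'w';
        case 8:  return 'b';
        default: return '?';  // unknown register name
    }
}
```

With this reading, a 32-bit register such as %esp forces the l suffix only when it is itself the source or destination of the data, not when it appears in an address expression.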

How to execute x86 commands from data buffer?

旧巷老猫 submitted on 2021-02-18 22:48:37
Question: My question is aimed mostly at professionals and is about using C++ in a "strange" way. In C++ there isn't really a big difference between pointers to variables and pointers to functions. We can do something useless like this: char* buff = new char[32]; void (*func)() = (void (*)())buff; But we almost created a function that never existed, right? What if we go further and fill buff with x86 commands stored in a file? The OS will never know that a function was created. #include <iostream> using namespace
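
A buffer obtained from new is normally mapped non-executable, so jumping into it tends to fault on current operating systems. The sketch below shows one common POSIX approach under stated assumptions: an x86 or x86-64 target, a system that offers mmap and mprotect, and a hard-coded byte sequence (mov eax, 42; ret) standing in for commands read from a file. The function name run_generated_code is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <sys/mman.h>

// Returns the value produced by the generated code, or -1 on failure
// or when the target is not x86.
int run_generated_code() {
#if defined(__x86_64__) || defined(__i386__)
    // Machine code for: mov eax, 42 ; ret (valid in both 32-bit and 64-bit mode).
    static const unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
    const std::size_t size = 4096;

    // Ask the OS for writable memory first, so the bytes can be copied in.
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    std::memcpy(mem, code, sizeof code);

    // Then switch the page to read + execute before calling into it.
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return -1;
    }

    // Converting an object pointer to a function pointer is only conditionally
    // supported by the C++ standard, but POSIX platforms rely on it (dlsym).
    auto func = reinterpret_cast<int (*)()>(mem);
    const int result = func();

    munmap(mem, size);
    return result;
#else
    return -1;
#endif
}
```

The OS is involved throughout: it decides which pages may be executed, which is why the protection change goes through a system call. Hardened configurations may refuse the change.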

Why is using structure Vector3I instead of three ints much slower in C#?

主宰稳场 submitted on 2021-02-18 21:43:34
Question: I'm processing lots of data in a 3D grid, so I wanted to implement a simple iterator instead of three nested loops. However, I encountered a performance problem: first, I implemented a simple loop using only int x, y and z variables. Then I implemented a Vector3I structure and used that, and the calculation time doubled. Now I'm struggling with the question: why is that? What did I do wrong? Example for reproduction: using BenchmarkDotNet.Attributes; using BenchmarkDotNet.Running; using

Stack alignment on x86

有些话、适合烂在心里 submitted on 2021-02-18 21:12:17
Question: I had a mysterious bus error that occurred on an x86 (32-bit) platform when running code compiled with gcc-4.8.1 with -march=pentium4. I traced the problem to an SSE instruction: movdqa %xmm5,0x50(%esp) with esp = 0xbfffedac. movdqa requires the address to be 16-byte aligned, which is not the case here, hence the bus error. The problem does not occur when compiling with -march=native (this is a Core i3 processor). As far as I know, the only stack alignment guaranteed on Linux/x86 is 4-byte.
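
The misalignment described above can be checked with a little arithmetic: the effective address is esp + 0x50 = 0xbfffedfc, whose low four bits are 0xc, so it sits 12 bytes past a 16-byte boundary. The sketch below performs the same check at compile time. The helper names are hypothetical, and the addresses are the ones quoted in the question.

```cpp
#include <cstdint>

// Offset of an address from the previous boundary (alignment must be a power of two).
constexpr std::uint64_t misalignment(std::uint64_t address, std::uint64_t alignment) {
    return address & (alignment - 1);
}

constexpr bool is_aligned(std::uint64_t address, std::uint64_t alignment) {
    return misalignment(address, alignment) == 0;
}

// Values from the question: movdqa %xmm5,0x50(%esp) with esp = 0xbfffedac.
constexpr std::uint64_t esp_value = 0xbfffedac;
constexpr std::uint64_t store_address = esp_value + 0x50;  // 0xbfffedfc

static_assert(store_address == 0xbfffedfc, "effective address of the store");
static_assert(misalignment(store_address, 16) == 12, "12 bytes past a 16-byte boundary");
static_assert(!is_aligned(store_address, 16), "movdqa would fault on this address");
static_assert(is_aligned(store_address, 4), "4-byte alignment still holds");
```

The last assertion matches the 4-byte guarantee mentioned in the question: the address satisfies it while still violating the 16-byte requirement of movdqa.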