compiler-optimization

Will compiler optimize out unused arguments of static function?

非 Y 不嫁゛ submitted on 2019-12-01 15:12:37
Question: I have a group of functions that are all declared static and fastcall. Most of them make use of a pointer to a struct that serves more or less the role of this in C++. Some of the functions don't need anything in the struct, but for uniformity's sake I want to pass them the pointer anyway. Will the compiler notice that the argument is unused and omit allocating a register to it? Answer 1: I wrote this nonsense program to test this. It's got some nonsense code in the function, and calls it several …

Is it guaranteed that Complex Float variables will be 8-byte aligned in memory?

谁说我不能喝 submitted on 2019-12-01 13:02:37
In C99 the new complex types were defined. I am trying to understand whether a compiler can take advantage of this knowledge when optimizing memory accesses. Are these objects (A - F) of type complex float guaranteed to be 8-byte aligned in memory? #include <complex.h> typedef complex float cfloat; cfloat A; cfloat B[10]; void func(cfloat C, cfloat *D) { cfloat E; cfloat F[10]; } Note that for D, the question relates to the object pointed to by D, not to the pointer storage itself. And, if that is assumed aligned, how can one be sure that the address passed is of an actual complex and not a …

How expensive is it for the compiler to process an include-guarded header?

时光怂恿深爱的人放手 submitted on 2019-12-01 12:06:34
Question: To speed up the compilation of a large source file, does it make more sense to prune back the sheer number of headers used in a translation unit, or does the cost of compiling code far outweigh the time it takes to process out an include-guarded header? If the latter is true, an engineering effort would be better spent creating more, lightweight headers instead of fewer. So how long does it take for a modern compiler to handle a header that is effectively include-guarded out? At what point would …

How to read a .obj file?

筅森魡賤 submitted on 2019-12-01 08:12:23
In Visual Studio, an object file (.obj) is generated after compiling a C++ file. How do I read and understand it? Also, how can I see the code after compiler optimization in Visual Studio 2015? Please redirect if this is already answered. Use the DUMPBIN tool from a Visual Studio command prompt. Specifically, the /DISASM option shows you the disassembly. Do note that if you have link-time optimization enabled, then at least theoretically the final code may change after the .obj files are linked to form the final binary (.exe or .dll). You can disassemble those with DUMPBIN as well. You're sort of …
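For reference, the commands might look like this (file names are placeholders), run from a Visual Studio 2015 developer command prompt:

```shell
dumpbin /DISASM foo.obj   # disassembly of the compiled object file
dumpbin /ALL foo.obj      # headers, sections, symbols, raw data

cl /O2 /FA foo.cpp        # alternative: have the compiler emit an
                          # assembly listing (foo.asm) alongside foo.obj
```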

Can and does the compiler optimize out two atomic loads? [duplicate]

喜欢而已 submitted on 2019-12-01 07:03:44
This question already has an answer here: Why don't compilers merge redundant std::atomic writes? (9 answers) Will the two loads be combined into one in such scenarios? If this is architecture dependent, what would be the case in, say, modern processors from Intel? I believe atomic loads are equivalent to normal loads on Intel processors. void run1() { auto a = atomic_var.load(std::memory_order_relaxed); auto b = atomic_var.load(std::memory_order_relaxed); // Some code using a and b; } void run2() { if (atomic_var.load(std::memory_order_relaxed) == 2 && /*some conditions*/ ...) { if (atomic_var …

Volatile and compiler optimization

≡放荡痞女 submitted on 2019-12-01 06:45:35
Is it OK to say that the volatile keyword makes no difference if compiler optimization is turned off, i.e. gcc -O0 ...? I wrote a sample C program and see a difference between volatile and non-volatile in the generated assembly only when compiler optimization is turned on, i.e. gcc -O1 .... No, there is no basis for making such a statement. volatile has specific semantics that are spelled out in the standard. You are asserting that gcc -O0 always generates code such that every variable -- volatile or not -- conforms to those semantics. This is not guaranteed; even if …

How can I compile *without* various instruction sets enabled?

左心房为你撑大大i submitted on 2019-12-01 06:30:44
Question: I am attempting to recompile some software with various instruction sets, specifically SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AVX, and I would like to see how the code performs without these instruction sets to make sure I am getting the full effect of them. For example, I want to compile it with just -O2 with a GNU compiler and see how it performs when restricting it to only SSE, and to see which flags it invokes by default. I also have an Intel compiler that I am working with …
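With GCC, one way to approach this (flag spellings are worth double-checking against your GCC version): -Q --help=target reports which instruction-set options a given command line enables, and each extension has a matching -mno- switch to turn it off. Note that on x86-64 the ABI passes scalar floats in SSE registers, so disabling SSE itself can change more than just vectorization.

```shell
# See which instruction-set flags plain -O2 turns on by default:
gcc -O2 -Q --help=target | grep -E 'sse|avx'

# Comparison build with the vector extensions switched off:
gcc -O2 -mno-avx -mno-sse4.2 -mno-sse4.1 -mno-ssse3 -mno-sse3 -mno-sse2 \
    -o prog_scalar prog.c
```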

Using deftransform/defknown in SBCL internals to get the compiler to transform user authored functions

回眸只為那壹抹淺笑 submitted on 2019-12-01 05:55:40
At the end of section 6.5 in the current SBCL manual, we have the following quote: If your system's performance is suffering because of some construct which could in principle be compiled efficiently, but which the SBCL compiler can't in practice compile efficiently, consider writing a patch to the compiler and submitting it for inclusion in the main sources. Such code is often reasonably straightforward to write; search the sources for the string "deftransform" to find many examples (some straightforward, some less so). I've been playing around and found the likes of sb-c::defknown and sb-c: …

Why are unnecessary atomic loads not optimized away?

爷，独闯天下 submitted on 2019-12-01 05:13:55
Let's consider this trivial code: #include <atomic> std::atomic<int> a; void f(){ for(int k=0;k<100;++k) a.load(std::memory_order_relaxed); } MSVC, Clang and GCC all perform 100 loads of a, while it seems obvious they could have been optimized away. I expected the function f to be a nop (see generated code here). Actually, I expected this code generation only for a volatile atomic: volatile std::atomic<int> va; void g(){ for(int k=0;k<100;++k) va.load(std::memory_order_relaxed); } Why do compilers not optimize away unnecessary atomic loads? Source: https://stackoverflow.com/questions/56046501/why-are …

Avoid stalling pipeline by calculating conditional early

不打扰是莪最后的温柔 submitted on 2019-12-01 04:18:49
When talking about the performance of ifs, we usually talk about how mispredictions can stall the pipeline. The recommended solutions I see are: Trust the branch predictor for conditions that usually have one result; or Avoid branching with a little bit of bit-magic if reasonably possible; or Use conditional moves where possible. What I couldn't find was whether or not we can calculate the condition early to help where possible. So, instead of: ... work if (a > b) { ... more work } Do something like this: bool aGreaterThanB = a > b; ... work if (aGreaterThanB) { ... more work } Could something …