compiler-optimization

When and why would I use -fno-elide-constructors?

自古美人都是妖i submitted on 2019-12-06 08:43:11
Question: I'm learning C++ and I came across the -fno-elide-constructors option; below I have included the description from the man page. -fno-elide-constructors: The C++ standard allows an implementation to omit creating a temporary which is only used to initialize another object of the same type. Specifying this option disables that optimization, and forces G++ to call the copy constructor in all cases. So with this option I am able to disable this particular type of compiler optimization. I have a program
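To make the effect of the flag concrete, here is a minimal sketch that counts constructor calls. `Tracer`, `make_tracer`, and `count_copy_or_move_calls` are hypothetical names invented for this illustration and are not part of the original question. It assumes a C++11 or later compiler.

```cpp
#include <cassert>

// Hypothetical type that counts how often its copy and move constructors run.
struct Tracer {
    static int copies;
    static int moves;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
};
int Tracer::copies = 0;
int Tracer::moves = 0;

// Returns a named local, which makes the return a candidate for elision.
Tracer make_tracer() {
    Tracer t;
    return t;
}

// Builds one object from make_tracer() and reports how many copy or move
// constructor calls were actually executed.
int count_copy_or_move_calls() {
    Tracer::copies = 0;
    Tracer::moves = 0;
    Tracer result = make_tracer();
    (void)result;
    // A returned local is treated as an rvalue first, so when the
    // construction is not elided it is moved rather than copied.
    assert(Tracer::copies == 0);
    return Tracer::copies + Tracer::moves;
}
```

Building this once normally and once with `-fno-elide-constructors`, then printing the result of `count_copy_or_move_calls()`, should show zero calls when the compiler elides and a nonzero count when it does not. The exact count depends on the language standard selected, since C++17 makes some of these elisions mandatory.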

Why this dead store of unique_ptr cannot be eliminated?

为君一笑 submitted on 2019-12-06 07:48:35
Question:

    #include <memory>
    #include <vector>
    using namespace std;
    vector<unique_ptr<int>> e;
    void f(unique_ptr<int> u) { e.emplace_back(move(u)); }

For both Clang and GCC, the above code snippet generates something like:

    f(std::unique_ptr<int, std::default_delete<int> >):
        mov rsi, QWORD PTR e[rip+8]    # rsi: vector.end_ptr
        cmp rsi, QWORD PTR e[rip+16]   # [e + rip + 16]: vector.storage_end_ptr
        je .L52                        # Slow path, need to reallocate
        mov rax, QWORD PTR [rdi]       # rax: unique_ptr<int> u
        add rsi, 8                     # end_ptr +=

gcc: is there no tail recursion if I return std::string in C++?

跟風遠走 submitted on 2019-12-06 07:14:05
As per my answer in "Write a recursive function that reverses the input string", I've tried seeing whether clang++ -O3 or g++ -O3 would make a tail-recursion optimisation, using some of the suggestions from "How do I check if gcc is performing tail-recursion optimization?", but it doesn't look like any tail-recursion optimisation is taking place. Any idea why? Does this have to do with the way C++ objects are created and destroyed? Is there any way to make it work? The programme:

    % cat t2.cpp
    #include <iostream>
    #include <string>
    std::string rerev1(std::string s) {
        if (s.empty()) return s;
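One plausible factor, offered as an assumption since the excerpt is cut off before the full program: a call is only in tail position when nothing remains to run after it, and a function that takes or returns `std::string` by value usually still has string objects to build or destroy once the recursive call returns. The sketch below shows a shape that leaves no such work behind. `reverse_into` and `reversed` are hypothetical helpers, not code from the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends s[n-1], s[n-2], ..., s[0] to out.
// Both strings are passed by reference, so no std::string object is
// constructed or destroyed around the recursive call.
void reverse_into(const std::string& s, std::size_t n, std::string& out) {
    assert(n <= s.size());
    if (n == 0) return;
    out.push_back(s[n - 1]);
    reverse_into(s, n - 1, out);  // last action in the function
}

// Non-recursive wrapper that owns the result string.
std::string reversed(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    reverse_into(s, s.size(), out);
    return out;
}
```

Whether a given compiler actually turns `reverse_into` into a loop still has to be checked in the generated assembly; the sketch only removes the destructor work that would otherwise follow the call.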

What's optimal march & mtune options for gcc for “Pentium4 and above” processors

女生的网名这么多〃 submitted on 2019-12-06 06:39:29
Question: My C++ application (compiled using g++) needs to work on Pentium 4 (32-bit) and above. However, it's typically used with Core2Duo or better processors. I'm currently using -march=pentium4 -mtune=pentium4, but some reading has prompted me to think that -march=pentium4 -mtune=generic might be better. Can anybody shed some light on this? What are the optimal values for the march and mtune options in this case? Platform: GCC 4.1.2 on RHEL 5.3 (32-bit).
Answer 1: That would be -march=pentium4 -mtune=core2

Examples of CLR compiler optimizations

允我心安 submitted on 2019-12-06 06:00:36
Question: I'm doing a presentation in a few months about .NET performance and optimization, and I want to provide some samples of unnecessary optimization: things that will be done by the compiler anyway. Where can I find some explanation of what optimizations the compiler is actually capable of, maybe with some before and after code?
Answer 1: Check out these links: C# Compiler Optimizations; compiler optimization (MSDN). Also check out this book on MSIL: 1. Microsoft Intermediate Language: Comparison Between C# and VB

Can java inline a large method if the most of it would be dead code at the call site?

别来无恙 submitted on 2019-12-06 04:06:16
I know that one of the criteria that Java HotSpot uses to decide whether a method is worth inlining is how large the method is. On one hand, this seems sensible: if the method is large, inlining leads to code bloat, and the method would take so long to execute that the call overhead is trivial. The trouble with this logic is that it might turn out that AFTER you decide to inline, it becomes clear that for this particular call site, most of the method is dead code. For instance, the method may be a giant switch statement, but most call sites call the method with a compile-time constant, so

How exactly does gcc do optimizations?

浪子不回头ぞ submitted on 2019-12-06 03:44:25
Question: To understand exactly how gcc performs its optimizations, I wrote two programs and compiled them with -O2, but the generated assembly differs between them. In my programs, I want to output "hello" in a loop and add some delay between each output. These two programs are only for illustrating my question, and I know I can use volatile or asm in program 1 to achieve my purpose. Program 1:

    #include <stdio.h>
    int main(int argc, char **argv) {
        unsigned long i = 0;
        while (1) {
            if (++i >
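The remark about `volatile` can be illustrated with a small sketch. `spin_plain` and `spin_volatile` are hypothetical functions written for this illustration. They are not the truncated programs from the question.

```cpp
#include <cassert>

// Counts up to `limit` with an ordinary counter. The loop has no observable
// effect other than its result, so an optimizer is allowed to replace the
// whole loop with the final value and the intended delay may disappear.
unsigned long spin_plain(unsigned long limit) {
    unsigned long i = 0;
    while (i < limit) {
        ++i;
    }
    assert(i == limit);  // loop postcondition
    return i;
}

// Same loop with a volatile counter. Every read and write of `i` is an
// observable access, so the compiler has to perform each iteration.
unsigned long spin_volatile(unsigned long limit) {
    volatile unsigned long i = 0;
    while (i < limit) {
        i = i + 1;
    }
    return i;
}
```

Comparing the `-O2` assembly of the two functions is a quick way to see whether the plain loop survived. A busy loop remains a crude delay either way, since its duration depends on the machine.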

How could this Java code be sped up?

二次信任 submitted on 2019-12-05 23:54:57
I am trying to benchmark how fast Java can do a simple task: read a huge file into memory and then perform some meaningless calculations on the data. All types of optimizations count, whether it's rewriting the code differently, using a different JVM, or tricking the JIT. The input file is a list of 500 million pairs of 32-bit integers, with the two integers in each pair separated by a comma. Like this: 44439,5023 33140,22257 ... This file takes 5.5GB on my machine. The program can't use more than 8GB of RAM and can use only a single thread.

    package speedracer;
    import java.io.FileInputStream;
    import java.nio.MappedByteBuffer;
    import

Will a static variable always use up memory?

夙愿已清 submitted on 2019-12-05 22:43:58
Based on this discussion, I was wondering if a function-scope static variable always uses memory or if the compiler is allowed to optimize that away. To illustrate the question, assume a function like this:

    void f() {
        static const int i = 3;
        int j = i + 1;
        printf("%d", j);
    }

The compiler will very likely inline the value of i and probably do the calculation 3 + 1 at compile time, too. Since this is the only place the value of i is used, there is no need for any static memory to be allocated. So is the compiler allowed to optimize the static away, or does the standard mandate that any static
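A small sketch of the distinction the question is circling. The functions `value_only`, `address_escapes`, and `check_static_examples` are hypothetical and only loosely modelled on `f()` above. The comments describe what the as-if rule permits, not what any particular compiler is guaranteed to do.

```cpp
#include <cassert>

// Only the value of `i` is used. Under the as-if rule the compiler may fold
// `i + 1` to 4 and emit no storage for `i` at all.
int value_only() {
    static const int i = 3;
    return i + 1;
}

// The address of `i` escapes to the caller, so the object generally has to
// exist in memory for that pointer to refer to.
const int* address_escapes() {
    static const int i = 3;
    return &i;
}

// Small self-check tying the two cases together.
void check_static_examples() {
    assert(value_only() == 4);
    assert(*address_escapes() == 3);
    // A function-scope static is one object, so the address is stable.
    assert(address_escapes() == address_escapes());
}
```

Inspecting the object file (for example, looking for a symbol or data entry for `i`) is one way to see what a specific compiler did in each case.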

In VC++ what is the #pragma equivalent of /O2 compiler option (optimize for speed)

蹲街弑〆低调 submitted on 2019-12-05 22:17:23
According to MSDN, /O2 (Maximize Speed) is equivalent to /Og /Oi /Ot /Oy /Ob2 /Gs /GF /Gy, and according to MSDN again, the following pragma

    #pragma optimize( "[optimization-list]", {on | off} )

uses the same letters in its "optimization-list" as the /O compiler option. Available letters for the pragma are:
g - Enable global optimizations.
p - Improve floating-point consistency.
s or t - Specify short or fast sequences of machine code.
y - Generate frame pointers on the program stack.
Which ones should I use to have the same meaning as /O2?

bosmacs: The Microsoft Docs article /O1, /O2 (Minimize
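Since the pragma's letter list does not cover every switch in the /O2 expansion quoted above, one commonly used alternative is the empty-string form of the pragma. With `off` it disables the pragma-controlled optimizations, and with `on` it restores whatever /O options were given on the command line. The sketch below assumes MSVC with /O2 on the command line. `sum_of_squares` and `sum_of_squares_fast` are hypothetical functions, and the `_MSC_VER` guard is only there so other compilers skip the pragmas.

```cpp
#include <cassert>

// Assumption: built with MSVC and /O2 on the command line.
#ifdef _MSC_VER
#pragma optimize("", off)  // disable optimizations for the functions that follow
#endif

// Hypothetical function kept unoptimized, e.g. to make it easy to debug.
int sum_of_squares(int n) {
    assert(n >= 0);
    int total = 0;
    for (int k = 1; k <= n; ++k) {
        total += k * k;
    }
    return total;
}

#ifdef _MSC_VER
#pragma optimize("", on)   // restore the /O settings from the command line
#endif

// Compiled with whatever /O options the command line specified.
int sum_of_squares_fast(int n) {
    return n * (n + 1) * (2 * n + 1) / 6;
}
```

The pragma has to appear outside a function body and takes effect at the first function defined after it, which is why each function here is bracketed separately.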