compiler-optimization

Are compilers allowed to optimize out realloc?

Submitted by 感情迁移 on 2019-12-04 08:03:05
Question: I came across a situation where it would be useful to have unnecessary calls to realloc optimized out. However, it seems that neither Clang nor GCC does such a thing (see Compiler Explorer, godbolt.org), although I do see optimizations being made across multiple calls to malloc. The example:

```c
void *myfunc() {
    void *data;
    data = malloc(100);
    data = realloc(data, 200);
    return data;
}
```

I expected it to be optimized to something like the following:

```c
void *myfunc() {
    return malloc(200);
}
```

Why is

GCC 5.1 Loop unrolling

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-04 07:17:42
Given the following code:

```c
#include <stdio.h>

int main(int argc, char **argv) {
    int k = 0;
    for (k = 0; k < 20; ++k) {
        printf("%d\n", k);
    }
}
```

GCC 5.1 or later with `-x c -std=c99 -O3 -funroll-all-loops --param max-completely-peeled-insns=1000 --param max-completely-peel-times=10000` partially unrolls the loop: it unrolls it ten times and then does a conditional jump.

```asm
.LC0:
        .string "%d\n"
main:
        pushq   %rbx
        xorl    %ebx, %ebx
.L2:
        movl    %ebx, %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        leal    1(%rbx), %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        leal    2(%rbx), %esi
        movl    $.LC0,
```

Does profile-guided optimization done by compiler notably hurt cases not covered with profiling dataset?

Submitted by 我与影子孤独终老i on 2019-12-04 07:02:57
This question is not specific to C++; AFAIK certain runtimes like the Java RE can do profile-guided optimization on the fly, and I'm interested in that too. MSDN describes PGO like this: I instrument my program and run it under a profiler, then the compiler uses the data gathered by the profiler to automatically reorganize branching and loops in such a way that branch misprediction is reduced and the most frequently run code is placed compactly to improve its locality. Now obviously the profiling result will depend on the dataset used. With normal manual profiling and optimization I'd find some bottlenecks and improve those

mtune and march when compiling in a docker image

Submitted by 时光毁灭记忆、已成空白 on 2019-12-04 06:37:10
When compiling in a Docker image (i.e. in the Dockerfile), what should march and mtune be set to? Note this is not about compiling in a running container, but compiling while the image is being built (e.g. building tools from source as part of the image build). For example, currently when I run docker build and install R packages from source I get loads of lines like this (could be g++/gcc/f95 ...):

```
g++ -std=gnu++14 [...] -O3 -march=native -mtune=native -fPIC [...]
```

If I use native in an image built by Dockerhub, I guess this will use the spec of the machine used by Dockerhub, and this will impact the image

Passing by Value and copy elision optimization

Submitted by ⅰ亾dé卋堺 on 2019-12-04 04:50:51
I came upon the article http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/ . The author's advice: don't copy your function arguments; instead, pass them by value and let the compiler do the copying. However, I don't quite get what benefits are gained in the two examples presented in the article:

```cpp
// Don't
T& T::operator=(T const& x) // x is a reference to the source
{
    T tmp(x);          // copy construction of tmp does the hard work
    swap(*this, tmp);  // trade our resources for tmp's
    return *this;      // our (old) resources get destroyed with tmp
}
```

vs

```cpp
// Do
T& operator=(T x) // x is a copy of the source;
```

Are empty constructors always called in C++?

Submitted by China☆狼群 on 2019-12-04 04:46:38
I have a general question that may be a little compiler-specific. I'm interested in the conditions under which a constructor will be called. Specifically, in release mode / builds optimised for speed, will a compiler-generated or empty constructor always be called when you instantiate an object?

```cpp
class NoConstructor {
    int member;
};

class EmptyConstructor {
    EmptyConstructor() {}
    int member;
};

class InitConstructor {
    InitConstructor() : member(3) {}
    int member;
};

int main(int argc, _TCHAR* argv[]) {
    NoConstructor* nc = new NoConstructor(); // will this call the generated constructor?
    EmptyConstructor* ec = new
```

The impact of multiple compiler definitions in system.codedom in web.config

Submitted by 只谈情不闲聊 on 2019-12-04 04:23:49
Question: All my ASP.NET web projects are being developed exclusively in VB.NET (and so are the satellite DLL projects, which is probably less relevant). When I look at the default web.config file, under the <system.codedom> tag, I always find compiler definitions present for both C# and VB.NET, as illustrated below.

```xml
<compilers>
  <compiler language="c#;cs;csharp" extension=".cs" warningLevel="4"
            type="Microsoft.CSharp.CSharpCodeProvider, System, Version=2.0.0.0, Culture=neutral, PublicKeyToken
```

Is it realistic to use -O3 or -Ofast to compile your benchmark code or will it remove code?

Submitted by 浪尽此生 on 2019-12-04 04:14:57
Question: When compiling the benchmark code below with -O3, I was impressed by the difference it made in latency, so I began to wonder whether the compiler is "cheating" by removing code somehow. Is there a way to check for that? Am I safe to benchmark with -O3? Is it realistic to expect 15x gains in speed?

Results without -O3: Average: 239 nanos, Min: 230 nanos (9 million iterations)
Results with -O3: Average: 14 nanos, Min: 12 nanos (9 million iterations)

```cpp
int iterations = stoi(argv[1]);
int load
```

Weird behaviour of the C# compiler due to delegate caching

Submitted by 和自甴很熟 on 2019-12-04 02:47:30
Suppose I have the following program:

```csharp
static void SomeMethod(Func<int, int> otherMethod) {
    otherMethod(1);
}

static int OtherMethod(int x) { return x; }

static void Main(string[] args) {
    SomeMethod(OtherMethod);
    SomeMethod(x => OtherMethod(x));
    SomeMethod(x => OtherMethod(x));
}
```

I cannot understand the compiled IL code (it uses too much extra code). Here is a simplified version:

```csharp
class C {
    public static C c;
    public static Func<int, int> foo;
    public static Func<int, int> foo1;
    static C() { c = new C(); }
    C() {}
    public int b(int x) { return OtherMethod(x); }
    public int b1(int x) { return OtherMethod(x); }
}
```

How do I stop GCC from optimizing this byte-for-byte copy into a memcpy call?

Submitted by 耗尽温柔 on 2019-12-04 02:43:05
I have this code for memcpy as part of my implementation of the standard C library; it copies memory from src to dest one byte at a time:

```c
void *memcpy(void *restrict dest, const void *restrict src, size_t len)
{
    char *dp = (char *restrict)dest;
    const char *sp = (const char *restrict)src;
    while (len--) {
        *dp++ = *sp++;
    }
    return dest;
}
```

With gcc -O2, the code generated is reasonable:

```asm
memcpy:
.LFB0:
        movq    %rdi, %rax
        testq   %rdx, %rdx
        je      .L2
        xorl    %ecx, %ecx
.L3:
        movzbl  (%rsi,%rcx), %r8d
        movb    %r8b, (%rax,%rcx)
        addq    $1, %rcx
        cmpq    %rdx, %rcx
        jne     .L3
.L2:
        ret
.LFE0:
```

However, at gcc -O3, GCC