compiler-optimization

Why does the Rust compiler not optimize code assuming that two mutable references cannot alias?

那年仲夏 submitted on 2019-12-03 00:04:49
Question: As far as I know, reference/pointer aliasing can hinder the compiler's ability to generate optimized code, since it must ensure the generated binary behaves correctly even in the case where the two references/pointers do alias. For instance, in the following C code, void adds(int *a, int *b) { *a += *b; *a += *b; } when compiled by clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) with the -O3 flag, it emits 0000000000000000 <adds>: 0: 8b 07 mov (%rdi),%eax 2: 03 06 add (%rsi),%eax 4: 89
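For comparison, this sketch shows the same function with a no-alias promise attached. `__restrict__` is a GCC/Clang extension (C99 spells it `restrict`); calling the qualified version with aliasing pointers would be undefined behavior, which is exactly the guarantee the optimizer needs:

```cpp
// Without the no-alias promise, *b must be reloaded after the first
// store, because *a and *b might be the same object.
void adds(int *a, int *b) {
    *a += *b;
    *a += *b;
}

// With __restrict__ (GCC/Clang extension), the compiler may assume the
// pointers never alias and fold the body into a single *a += 2 * *b.
void adds_noalias(int *__restrict__ a, int *__restrict__ b) {
    *a += *b;
    *a += *b;
}
```

Calling `adds(&x, &x)` is legal and yields `x * 4`; calling `adds_noalias` that way would be undefined.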

How to decrease the size of generated binaries?

风流意气都作罢 submitted on 2019-12-02 22:08:15
I know that there is an option "-Os" to "Optimize for size", but it has little effect, and on some occasions even increases the size :( strip (or the "-s" option) removes the debug symbol table, which works fine, but it can only shave off a small proportion of the size. Is there any other way to go further? Apart from the obvious ( -Os -s ), aligning functions to the smallest possible value that will not crash (I don't know ARM's alignment requirements) might squeeze out a few bytes per function. -Os should already disable function alignment, but this might still default to a value like 4 or 8. If

Why is memcmp(a, b, 4) only sometimes optimized to a uint32 comparison?

こ雲淡風輕ζ submitted on 2019-12-02 21:33:08
Given this code: #include <string.h> int equal4(const char* a, const char* b) { return memcmp(a, b, 4) == 0; } int less4(const char* a, const char* b) { return memcmp(a, b, 4) < 0; } GCC 7 on x86_64 introduced an optimization for the first case (Clang has done it for a long time): mov eax, DWORD PTR [rsi] cmp DWORD PTR [rdi], eax sete al movzx eax, al But the second case still calls memcmp() : sub rsp, 8 mov edx, 4 call memcmp add rsp, 8 shr eax, 31 Could a similar optimization be applied to the second case? What's the best assembly for this, and is there any clear reason why it isn't being
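For what it's worth, a hand-written equivalent of `less4` loads both 4-byte words, normalizes them to big-endian order (since memcmp compares bytes lexicographically as unsigned values), and compares them as unsigned integers. This sketch assumes a little-endian host and the GCC/Clang `__builtin_bswap32` intrinsic:

```cpp
#include <cstdint>
#include <cstring>

// Load 4 bytes via memcpy (safe for unaligned pointers) and byte-swap
// so that unsigned integer comparison matches memcmp's lexicographic
// byte order.  Assumes a little-endian host and GCC/Clang builtins.
static uint32_t load_be32(const char *p) {
    uint32_t v;
    std::memcpy(&v, p, 4);
    return __builtin_bswap32(v);
}

int less4_manual(const char *a, const char *b) {
    return load_be32(a) < load_be32(b);
}
```

This is essentially the branchless word comparison the question hopes the compiler would emit for `memcmp(a, b, 4) < 0`.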

Are compilers allowed to optimize out realloc?

穿精又带淫゛_ submitted on 2019-12-02 20:06:37
I came across a situation where it would be useful to have unnecessary calls to realloc optimized out. However, it seems that neither Clang nor GCC does such a thing (Compiler Explorer (godbolt.org)), although I do see optimizations being made with multiple calls to malloc. The example: void *myfunc() { void *data; data = malloc(100); data = realloc(data, 200); return data; } I expected it to be optimized to something like the following: void *myfunc() { return malloc(200); } Why is neither Clang nor GCC optimizing it out? Are they not allowed to do so? chux: Are they not allowed to do so
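One reason the transformation is not obviously valid: the two forms differ observably on allocation failure. A defensively written version (a sketch for illustration, not something either compiler produces) makes the difference explicit:

```cpp
#include <cstdlib>

// If realloc(data, 200) fails it returns NULL but leaves the original
// 100-byte block alive, so the question's myfunc() leaks it.  Folding
// the pair into a single malloc(200) changes which allocations the
// program requests, which is observable at malloc failure points.
void *myfunc_checked(void) {
    void *data = std::malloc(100);
    if (data == NULL) return NULL;
    void *grown = std::realloc(data, 200);
    if (grown == NULL) {
        std::free(data);   // avoid the leak the original version has
        return NULL;
    }
    return grown;
}
```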

Does C/C++ offer any guarantee on minimal execution time?

百般思念 submitted on 2019-12-02 20:03:03
Why do compilers seem to be polite toward loops that do nothing, and not eliminate them? Does the C standard require loops to take some time? For example, the following code: void foo(void) { while(1) { for(int k = 0; k < 1000000000; ++k); printf("Foo\n"); } } runs slower than this one: void foo(void) { while(1) { for(int k = 0; k < 1000; ++k); printf("Foo\n"); } } even at the -O3 optimization level. I would expect removing empty loops to be allowed, and thus the same speed for both versions. Is "time spent" a side effect that should be preserved by a compiler? No, time spent does not count as
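A sketch of the distinction: a side-effect-free loop may be deleted under the "as-if" rule, while a `volatile` induction variable forces the compiler to keep every iteration:

```cpp
// Under the as-if rule this loop may be removed entirely: its only
// effect is on k, which is dead once the loop ends.
int spin(int n) {
    for (int k = 0; k < n; ++k) { /* no observable side effects */ }
    return n;
}

// Declaring k volatile makes every load and store observable, so the
// compiler must actually execute all n iterations.
int spin_kept(int n) {
    for (volatile int k = 0; k < n; ++k) { }
    return n;
}
```

Elapsed time itself is not an observable side effect in the standard's sense, which is why removing `spin`'s loop is legal.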

Why doesn't GCC optimize out deletion of null pointers in C++?

…衆ロ難τιáo~ submitted on 2019-12-02 19:59:26
Consider a simple program: int main() { int* ptr = nullptr; delete ptr; } With GCC (7.2), there is a call instruction to operator delete in the resulting program. With the Clang and Intel compilers there are no such instructions; the null-pointer deletion is completely optimized out ( -O2 in all cases). You can test here: https://godbolt.org/g/JmdoJi . I wonder whether such an optimization can somehow be turned on with GCC? (My broader motivation stems from a problem of custom swap vs std::swap for movable types, where deletion of null pointers can represent a performance penalty in the
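One workaround sketch: since `delete` on a null pointer is guaranteed to be a no-op, an explicit guard skips the call at runtime on any compiler, GCC included. The `Widget` type and its live counter here are hypothetical, added only to make the effect checkable:

```cpp
// Deleting a null pointer is well-defined and does nothing, so an
// explicit guard costs one branch and avoids the operator delete call.
static int live_widgets = 0;

struct Widget {
    Widget()  { ++live_widgets; }
    ~Widget() { --live_widgets; }
};

void destroy(Widget *p) {
    if (p != nullptr) delete p;   // runtime guard: no call when p is null
}
```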

Why doesn't 'd /= d' throw a division by zero exception when d == 0?

主宰稳场 submitted on 2019-12-02 19:53:23
I don't quite understand why I don't get a division-by-zero exception: int d = 0; d /= d; I expected a division-by-zero exception, but instead d == 1 . Why doesn't d /= d throw a division-by-zero exception when d == 0 ? C++ does not have a "division by zero" exception to catch. The behavior you're observing is the result of compiler optimizations: the compiler assumes undefined behavior doesn't happen, and division by zero in C++ is undefined behavior. Therefore, code which can cause a division by zero is presumed not to do so. And code which must cause a division by zero is presumed to never
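Because integer division by zero is undefined behavior rather than a catchable exception, code that needs a defined result has to check the divisor itself. A minimal sketch (returning 0 for a zero divisor is an arbitrary policy chosen here, not a language rule):

```cpp
// Integer division by zero is UB in C++, not an exception, so guard it
// explicitly.  The 0-for-zero-divisor result is a hypothetical policy.
int safe_div(int a, int b) {
    if (b == 0) return 0;
    return a / b;
}
```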

Benefits of 'Optimize code' option in Visual Studio build

烈酒焚心 submitted on 2019-12-02 18:31:35
Much of our C# release code is built with the 'Optimize code' option turned off. I believe this is to allow code built in Release mode to be debugged more easily. Given that we are creating fairly simple desktop software which connects to backend Web Services (i.e. not a particularly processor-intensive application), what sort of performance hit, if any, might be expected? And is any particular platform likely to be worse affected, e.g. multi-processor / 64-bit? The full details are available at http://blogs.msdn.com/jaybaz_ms/archive/2004/06/28/168314.aspx . In brief... In managed code, the

How does a compiler optimise this factorial function so well?

好久不见. submitted on 2019-12-02 17:41:38
So I have been having a look at some of the magic that is O3 in GCC (well actually I'm compiling using Clang but it's the same with GCC and I'm guessing a large part of the optimiser was pulled over from GCC to Clang). Consider this C program: int foo(int n) { if (n == 0) return 1; return n * foo(n-1); } int main() { return foo(10); } The first thing I was pretty WOW-ed at (which was also WOW-ed at in this question - https://stackoverflow.com/a/414774/1068248 ) was how int foo(int) (a basic factorial function) compiles into a tight loop. This is the ARM assembly for it: .globl _foo .align 2
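The transformation at work here is recursion elimination with accumulator rewriting: by hand, the loop the optimizer derives corresponds to something like this sketch:

```cpp
// The recursive factorial from the question.
int foo(int n) {
    if (n == 0) return 1;
    return n * foo(n - 1);
}

// Roughly the loop the optimizer derives: multiply an accumulator
// down from n to 1 instead of recursing.
int foo_loop(int n) {
    int acc = 1;
    for (; n != 0; --n) acc *= n;
    return acc;
}
```

Because multiplication is associative, the compiler can reorder `n * foo(n-1)` into an accumulator that is updated before the "call", turning the recursion into the tight loop seen in the assembly.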

Is it true that having lots of small methods helps the JIT compiler optimize?

旧街凉风 submitted on 2019-12-02 17:15:57
In a recent discussion about how to optimize some code, I was told that breaking code up into lots of small methods can significantly increase performance, because the JIT compiler doesn't like to optimize large methods. I wasn't sure about this since it seems that the JIT compiler should itself be able to identify self-contained segments of code, irrespective of whether they are in their own method or not. Can anyone confirm or refute this claim? The Hotspot JIT only inlines methods that are less than a certain (configurable) size. So using smaller methods allows more inlining, which is good.