compiler-optimization

Why would the .NET JIT compiler decide to not inline or optimize away calls to empty static methods that have no side effects?

拜拜、爱过 submitted on 2019-12-20 11:10:06
Question: I think I'm observing the .NET JIT compiler failing to inline or optimize away calls to empty static methods that have no side effects, which is a bit surprising given what some well-known online resources say. My environment is Visual Studio 2013 on x64, Windows 8.1, .NET Framework 4.5. Given this simple test program (https://ideone.com/2BRCpC):

    class Program
    {
        static void EmptyBody() { }

        static void Main()
        {
            EmptyBody();
        }
    }

A release build with optimizations of the above program produces the following…
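For comparison, here is a minimal native sketch of the same situation (my own illustration, not from the question): GCC and Clang at -O2 will inline a side-effect-free empty function and then delete the call entirely, which is the behavior the asker expected from the JIT.

    // empty_call.cpp -- compile with: g++ -O2 -S empty_call.cpp
    // The call below is inlined and eliminated; main() reduces to "return 0".
    static void empty_body() { }  // no side effects, internal linkage

    int main() {
        empty_body();  // expected to be optimized away at -O2
        return 0;
    }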

Speed up Xcode Swift build times

一笑奈何 submitted on 2019-12-20 10:46:37
Question: As my project has grown over the past year, so have its build times. Over the last few months they've gone from 4 minutes to around 7 (the time includes the GitHub pull, unit tests, etc.). I have investigated with -Xfrontend -debug-time-function-bodies to find lines that are slow to compile, and changed that code. I believe it's now a question of project size: 182 Swift files, ≈31K lines, 23 storyboards, 52 XIBs. This is a regular UIKit app with a handful of CocoaPods dependencies. The bulk of the build…
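For reference, the timing flag mentioned above is usually added through the target's build settings; a sketch, assuming a standard Xcode project (the setting is Other Swift Flags, i.e. OTHER_SWIFT_FLAGS):

    OTHER_SWIFT_FLAGS = -Xfrontend -debug-time-function-bodies

With this set, the build log annotates each function body with the time the compiler spent type-checking it, which is how slow lines are found.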

SIMD instructions lowering CPU frequency

喜欢而已 submitted on 2019-12-20 10:35:15
Question: I read this article. It talks about why AVX-512 instructions can slow things down: Intel's latest processors have advanced instructions (AVX-512) that may cause the core, or maybe the rest of the CPU, to run slower because of how much power they use. I think Agner Fog's blog also mentioned something similar (but I can't find the exact post). I wonder what other instructions supported by Skylake have a similar effect, where the CPU lowers its frequency to stay within its power budget? All the v-prefixed instructions…
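As a concrete illustration (my own sketch, not from the question), this is the kind of kernel that exercises 512-bit instructions; sustained use of such instructions is what the article associates with the core dropping to a lower frequency "license". Requires AVX-512F hardware and, e.g., gcc -O2 -mavx512f:

    #include <immintrin.h>
    #include <cstddef>

    // Adds 1.0f to n floats; for brevity, n is assumed to be a multiple of 16.
    // A hot loop of 512-bit FP operations like this can trigger
    // license-based downclocking on Skylake-class server CPUs.
    void add_one_avx512(float* data, std::size_t n) {
        const __m512 one = _mm512_set1_ps(1.0f);
        for (std::size_t i = 0; i < n; i += 16) {
            __m512 v = _mm512_loadu_ps(data + i);
            v = _mm512_add_ps(v, one);
            _mm512_storeu_ps(data + i, v);
        }
    }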

gcc -O0 outperforming -O3 on matrix sizes that are powers of 2 (matrix transpositions)

雨燕双飞 submitted on 2019-12-20 09:39:16
Question: (For testing purposes) I have written a simple method to calculate the transpose of an n×n matrix:

    #include <cstddef>  // for size_t

    void transpose(const size_t _n, double* _A) {
        for (unsigned i = 0; i < _n; ++i) {
            for (unsigned j = i + 1; j < _n; ++j) {
                double tmp = _A[i*_n + j];
                _A[i*_n + j] = _A[j*_n + i];
                _A[j*_n + i] = tmp;
            }
        }
    }

When using optimization level O3 or Ofast I expected the compiler to unroll some loops, which would lead to higher performance, especially when the matrix size is a multiple of 2 (i.e., the double loop body can be…
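A likely culprit (my reading, consistent with the question title) is cache behavior rather than unrolling: at power-of-two sizes, the strided column accesses _A[j*_n + i] map to the same cache sets repeatedly and cause conflict misses. A common remedy is a blocked transpose; a sketch, with the block size picked for illustration rather than tuned:

    #include <cstddef>
    #include <algorithm>

    // In-place blocked transpose: each BLOCK x BLOCK tile fits in cache,
    // so the strided column accesses no longer evict each other.
    void transpose_blocked(const std::size_t n, double* a) {
        const std::size_t BLOCK = 16;  // illustrative, not tuned
        for (std::size_t ib = 0; ib < n; ib += BLOCK) {
            for (std::size_t jb = ib; jb < n; jb += BLOCK) {
                for (std::size_t i = ib; i < std::min(ib + BLOCK, n); ++i) {
                    // On diagonal tiles, start past the diagonal so no
                    // pair is swapped twice.
                    for (std::size_t j = (ib == jb ? i + 1 : jb);
                         j < std::min(jb + BLOCK, n); ++j) {
                        std::swap(a[i*n + j], a[j*n + i]);
                    }
                }
            }
        }
    }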

How does a compiler optimise this factorial function so well?

时光毁灭记忆、已成空白 submitted on 2019-12-20 09:03:43
Question: So I have been having a look at some of the magic that is O3 in GCC (well, actually I'm compiling using Clang, but it's the same with GCC, and I'm guessing a large part of the optimiser was pulled over from GCC to Clang). Consider this C program:

    int foo(int n) {
        if (n == 0) return 1;
        return n * foo(n - 1);
    }

    int main() {
        return foo(10);
    }

The first thing I was pretty WOW-ed at (which was also WOW-ed at in this question: https://stackoverflow.com/a/414774/1068248) was how int foo(int) (a basic…
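The key transformation, sketched by hand below (an illustration of what optimizers typically do here, not the compiler's literal output): the multiplication is reassociated into an accumulator, which makes the recursion a tail call, and the tail call is then rewritten as a loop that consumes no stack frames and can be further unrolled or vectorized.

    // Hand-written equivalent of what -O3 derives from foo():
    int foo_as_loop(int n) {
        int acc = 1;        // accumulator introduced by reassociation
        while (n != 0) {    // the tail call became a plain branch
            acc *= n;
            n -= 1;
        }
        return acc;
    }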

Benefits of 'Optimize code' option in Visual Studio build

风格不统一 submitted on 2019-12-20 08:57:34
Question: Much of our C# release code is built with the 'Optimize code' option turned off. I believe this is to allow code built in Release mode to be debugged more easily. Given that we are creating fairly simple desktop software which connects to back-end web services (i.e., not a particularly processor-intensive application), what sort of performance hit, if any, might be expected? And is any particular platform likely to be affected worse, e.g. multi-processor / 64-bit?

Answer 1: The full details are…
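For context, the 'Optimize code' checkbox corresponds to the <Optimize> MSBuild property (the compiler's /optimize switch); a sketch of how it typically appears in a .csproj for a Release configuration:

    <PropertyGroup Condition="'$(Configuration)' == 'Release'">
      <Optimize>true</Optimize>
      <DebugType>pdbonly</DebugType>
    </PropertyGroup>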

libsvm compiled with AVX vs no AVX

眉间皱痕 submitted on 2019-12-20 04:53:16
Question: I compiled a libsvm benchmarking app which calls svm_predict() 100 times on the same image using the same model. libsvm is compiled statically (MSVC 2017) by directly including svm.cpp and svm.h in my project. EDIT: adding benchmark details:

    for (int i = 0; i < counter; i++) {
        std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();
        double label = svm_predict(model, input);
        std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now();
        ...
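A self-contained version of that timing loop (svm_predict is replaced by a hypothetical stub since the question's model and input objects aren't shown; the duration arithmetic is the standard <chrono> idiom):

    #include <chrono>
    #include <cstdio>

    // Hypothetical stand-in for svm_predict(model, input).
    static double svm_predict_stub() { return 1.0; }

    int main() {
        const int counter = 100;
        for (int i = 0; i < counter; i++) {
            auto t1 = std::chrono::high_resolution_clock::now();
            volatile double label = svm_predict_stub();  // volatile: keep the call's result live
            auto t2 = std::chrono::high_resolution_clock::now();
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
            std::printf("iteration %3d: %lld us (label = %f)\n", i, (long long)us, (double)label);
        }
        return 0;
    }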

Why Bother With the 'inline' Keyword in C++?

左心房为你撑大大i submitted on 2019-12-20 03:45:05
Question: I've just been researching the use and benefits/pitfalls of the C++ keyword inline on the Microsoft website, and I understand all of that. My question is this: if the compiler evaluates functions to see whether inlining them will make the code more efficient, and the inline keyword is only a SUGGESTION to the compiler, why bother with the keyword at all? EDIT: A lot of people are moaning about my use of __inline instead of inline. I'd like to point out that __inline is the Microsoft…
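One answer that doesn't depend on optimization at all: inline changes linkage semantics. It exempts a function from the one-definition rule, which is what allows its definition to live in a header included by several translation units. A minimal sketch:

    // utils.h
    // Without 'inline', including this header from two .cpp files would
    // cause a multiple-definition error at link time. With 'inline',
    // the linker is required to fold the copies into one.
    inline int square(int x) {
        return x * x;
    }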

What's up with gcc's weird stack manipulation when it wants extra stack alignment?

て烟熏妆下的殇ゞ submitted on 2019-12-20 01:07:03
Question: I've seen this r10 weirdness a few times, so let's see if anyone knows what's up. Take this simple function:

    #include <stdint.h>

    #define SZ 4

    void sink(uint64_t *p);

    void andpop(const uint64_t* a) {
        uint64_t result[SZ];
        for (unsigned i = 0; i < SZ; i++) {
            result[i] = a[i] + 1;
        }
        sink(result);
    }

It just adds 1 to each of the 4 64-bit elements of the passed-in array, stores them in a local array, and calls sink() on the result (to avoid the whole function being optimized away). Here's the corresponding assembly: andpop…
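To reproduce the assembly in question, a compile line along these lines should work (standard GCC flags; the exact output depends on GCC version and target):

    gcc -O2 -S -masm=intel andpop.c

The r10 dance typically appears when gcc realigns the stack because a local (here, plausibly the vectorized result array) wants stricter alignment than the 16 bytes the ABI guarantees on function entry.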