compiler-optimization

Implementing single-precision division as double-precision multiplication

Submitted by 坚强是说给别人听的谎言 on 2019-12-29 08:04:26
Question: For a C99 compiler implementing exact IEEE 754 arithmetic, do values of f, divisor of type float exist such that f / divisor != (float)(f * (1.0 / divisor))?

EDIT: By "implementing exact IEEE 754 arithmetic" I mean a compiler that rightfully defines FLT_EVAL_METHOD as 0.

Context: A C compiler that provides IEEE 754-compliant floating-point can only replace a single-precision division by a constant with a single-precision multiplication by the inverse if said inverse is itself …
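A brute-force search makes the question concrete. The sketch below is my own illustration, not part of the original post: it fixes one divisor (3.0f is an arbitrary choice), scans every finite positive float bit pattern, and compares true single-precision division against multiplication by the double-precision reciprocal. It assumes FLT_EVAL_METHOD == 0, as the question stipulates.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    const float divisor = 3.0f;             // arbitrary divisor, assumed for illustration
    const double inverse = 1.0 / divisor;   // double-precision reciprocal

    // Bit patterns 0 .. 0x7F7FFFFF cover every finite positive float,
    // including zero and the subnormals.
    for (std::uint32_t bits = 0; bits < 0x7F800000u; ++bits) {
        float f;
        std::memcpy(&f, &bits, sizeof f);
        const float q_div = f / divisor;
        const float q_mul = (float)(f * inverse);  // multiply in double, round to float
        if (q_div != q_mul) {
            std::printf("counterexample: f = %a\n", (double)f);
            return 0;
        }
    }
    std::printf("no counterexample found for divisor %a\n", (double)divisor);
    return 0;
}
```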

C++: Set bool value only if not set

Submitted by 筅森魡賤 on 2019-12-29 07:37:07
Question: I have code in my C++ application that generally does this:

```cpp
bool myFlag = false;
while (/* some finite condition unrelated to myFlag */) {
    if (...) {
        // statements, unrelated to myFlag
    } else {
        // set myFlag to true, perhaps only if it was false before?
    }
}
if (myFlag) {
    // Do something...
}
```

The question I have pertains to the else branch of this code. Basically, my loop may set the value of myFlag from false to true, based on a certain condition not being met. Never will the flag be unset …
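For reference, here is a minimal sketch (my illustration, not from the post) of the two candidate else-branch bodies. The unconditional store is branch-free, which is the usual argument for preferring it:

```cpp
// Candidate 1: always store. One store, no branch, no misprediction risk.
void set_unconditionally(bool& flag) {
    flag = true;
}

// Candidate 2: store only if not yet set. Load + compare + branch + store,
// which is generally more work, not less.
void set_if_unset(bool& flag) {
    if (!flag) flag = true;
}
```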

What is the minimum supported SSE flag that can be enabled on macOS?

Submitted by 守給你的承諾、 on 2019-12-29 07:07:29
Question: Most of the hardware I use supports SSE2 these days. On Windows and Linux, I have some code to test SSE support. I read somewhere that macOS has supported SSE for a long time, but I don't know the minimum version that can be enabled. The final binary will be copied to other macOS machines, so I cannot use -march=native as with GCC. If SSE is enabled by default on all builds, do I have to pass the -msse or -msse2 flag when building my code? Here is my compiler version: Apple LLVM version 6.0 …
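One quick way to see what the compiler enables by default is to check its predefined macros. Below is a minimal sketch, assuming clang/GCC's __SSE__ and __SSE2__ macros; note that on x86-64, SSE2 is part of the baseline ABI, so a 64-bit macOS build should report it with no extra flags.

```cpp
#include <cstdio>

int main() {
    // These macros are defined by clang and GCC when the corresponding
    // instruction set is enabled for the current target.
#if defined(__SSE2__)
    std::puts("SSE2 enabled by default");
#elif defined(__SSE__)
    std::puts("SSE enabled by default");
#else
    std::puts("no SSE enabled by default");
#endif
    return 0;
}
```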

RVO force compilation error on failure

Submitted by China☆狼群 on 2019-12-29 06:52:26
Question: There are lots of discussions here about when RVO can be done, but not much about when it is actually done. As stated many times, RVO cannot be guaranteed according to the Standard, but is there a way to guarantee that either the RVO optimization succeeds or the corresponding code fails to compile? So far I have partially succeeded in making the code produce link errors when RVO fails: for this I declare the copy constructors without defining them. Obviously this is neither robust nor feasible in the not-so-rare cases …
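A later development is worth noting here: since C++17, copy elision for prvalues is guaranteed by the language, so deleting the copy and move operations turns a missed elision into a hard compile error rather than a link error. A minimal sketch (compile with -std=c++17):

```cpp
// A type that cannot be copied or moved; deleting the copy constructor
// also suppresses the implicit move operations.
struct NoCopy {
    NoCopy() = default;
    NoCopy(const NoCopy&) = delete;
    NoCopy& operator=(const NoCopy&) = delete;
};

NoCopy make() {
    return NoCopy{};   // prvalue: guaranteed elision in C++17, error in C++11/14
}

int main() {
    NoCopy n = make(); // also elided; no copy constructor is required
    (void)n;
    return 0;
}
```

Under C++11/14 this fails to compile wherever elision would have been merely optional, which is exactly the "succeed or fail to compile" behavior the question asks for, albeit only for the prvalue cases.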

Does Java Compiler include String Constant Folding?

Submitted by 大城市里の小女人 on 2019-12-28 06:59:05
Question: I found out that Java supports constant folding of primitive types, but what about Strings?

Example: If I create the following source code

```java
out.write("" +
          "<markup>" +
          "<nested>" +
          "Easier to read if it is split into multiple lines" +
          "</nested>" +
          "</markup>" +
          "");
```

what goes into the compiled code? The combined version?

```java
out.write("<markup><nested>Easier to read if it is split into multiple lines</nested></markup>");
```

Or the less efficient run-time concatenation version? out.write(new …
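For what it's worth, javac does fold concatenations of compile-time constant expressions into a single literal (JLS §15.28), so the combined version is what lands in the class file. C++ offers an analogous guarantee, shown in the sketch below (my addition, not from the post): adjacent string literals are concatenated at translation time, so splitting a literal across lines costs nothing at run time.

```cpp
#include <cstdio>

int main() {
    // Adjacent string literals are merged into one string by the compiler,
    // so this is a single constant in the binary, not a run-time operation.
    const char* s =
        "<markup>"
        "<nested>"
        "Easier to read if it is split into multiple lines"
        "</nested>"
        "</markup>";
    std::printf("%s\n", s);
    return 0;
}
```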

clang vs gcc - optimization including operator new

Submitted by 落爺英雄遲暮 on 2019-12-28 04:25:06
Question: I have this simple example I was testing, and I noticed that gcc's optimizations (-O3) do not seem to be as good as clang's when operator new is involved. I was wondering what the issue might be, and whether it is possible to force gcc to produce more optimized code somehow?

```cpp
template<typename T>
T* create() {
    return new T();
}

int main() {
    auto result = 0;
    for (auto i = 0; i < 1000000; ++i) {
        result += (create<int>() != nullptr);
    }
    return result;
}
```

#clang3.6++ -O3 -s --std=c++11 test.cpp
#size a…
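The usual explanation for examples like this (my summary, not quoted from the post) is that C++14's N3664 wording lets a compiler elide allocations made by new-expressions; clang exploits this while gcc of that era did not. Once the million allocations are elided, every create<int>() result is non-null, so clang can reduce main to a constant, roughly:

```cpp
// What clang effectively produces after allocation elision (a sketch of
// the idea, not clang's literal output):
int main() {
    return 1000000;  // each create<int>() != nullptr contributes exactly 1
}
```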

Which for loop header performs better?

Submitted by 耗尽温柔 on 2019-12-25 16:27:32
Question: I see the following a lot in the Android documentation:

```java
int n = getCount();
for (int i = 0; i < n; i++) {
    // do something
}
```

But I'm used to seeing, and writing:

```java
for (int i = 0; i < getCount(); i++) {
    // do something
}
```

I'm curious whether one is more efficient than the other. What exactly is happening in these two scenarios? When you call getCount() the second way, does the computer have to allocate another variable? Or is it simply a matter of code cleanliness or preference?

Answer 1: This is what the …
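The crux is how often the loop bound is evaluated. A C++ analog of the same pattern (my sketch, not from the post): hoisting the call guarantees one evaluation, while a call in the condition may be re-evaluated every iteration unless the compiler can prove its result never changes.

```cpp
#include <cstddef>
#include <vector>

long sum_hoisted(const std::vector<int>& v) {
    long total = 0;
    const std::size_t n = v.size();   // bound evaluated exactly once
    for (std::size_t i = 0; i < n; ++i) total += v[i];
    return total;
}

long sum_inline(const std::vector<int>& v) {
    long total = 0;
    // v.size() appears in the condition; the optimizer can usually hoist it
    // here, but only because it can see the call has no side effects.
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}
```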

Speed up random memory access using prefetch

Submitted by 旧街凉风 on 2019-12-25 07:34:54
Question: I am trying to speed up a single program by using prefetches. The purpose of my program is just for testing. Here is what it does:

- It uses two int buffers of the same size.
- It reads, one by one, all the values of the first buffer.
- It reads the value at that index in the second buffer.
- It sums all the values taken from the second buffer.
- It does all the previous steps for bigger and bigger …

At the end, I print the number of voluntary and involuntary CPU … The very first time, the values in the first buffer …
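For this kind of indirect access pattern, data[index[i]], software prefetching means issuing the dependent load a few iterations ahead. A sketch under stated assumptions: it uses GCC/clang's __builtin_prefetch, and the look-ahead distance of 16 is a hypothetical value that would need tuning for the actual machine.

```cpp
#include <cstddef>

long sum_indirect(const int* index, const int* data, std::size_t n) {
    const std::size_t ahead = 16;   // look-ahead distance; assumed, tune empirically
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Hint the cache about the load we will need `ahead` iterations
        // from now (read access, no temporal locality expected).
        if (i + ahead < n)
            __builtin_prefetch(&data[index[i + ahead]], 0, 0);
        total += data[index[i]];
    }
    return total;
}
```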

Understanding what clang is doing in assembly, decrementing for a loop that is incrementing

Submitted by 笑着哭i on 2019-12-24 18:22:39
Question: Consider the following code, in C++:

```cpp
#include <cstdlib>

std::size_t count(std::size_t n) {
    std::size_t i = 0;
    while (i < n) {
        asm volatile("" : : : "memory");
        ++i;
    }
    return i;
}

int main(int argc, char* argv[]) {
    return count(argc > 1 ? std::atoll(argv[1]) : 1);
}
```

It is just a loop that increments its value and returns it at the end. The asm volatile prevents the loop from being optimized away. We compile it under g++ 8.1 and clang++ 5.0 with the arguments -Wall -Wextra -std=c++11 -g -O3 …
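The transformation being asked about is the classic count-down rewrite: since i always ends equal to n, the compiler can keep n aside, count a separate variable down to zero (a decrement sets the zero flag directly, saving a separate compare instruction), and return the saved n. A source-level sketch of the idea (my illustration, not clang's literal output):

```cpp
#include <cstddef>

std::size_t count_down_form(std::size_t n) {
    // Same trip count and same barrier as the original loop, but the
    // induction variable runs n -> 0, so the loop test is "!= 0".
    for (std::size_t left = n; left != 0; --left) {
        asm volatile("" : : : "memory");
    }
    return n;   // the original i ends equal to n, so return n directly
}
```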

Benchmarks of code generated by different g++ versions

Submitted by 依然范特西╮ on 2019-12-24 16:14:43
Question: I work on a runtime system for an application domain that is very performance-sensitive. We go to a lot of effort to maintain backward compatibility with older compiler versions, including avoiding more recently implemented language constructs and synthesizing them for the older versions. However, I'm concerned that this effort does a disservice to our users by enabling them to continue using compiler releases that are costing them huge amounts of performance. Unfortunately, I haven't been …
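If no published numbers turn up, a self-serve comparison is straightforward: compile one fixed workload with each g++ release under identical flags and time it. A minimal harness sketch (my suggestion, not from the post; the workload is an arbitrary stand-in for real application code):

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> v(1 << 20, 1.5);   // fixed workload: 1M doubles
    double sum = 0.0;
    const auto t0 = std::chrono::steady_clock::now();
    for (int rep = 0; rep < 100; ++rep)
        for (double x : v) sum += x * x;   // simple loop the optimizer can chew on
    const auto t1 = std::chrono::steady_clock::now();
    const long long us =
        std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    std::printf("sum=%g elapsed=%lld us\n", sum, us);  // print sum so it isn't dead code
    return 0;
}
```

Build the same file once per compiler release with the same -O level, run each binary several times, and compare the medians.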