compiler-optimization

Compiler written in Java: Peephole optimizer implementation

☆樱花仙子☆ submitted on 2019-12-03 12:03:13
Question: I'm writing a compiler for a subset of Pascal. The compiler produces machine instructions for a made-up machine. I want to write a peephole optimizer for this machine language, but I'm having trouble substituting some of the more complicated patterns. Peephole optimizer specification: I've researched several different approaches to writing a peephole optimizer, and I've settled on a back-end approach: the Encoder makes a call to an emit() function every time a machine instruction is to be

Is there any advantage to defining a val over a def in a trait?

跟風遠走 submitted on 2019-12-03 11:52:44
Question: In Scala, a val can override a def, but a def cannot override a val. So, is there an advantage to declaring a trait e.g. like this: trait Resource { val id: String } rather than this? trait Resource { def id: String } The follow-up question is: how does the compiler treat calls to vals and defs differently in practice, and what kind of optimizations does it actually perform on vals? The compiler insists on the fact that vals are stable — what does that mean in practice for the compiler?

Is there code that results in a 50% branch prediction miss rate?

怎甘沉沦 submitted on 2019-12-03 10:55:19
Question: The problem: I'm trying to figure out how to write code (C preferred, ASM only if there is no other solution) that would make branch prediction miss in 50% of the cases. So it has to be a piece of code that is immune to compiler optimizations related to branching, and hardware branch prediction should not do better than 50% (a coin toss). An even greater challenge is being able to run the code on multiple CPU architectures and get the same 50% miss ratio. I managed to write a

Preventing compiler optimizations while benchmarking

六眼飞鱼酱① submitted on 2019-12-03 10:29:24
Question: I recently came across this brilliant CppCon 2015 talk by Chandler Carruth, "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!" One of the techniques mentioned to prevent the compiler from optimizing code away is using the functions below. static void escape(void *p) { asm volatile("" : : "g"(p) : "memory"); } static void clobber() { asm volatile("" : : : "memory"); } void benchmark() { vector<int> v; v.reserve(1); escape(v.data()); v.push_back(10); clobber(); } I'm trying to understand

What does “sibling calls” mean?

醉酒当歌 submitted on 2019-12-03 10:02:46
From the GCC manual: -foptimize-sibling-calls — Optimize sibling and tail recursive calls. I know what tail recursive calls are, for example: int sum(int n) { return n == 1 ? 1 : n + sum(n-1); } However, what does sibling calls mean? "The compiler considers two functions as being siblings if they share the same structural equivalence of return types, as well as matching space requirements of their arguments." http://www.drdobbs.com/tackling-c-tail-calls/184401756 It must be something like this: int ispair(int n) { return n == 0 ? 1 : isodd(n-1); } int isodd(int n) { return n == 0 ? 0 : ispair(n-1); } In general,

How can I find the micro-ops which instructions on Intel's x86 CPUs decode to?

ⅰ亾dé卋堺 submitted on 2019-12-03 08:52:02
Question: The Intel Optimization Reference Manual, in Section 3.5.1, advises: "Favor single-micro-operation instructions." "Avoid using complex instructions (for example, enter, leave, or loop) that have more than 4 micro-ops and require multiple cycles to decode. Use sequences of simple instructions instead." Although Intel themselves tell compiler writers to use instructions which decode to few micro-ops, I can't find anything in any of their manuals which explains just how many micro-ops each ASM

Does Python optimize function calls from loops?

陌路散爱 submitted on 2019-12-03 08:48:23
Question: Say I have code which calls some function millions of times from a loop, and I want the code to be fast: def outer_function(file): for line in file: inner_function(line) def inner_function(line): # do something pass It's not necessarily file processing; it could be, for example, a point-drawing function called from a line-drawing function. The idea is that logically the two have to be separated, but from a performance point of view they should act together as fast as possible. Does Python detect

How to decrease the size of generated binaries?

这一生的挚爱 submitted on 2019-12-03 08:17:35
Question: I know that there is an option "-Os" to "Optimize for size", but it has little effect, or even increases the size on some occasions :( strip (or the "-s" option) removes the debug symbol table, which works fine, but it only shaves off a small proportion of the size. Is there any other way to go further? Answer 1: Apart from the obvious ( -Os -s ), aligning functions to the smallest possible value that will not crash (I don't know ARM alignment requirements) might squeeze out a few bytes per function.

Can't turn off gcc optimizer, Makefile from automake

我是研究僧i submitted on 2019-12-03 07:25:18
I am trying to get ZBar into a debug session. I am able to do so, but I can't get the optimizer to turn off, so my debug session jumps around unexpectedly and many variables are labeled as optimized out in Eclipse Indigo. I am running on Ubuntu. I have tried adding -O0 as far to the right as possible in every gcc call in the Makefiles, since the last -O option is the one that takes effect. I used -Q --help=optimizers to find out what to look for, but its output is a bit odd: libtool: compile: gcc -DHAVE_CONFIG_H -I. -I./include -I./zbar -I./include -O0 -O0 -Q --help=optimizers -Wall -Wno-parentheses -O0 -g -O0 -Q --help
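For autoconf/automake-generated Makefiles, the usual alternative to editing the generated files by hand is to pass the flags through the standard CFLAGS variable, which is appended after the package's defaults and therefore overrides earlier -O options (a generic sketch, not specific to ZBar's build):

```shell
# At configure time (remembered for subsequent makes):
./configure CFLAGS="-g -O0"
make clean && make

# Or for a single build, without re-running configure:
make CFLAGS="-g -O0"
```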

Is IL generated by expression trees optimized?

自作多情 submitted on 2019-12-03 07:11:05
OK, this is mere curiosity; it serves no real-world purpose. I know that with expression trees you can generate MSIL on the fly, just like the regular C# compiler does. Since the compiler can decide on optimizations, I'm tempted to ask what the case is with the IL generated during Expression.Compile(). Basically, two questions: Since at compile time the compiler can produce (maybe slightly) different IL in debug mode and release mode, is there ever a difference in the IL generated by compiling an expression when built in debug mode versus release mode? Also, the JIT, which converts IL to native code at run time, should