compiler-optimization

Why is gcc allowed to speculatively load from a struct?

Submitted by 天大地大妈咪最大 on 2019-12-03 04:10:16
Example Showing the gcc Optimization and User Code that May Fault

The function foo in the snippet below is meant to load only one of the struct members, A or B; at least that is the intention of the unoptimized code.

    typedef struct { int A; int B; } Pair;

    int foo(const Pair *P, int c) {
        int x;
        if (c)
            x = P->A;
        else
            x = P->B;
        return c / 102 + x;
    }

Here is what gcc -O3 gives:

    mov     eax, esi
    mov     edx, -1600085855
    test    esi, esi
    mov     ecx, DWORD PTR [rdi+4]    <-- load P->B
    cmovne  ecx, DWORD PTR [rdi]      <-- load P->A
    imul    edx
    lea     eax, [rdx+rsi]
    sar     esi, 31
    sar     eax, 6
    sub     eax, esi
    add     eax, ecx
    ret

So …
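A minimal sketch of the kind of "user code that may fault" alluded to above (this is an illustration added here, not code from the original question, and it is Linux-specific): the Pair is placed so that member A is the last readable int of a page and member B falls on a PROT_NONE page. Built without optimization, only P->A is read; with the -O3 code above, the unconditional load of P->B lands on the protected page. Whether such a caller counts as valid C is exactly what the question asks.

    #include <sys/mman.h>   // mmap, mprotect (POSIX/Linux)
    #include <unistd.h>     // sysconf
    #include <cstdio>

    typedef struct { int A; int B; } Pair;

    int foo(const Pair *P, int c) {
        int x;
        if (c) x = P->A; else x = P->B;
        return c / 102 + x;
    }

    int main() {
        long page = sysconf(_SC_PAGESIZE);
        // Map two anonymous pages, then make the second one inaccessible.
        char *base = static_cast<char *>(mmap(nullptr, 2 * page,
                                              PROT_READ | PROT_WRITE,
                                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
        if (base == MAP_FAILED) return 1;
        mprotect(base + page, page, PROT_NONE);

        // Place the Pair so that A is the last int of the readable page
        // and B sits in the PROT_NONE page.
        Pair *p = reinterpret_cast<Pair *>(base + page - sizeof(int));

        // c != 0, so the abstract machine reads only P->A; the speculative
        // load of P->B (offset 4) would hit the protected page.
        std::printf("%d\n", foo(p, 1));
        return 0;
    }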

C Programming: difference between ++i and i=i+1 from an assembler point of view?

Submitted by ⅰ亾dé卋堺 on 2019-12-03 03:05:29
This was an interview question. I said they were the same, but this was adjudged an incorrect response. From the assembler point of view, is there any imaginable difference? I have compiled two short C programs using default gcc optimization and -S to see the assembler output, and they are the same. The interviewer may have wanted an answer along these lines: i=i+1 will have to load the value of i, add one to it, and then store the result back to i. In contrast, ++i may simply increment the value using a single assembly instruction, so in theory it could be more efficient. However, most …
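One quick way to see this for yourself (a sketch added here, with hypothetical function names, not part of the original post): put the two forms into separate functions, compile each with gcc -S at the same optimization level, and diff the assembly; on mainstream compilers the output is identical.

    // Two hypothetical functions that differ only in how the increment is written.
    int pre_increment(int i) { ++i;        return i; }
    int plus_one(int i)      { i = i + 1;  return i; }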

Inconsistent behavior of compiler optimization of unused string

Submitted by 余生长醉 on 2019-12-03 02:54:28
Question: I am curious why the following piece of code:

    #include <string>

    int main() {
        std::string a = "ABCDEFGHIJKLMNO";
    }

when compiled with -O3 yields the following code:

    main:                            # @main
            xor     eax, eax
            ret

(I perfectly understand that there is no need for the unused a, so the compiler can omit it from the generated code entirely.) However, the following program:

    #include <string>

    int main() {
        std::string a = "ABCDEFGHIJKLMNOP"; // <-- !!! one extra P
    }

yields:

    main:                            # @main
            push    rbx
            sub     rsp, 48
            lea     rbx, …
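A small experiment that hints at what is going on (an added sketch; it assumes libstdc++, whose std::string keeps a 15-character small-string buffer inside the object): a default-constructed string reports how many characters it can hold without a heap allocation. The 15-character literal fits that buffer, so the whole object can be optimized away; the 16-character literal forces a call to operator new, which is what survives -O3 in the second listing.

    #include <iostream>
    #include <string>

    int main() {
        std::string s;
        // With libstdc++ this typically prints 15; anything longer than the
        // inline buffer needs a heap allocation.
        std::cout << "inline capacity: " << s.capacity() << '\n';
    }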

Compiler written in Java: Peephole optimizer implementation

Submitted by 烂漫一生 on 2019-12-03 02:37:53
I'm writing a compiler for a subset of Pascal. The compiler produces machine instructions for a made-up machine. I want to write a peephole optimizer for this machine language, but I'm having trouble substituting some of the more complicated patterns.

Peephole optimizer specification: I've researched several different approaches to writing a peephole optimizer, and I've settled on a back-end approach:

- The Encoder makes a call to an emit() function every time a machine instruction is to be generated.
- emit(Instruction currentInstr) checks a table of peephole optimizations: if the current …
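Since the question's machine is made up, the following is only a shape sketch (C++ here, with invented opcode names and an invented Instruction type, not the asker's actual Encoder) of the emit()-with-lookbehind approach described above: emit() holds the previous instruction, checks the incoming one against a pattern, and either swallows the pair or passes instructions through.

    #include <optional>
    #include <vector>

    // Hypothetical instruction representation, for illustration only.
    enum class Op { LOAD, STORE, PUSH, POP, ADD };
    struct Instruction { Op op; int operand; };

    class Emitter {
        std::optional<Instruction> pending;   // one-instruction peephole window
        std::vector<Instruction> out;

        void flush() {
            if (pending) { out.push_back(*pending); pending.reset(); }
        }

    public:
        void emit(const Instruction &cur) {
            // Example pattern: PUSH x immediately followed by POP x is a no-op,
            // so both instructions are dropped.
            if (pending && pending->op == Op::PUSH &&
                cur.op == Op::POP && cur.operand == pending->operand) {
                pending.reset();
                return;
            }
            flush();                // the held instruction starts no pattern; release it
            pending = cur;          // hold the new one: it may start a pattern
        }

        std::vector<Instruction> finish() { flush(); return out; }
    };

Multi-instruction patterns extend the same idea by widening the window to a small queue and re-running the table after every substitution.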

Is there a code that results in 50% branch prediction miss?

Submitted by 牧云@^-^@ on 2019-12-03 02:31:19
The problem: I'm trying to figure out how to write code (C preferred, ASM only if there is no other solution) that would make the branch prediction miss in 50% of the cases. So it has to be a piece of code that is "immune" to compiler optimizations related to branching, and the hardware branch prediction should also not do better than 50% (a coin toss). An even greater challenge is being able to run the code on multiple CPU architectures and get the same 50% miss ratio. I managed to write code that reaches a 47% branch miss ratio on an x86 platform. I suspect the missing 3% could come …
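One widely used shape for this (a sketch added here, not the asker's 47% test): branch on a pre-generated stream of random bits, so that no history-based predictor can do better than a coin toss. The volatile counters discourage the compiler from if-converting the branch into a conditional move, but checking the generated assembly, and measuring with something like perf stat -e branches,branch-misses, is part of the exercise.

    #include <cstdio>
    #include <random>
    #include <vector>

    int main() {
        std::mt19937 rng(12345);               // fixed seed, repeatable runs
        std::vector<int> bits(1 << 20);
        for (int &b : bits) b = rng() & 1;     // pattern-free 0/1 stream

        volatile long taken = 0, skipped = 0;  // separate volatile sinks per branch arm
        for (int b : bits) {
            if (b)                             // taken ~50% of the time, unpredictably
                taken = taken + 1;
            else
                skipped = skipped + 1;
        }
        std::printf("taken=%ld skipped=%ld\n", (long)taken, (long)skipped);
        return 0;
    }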

How to deal with branch prediction when using a switch case in CPU emulation

Submitted by 好久不见. on 2019-12-03 01:49:20
I recently read the question "Why is it faster to process a sorted array than an unsorted array?" and found the answer absolutely fascinating; it has completely changed my outlook on programming when dealing with branches that depend on data. I currently have a fairly basic but fully functioning interpreted Intel 8080 emulator written in C; the heart of the operation is a 256-case switch table for handling each opcode. My initial thought was that this would obviously be the fastest way of working, since opcode encoding isn't consistent throughout the 8080 instruction set and …
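For readers unfamiliar with the dispatch pattern under discussion, here is a stripped-down sketch (hypothetical State and handlers, plus three real 8080 opcodes; this is not the asker's emulator): whether written as a dense 256-case switch or as an explicit function-pointer table, dispatch boils down to one data-dependent indirect jump per instruction, and that jump is what the branch predictor struggles with when opcodes arrive in an unpredictable order.

    #include <cstdint>

    struct State {
        uint16_t pc = 0;
        uint8_t  a = 0;
        bool     halted = false;
        uint8_t  mem[65536] = {0};
    };

    static void op_nop (State &)  { }
    static void op_inra(State &s) { s.a = static_cast<uint8_t>(s.a + 1); }   // INR A
    static void op_hlt (State &s) { s.halted = true; }                        // HLT

    using Handler = void (*)(State &);

    int main() {
        Handler table[256];
        for (Handler &h : table) h = op_nop;   // default every opcode to NOP
        table[0x3C] = op_inra;
        table[0x76] = op_hlt;

        State s;
        s.mem[0] = 0x3C;                       // tiny program: INR A; HLT
        s.mem[1] = 0x76;

        while (!s.halted) {
            uint8_t opcode = s.mem[s.pc++];    // fetch
            table[opcode](s);                  // the data-dependent indirect jump
        }
        return s.a;                            // exits with 1
    }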

Java program runs slower when code that is never executed is commented out

Submitted by 大兔子大兔子 on 2019-12-03 01:49:00
Question: I observed some strange behaviour in one of my Java programs. I have tried to strip the code down as much as possible while still being able to replicate the behaviour. Code in full below.

    public class StrangeBehaviour {
        static boolean recursionFlag = true;

        public static void main(String[] args) {
            long startTime = System.nanoTime();
            for (int i = 0; i < 10000; i++) {
                functionA(6, 0);
            }
            long endTime = System.nanoTime();
            System.out.format("%.2f seconds elapsed.\n", (endTime - startTime) / 1000…

Why can't (or doesn't) the compiler optimize a predictable addition loop into a multiplication?

Submitted by 眉间皱痕 on 2019-12-03 01:32:49
Question: This is a question that came to mind while reading the brilliant answer by Mysticial to the question: why is it faster to process a sorted array than an unsorted array? Context for the types involved:

    const unsigned arraySize = 32768;
    int data[arraySize];
    long long sum = 0;

In his answer he explains that the Intel Compiler (ICC) optimizes this:

    for (int i = 0; i < 100000; ++i)
        for (int c = 0; c < arraySize; ++c)
            if (data[c] >= 128)
                sum += data[c];

...into something equivalent to this:

    for …
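For concreteness, here is the hand rewrite the question has in mind, as an added sketch (the random fill is only there to make it runnable; no claim is made that any particular compiler produces exactly this): because the inner loop's result does not depend on i, it can be computed once and scaled. Whether a compiler may legally and profitably do the same, given that it changes the order of the intermediate arithmetic, is exactly what the question explores.

    #include <cstdlib>

    const unsigned arraySize = 32768;
    int data[arraySize];
    long long sum = 0;

    int main() {
        for (unsigned i = 0; i < arraySize; ++i)
            data[i] = std::rand() % 256;       // stand-in for the original data

        long long once = 0;
        for (unsigned c = 0; c < arraySize; ++c)
            if (data[c] >= 128)
                once += data[c];               // the inner reduction, run a single time
        sum = once * 100000;                   // scaled instead of repeated 100000 times
        return 0;
    }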

Disable compiler optimisation for a specific function or block of code (C#)

Submitted by 我怕爱的太早我们不能终老 on 2019-12-03 01:23:35
The compiler does a great job of optimising for RELEASE builds, but occasionally it can be useful to ensure that optimisation is turned off for a local function (but not for the entire project by unticking Project Options > Optimize code). In C++ this is achieved using the following (with the #pragma normally commented out):

    #pragma optimize( "", off )
    // Some code such as a function (but not the whole project)
    #pragma optimize( "", on )

Is there an equivalent in C#?

UPDATE: Several excellent answers suggest decorating the method with MethodImplOptions.NoOptimization. This was implemented in .NET …

Is it possible to implement bitwise operators using integer arithmetic?

Submitted by 自古美人都是妖i on 2019-12-03 00:29:31
Question: I am facing a rather peculiar problem. I am working on a compiler for an architecture that doesn't support bitwise operations. However, it handles signed 16-bit integer arithmetic, and I was wondering if it would be possible to implement bitwise operations using only:

- Addition ( c = a + b )
- Subtraction ( c = a - b )
- Division ( c = a / b )
- Multiplication ( c = a * b )
- Modulus ( c = a % b )
- Minimum ( c = min(a, b) )
- Maximum ( c = max(a, b) )
- Comparisons ( c = (a < b), c = (a == b), c = (a <= b) …
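As a feasibility sketch only (added here; it assumes non-negative operands and a host int wide enough for the loop's last doubling, so on the question's signed 16-bit target the sign bit and negative values need separate handling): bitwise AND built from nothing but the listed operations. OR and XOR follow the same bit-by-bit pattern, using abit + bbit - abit * bbit and (abit + bbit) % 2 respectively.

    // Bitwise AND using only +, -, *, /, % on non-negative values.
    int bit_and(int a, int b) {
        int result = 0;
        int power = 1;                       // weight of the current bit position
        for (int i = 0; i < 15; ++i) {       // 15 value bits; the sign bit is left out here
            int abit = (a / power) % 2;      // bit i of a
            int bbit = (b / power) % 2;      // bit i of b
            result = result + abit * bbit * power;   // abit * bbit is the AND of the two bits
            power = power * 2;
        }
        return result;
    }
    // Example: bit_and(12, 10) == 8; bit_and(0x7FFF, 0x00FF) == 0x00FF.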