llvm-codegen

Why is there a large performance impact when looping over an array with 240 or more elements?

混江龙づ霸主 submitted on 2019-12-02 13:52:46
When running a sum loop over an array in Rust, I noticed a huge performance drop when CAPACITY >= 240; CAPACITY = 239 is about 80 times faster. Is there a special compile-time optimization Rust applies to "short" arrays? Compiled with `rustc -C opt-level=3`.

    use std::time::Instant;

    const CAPACITY: usize = 240;
    const IN_LOOPS: usize = 500000;

    fn main() {
        let mut arr = [0; CAPACITY];
        for i in 0..CAPACITY {
            arr[i] = i;
        }

        let mut sum = 0;
        let now = Instant::now();
        for _ in 0..IN_LOOPS {
            let mut s = 0;
            for i in 0..arr.len() {
                s += arr[i];
            }
            sum += s;
        }
        println!("sum:{} time:{:?}", sum, now.elapsed());
    }
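
Not part of the original question, but one common way to check whether the gap comes from the compiler precomputing the inner sum across outer iterations is to hide the data from the optimizer with `std::hint::black_box` (stable since Rust 1.66). A minimal sketch, adapted from the code above:

    use std::hint::black_box;
    use std::time::Instant;

    const CAPACITY: usize = 240;
    const IN_LOOPS: usize = 500_000;

    fn main() {
        let mut arr = [0usize; CAPACITY];
        for i in 0..CAPACITY {
            arr[i] = i;
        }

        let mut sum = 0usize;
        let now = Instant::now();
        for _ in 0..IN_LOOPS {
            // Hide the array from the optimizer so the inner sum cannot be
            // hoisted out of the outer loop or constant-folded away.
            let arr = black_box(arr);
            let mut s = 0usize;
            for i in 0..arr.len() {
                s += arr[i];
            }
            sum += black_box(s);
        }
        println!("sum:{} time:{:?}", sum, now.elapsed());
    }

If the timings for 239 and 240 converge under this version, the difference in the original benchmark is coming from the optimizer rather than from the summation itself.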

What do the optimization levels `-Os` and `-Oz` do in rustc?

大憨熊 submitted on 2019-12-01 02:44:11
Executing `rustc -C help` shows (among other things):

    -C opt-level=val -- optimize with possible levels 0-3, s, or z

The levels 0 to 3 are fairly intuitive, I think: the higher the level, the more aggressive the optimizations that are performed. However, I have no clue what the s and z options do, and I couldn't find any Rust-related information about them.

Englund: It seems you are not the only one confused, as described in a Rust issue. It appears to follow the same pattern as Clang:

    Os -- optimise for size when compiling.
    Oz -- optimise for size even more aggressively.

red75prime: Looking at these and …
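
For reference (not part of the original answer), these levels are passed through the same `-C opt-level` flag shown above; a minimal invocation, assuming a single-file crate named main.rs:

    rustc -C opt-level=s main.rs   # optimise for binary size
    rustc -C opt-level=z main.rs   # optimise for size even more aggressively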

Why does my code run slower when I remove bounds checks?

自作多情 submitted on 2019-11-30 00:17:44
Question: I'm writing a linear algebra library in Rust. I have a function that returns a reference to a matrix cell at a given row and column. This function starts with a pair of assertions that the row and column are within bounds:

    #[inline(always)]
    pub fn get(&self, row: usize, col: usize) -> &T {
        assert!(col < self.num_cols.as_nat());
        assert!(row < self.num_rows.as_nat());
        unsafe { self.get_unchecked(row, col) }
    }

In tight loops, I thought it might be faster to skip the bounds check, so I provide a get…
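
The excerpt cuts off before showing the unchecked counterpart; a minimal sketch of what such a checked/unchecked pair typically looks like (the `Matrix` layout, plain `usize` dimension fields, and flat `Vec` storage here are assumptions for illustration, not the asker's actual types):

    // Hypothetical matrix type, just enough to show the two accessors side by side.
    pub struct Matrix<T> {
        data: Vec<T>,
        num_rows: usize,
        num_cols: usize,
    }

    impl<T> Matrix<T> {
        #[inline(always)]
        pub fn get(&self, row: usize, col: usize) -> &T {
            assert!(col < self.num_cols);
            assert!(row < self.num_rows);
            unsafe { self.get_unchecked(row, col) }
        }

        /// Safety: the caller must guarantee that `row` and `col` are in bounds.
        #[inline(always)]
        pub unsafe fn get_unchecked(&self, row: usize, col: usize) -> &T {
            unsafe { self.data.get_unchecked(row * self.num_cols + col) }
        }
    }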

Can Rust optimise away the bit-wise copy during move of an object someday?

天大地大妈咪最大 submitted on 2019-11-27 05:54:19
Question: Consider the snippet

    struct Foo {
        dummy: [u8; 65536],
    }

    fn bar(foo: Foo) {
        println!("{:p}", &foo)
    }

    fn main() {
        let o = Foo { dummy: [42u8; 65536] };
        println!("{:p}", &o);
        bar(o);
    }

A typical output of the program is

    0x7fffc1239890
    0x7fffc1229890

where the addresses are different. Apparently, the large array dummy has been copied, as expected from the compiler's implementation of moves. Unfortunately, this can have a non-trivial performance impact, as dummy is a very large array. This impact can…
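
Not part of the question, but a common workaround sketch: pass the large value by reference, or heap-allocate it so that only a pointer moves (the function names here are illustrative):

    struct Foo {
        dummy: [u8; 65536],
    }

    // Passing by reference avoids any by-value copy of the 64 KiB buffer.
    fn bar_ref(foo: &Foo) {
        println!("{:p}", foo);
    }

    // With a Box, only the 8-byte pointer moves; the buffer stays where it is.
    fn bar_boxed(foo: Box<Foo>) {
        println!("{:p}", &*foo);
    }

    fn main() {
        let o = Foo { dummy: [42u8; 65536] };
        println!("{:p}", &o);
        bar_ref(&o);

        let b = Box::new(Foo { dummy: [42u8; 65536] });
        println!("{:p}", &*b);
        bar_boxed(b); // the same heap address is printed inside bar_boxed
    }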

Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?

断了今生、忘了曾经 submitted on 2019-11-26 12:30:51
I am disassembling this code with llvm clang, Apple LLVM version 8.0.0 (clang-800.0.42.1):

    #include <stdio.h>

    int main() {
        float a = 0.151234;
        float b = 0.2;
        float c = a + b;
        printf("%f", c);
    }

I compiled with no -O option, but I also tried -O0 (which gives the same output) and -O2 (which actually computes the value at compile time and stores it precomputed). The resulting disassembly is the following (I removed the parts that are not relevant):

    -> 0x100000f30 <+0>:  pushq  %rbp
       0x100000f31 <+1>:  movq   %rsp, %rbp
       0x100000f34 <+4>:  subq   $0x10, %rsp
       0x100000f38 <+8>:  leaq   0x6d(%rip), %rdi
       0x100000f3f <+15>: movss  0x5d(%rip), %xmm0
       0x100000f47 <+23>: …

Does the C++ standard allow for an uninitialized bool to crash a program?

生来就可爱ヽ(ⅴ<●) submitted on 2019-11-26 05:56:59
Question: I know that "undefined behaviour" in C++ can pretty much allow the compiler to do anything it wants. However, I had a crash that surprised me, as I assumed the code was safe enough. In this case, the real problem happened only on a specific platform with a specific compiler, and only when optimization was enabled. I tried several things to reproduce the problem and simplify it as much as possible. Here's an extract of a function called Serialize, that would take a bool…
