Question
I wrote two sample functions:
source.ll:
define i32 @bleh(i32 %x) {
entry:
%addtmp = add i32 %x, %x
%addtmp1 = add i32 %addtmp, %x
%addtmp2 = add i32 %addtmp1, %x
%addtmp3 = add i32 %addtmp2, %x
%addtmp4 = add i32 %addtmp3, 1
%addtmp5 = add i32 %addtmp4, 2
%addtmp6 = add i32 %addtmp5, 3
%multmp = mul i32 %x, 3
%addtmp7 = add i32 %addtmp6, %multmp
ret i32 %addtmp7
}
source-fp.ll:
define double @bleh(double %x) {
entry:
%addtmp = fadd double %x, %x
%addtmp1 = fadd double %addtmp, %x
%addtmp2 = fadd double %addtmp1, %x
%addtmp3 = fadd double %addtmp2, %x
%addtmp4 = fadd double %addtmp3, 1.000000e+00
%addtmp5 = fadd double %addtmp4, 2.000000e+00
%addtmp6 = fadd double %addtmp5, 3.000000e+00
%multmp = fmul double %x, 3.000000e+00
%addtmp7 = fadd double %addtmp6, %multmp
ret double %addtmp7
}
Why is it that when I optimize both functions using
opt -O3 source[-fp].ll -o opt.source[-fp].ll -S
the i32 one gets optimized but the double one doesn't? I expected the fadd instructions to get combined into a single fmul. Instead, the output looks exactly the same as the input.
Is it because the flags are set differently? I am aware that certain optimizations are possible for i32 that are not doable for double, but the absence of simple constant folding is beyond my understanding.
I am using LLVM 3.1.
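For reference, the i32 version really does collapse to a single multiply and add; roughly something like the following (a sketch of what I'd expect opt -O3 to produce, so the value names and the choice of mul vs. shl may differ):
define i32 @bleh(i32 %x) {
entry:
; the chain sums to 5*x + 6 plus 3*x, i.e. 8*x + 6
%tmp = mul i32 %x, 8
%result = add i32 %tmp, 6
ret i32 %result
}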
Answer 1:
It's not quite true to say that no optimization is possible. I'll go through the first few lines to show where transformations are and are not allowed:
%addtmp = fadd double %x, %x
This first line could safely be transformed to fmul double %x, 2.0e+0, but that's not actually an optimization on most architectures (fadd is generally as fast as or faster than fmul, and doesn't require producing the constant 2.0). Note that, barring overflow, this operation is exact (like all scaling by powers of two).
%addtmp1 = fadd double %addtmp, %x
This line could be transformed to fmul double %x, 3.0e+0. Why is this a legal transformation? Because the computation that produced %addtmp was exact, only a single rounding is incurred whether the value is computed as x * 3 or as x + x + x. Because these are IEEE-754 basic operations, and therefore correctly rounded, the result is the same either way. What about overflow? Neither form can overflow unless the other does as well.
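So a strictly value-preserving rewrite of the first two lines is possible; a sketch (the function name is mine, chosen for illustration):
define double @three_x(double %x) {
entry:
; legal rewrite of %addtmp and %addtmp1: x + x is exactly 2*x,
; and (x + x) + x rounds once, exactly as 3*x does
%three_x = fmul double %x, 3.000000e+00
ret double %three_x
}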
%addtmp2 = fadd double %addtmp1, %x
This is the first line that cannot legally be transformed into constant * x. 4 * x would be computed exactly, without any rounding, whereas x + x + x + x incurs two roundings: x + x + x is rounded once, and adding the final x may round a second time.
%addtmp3 = fadd double %addtmp2, %x
Ditto here; 5 * x would incur one rounding, whereas x + x + x + x + x incurs three.
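For contrast, the integer analogue has none of these rounding concerns, which is why the i32 function in the question folds completely; a sketch with a hypothetical function (not from the question):
define i32 @four_x(i32 %x) {
entry:
; integer addition is associative, so the optimizer is free to
; turn this whole chain into a single multiply (or shift)
%a = add i32 %x, %x
%b = add i32 %a, %x
%c = add i32 %b, %x
ret i32 %c
}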
The only transformation that might be beneficial is replacing x + x + x with 3 * x. However, the subexpression x + x is already present elsewhere, so an optimizer could easily choose not to employ this transform (since it can take advantage of the existing partial result if it does not).
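If you don't need strict IEEE-754 semantics, the usual way to unlock these reassociations is to say so explicitly. Newer LLVM releases let you attach fast-math flags to the individual instructions (I don't believe the per-instruction flags exist yet in 3.1, so treat this as a sketch of later syntax; recent clang versions emit these flags under -ffast-math):
define double @bleh(double %x) {
entry:
; 'fast' tells the optimizer it may reassociate these operations
; and ignore the strict rounding argument above
%addtmp = fadd fast double %x, %x
%addtmp1 = fadd fast double %addtmp, %x
%addtmp2 = fadd fast double %addtmp1, %x
%addtmp3 = fadd fast double %addtmp2, %x
%addtmp4 = fadd fast double %addtmp3, 1.000000e+00
%addtmp5 = fadd fast double %addtmp4, 2.000000e+00
%addtmp6 = fadd fast double %addtmp5, 3.000000e+00
%multmp = fmul fast double %x, 3.000000e+00
%addtmp7 = fadd fast double %addtmp6, %multmp
ret double %addtmp7
}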
Source: https://stackoverflow.com/questions/11942527/why-dont-llvm-passes-optimize-floating-point-instructions