Compute fast log base 2 ceiling


栀梦 2020-11-28 11:35

What is a fast way to compute the (long int) ceiling(log_2(i)), where the input and output are 64-bit integers? Solutions for signed or unsigned integers are ac

14 answers

隐瞒了意图╮ (original poster)

2020-11-28 12:09
The fastest approach I'm aware of uses a fast log2 that rounds down, combined with an unconditional adjustment of the input value before and of the result after to handle the rounding-up case, as in lg_up() shown below.
```
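#include <stdint.h>

/* Base-2 logarithm, rounding down. Requires x != 0 (__builtin_clzl(0) is
   undefined) and assumes unsigned long is 64 bits wide. */
static inline uint64_t lg_down(uint64_t x) {
  return 63U - __builtin_clzl(x);
}

/* Base-2 logarithm, rounding up. Requires x >= 2, since x == 1 would
   evaluate lg_down(0). */
static inline uint64_t lg_up(uint64_t x) {
  return lg_down(x - 1) + 1;
}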
```
Adding 1 to the rounded-down result is already correct for every value except exact powers of two, where the floor and the ceiling coincide. Subtracting 1 from the input first handles that case: it moves an exact power of two into the range just below it, and it leaves the floor unchanged for every other input, so lg_down(x - 1) + 1 gives the ceiling in both cases.
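To make the argument concrete, here is a minimal sketch that compares the subtract-then-add formula with a naive loop. The names lg_up_reference and lg_up_trick are made up for this illustration, and __builtin_clzll is used instead of __builtin_clzl only so that the argument is 64 bits wide regardless of how wide long is.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: smallest k with 2^k >= x, found by a plain loop.
   Intended for x >= 1; the k < 64 test avoids an out-of-range shift. */
static uint64_t lg_up_reference(uint64_t x) {
  uint64_t k = 0;
  while (k < 64 && (UINT64_C(1) << k) < x) {
    k++;
  }
  return k;
}

/* The subtract-then-add formula. Only valid for x >= 2, because x == 1
   would pass 0 to the builtin. */
static uint64_t lg_up_trick(uint64_t x) {
  assert(x >= 2);
  return (63U - (uint64_t)__builtin_clzll(x - 1)) + 1;
}
```

For example, x = 8 becomes 7, whose floor log is 2, giving 3; x = 9 becomes 8, whose floor log is 3, giving 4.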

This is usually slightly faster than approaches that adjust the result by explicitly checking for exact powers of two (e.g., adding a !!(x & (x - 1)) term). It avoids comparisons, conditional operations, and branches, is more likely to simplify when inlined, is more amenable to vectorization, etc.
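For comparison, the explicit power-of-two check mentioned above might look like the following sketch. The function name lg_up_pow2_check is hypothetical, and __builtin_clzll is assumed here so the argument width is 64 bits on every platform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant: floor log plus 1 unless x is an exact power of two.
   x & (x - 1) clears the lowest set bit, so it is zero only for powers of two. */
static inline uint64_t lg_up_pow2_check(uint64_t x) {
  assert(x != 0); /* the builtin is undefined for 0 */
  uint64_t floor_log = 63U - (uint64_t)__builtin_clzll(x);
  return floor_log + !!(x & (x - 1));
}
```

Unlike the subtract-then-add form, this variant never passes 0 to the builtin for x == 1, at the cost of the extra test.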

This relies on the "count leading zeros" functionality offered by most CPUs, exposed here through the clang/icc/gcc builtin __builtin_clzl; other platforms offer something similar (e.g., the _BitScanReverse intrinsic in Visual Studio).

Unfortunately, this may return the wrong answer for lg_up(1), since that leads to __builtin_clzl(0), which is undefined behavior according to the gcc documentation. A general "count leading zeros" operation has perfectly well-defined behavior at zero, but the gcc builtin is specified this way because, prior to the BMI ISA extension on x86, it would have been implemented with the bsr instruction, whose result is itself undefined for a zero input.

You could work around this if you know you have the lzcnt instruction by using the lzcnt intrinsic directly. Most platforms other than x86 never went through the bsr mistake in the first place, and probably also offer methods to access their "count leading zeros" instruction if they have one.
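One possible shape for that workaround is sketched below. The name lg_up_safe is hypothetical. The sketch assumes that the compiler defines __LZCNT__ when lzcnt code generation is enabled (as gcc and clang do with -mlzcnt) and otherwise falls back to an explicit guard, which reintroduces a conditional.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __LZCNT__
#include <immintrin.h>
#endif

/* Hypothetical ceiling log2 that is also correct for x == 1. */
static inline uint64_t lg_up_safe(uint64_t x) {
  assert(x != 0); /* log2(0) is not defined */
#ifdef __LZCNT__
  /* lzcnt returns 64 for a zero input, so x == 1 yields 64 - 64 = 0. */
  return 64U - (uint64_t)_lzcnt_u64(x - 1);
#else
  /* Fallback: never call the builtin with 0. */
  return x <= 1 ? 0 : 64U - (uint64_t)__builtin_clzll(x - 1);
#endif
}
```

Here 64 - clz(x - 1) is the same quantity as lg_down(x - 1) + 1, just written without the intermediate 63.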