How to compute the integer division, 264/n? Assuming:
unsigned long
is 64-bit- We use a 64-bit CPU
- 1 < n < 264
If we do 18446744073709551616ul / n
, we get warning: integer constant is too large for its type
at compile time. This is because we cannot express 264 in a 64-bit CPU. Another way is the following:
#define IS_POWER_OF_TWO(x) ((x & (x - 1)) == 0)
unsigned long q = 18446744073709551615ul / n;
if (IS_POWER_OF_TWO(n))
return q + 1;
else
return q;
Is there any faster (CPU cycle) or cleaner (coding) implementation?
phuclv's idea of using -n
is clever, but can be made much simpler. As unsigned longs we have -n = 264-n, then (-n)/n = 264/n - 1, and we can simply add back the 1.
unsigned long foo(unsigned long n) {
return (-n)/n + 1;
}
The generated code is just what you would expect (gcc 8.3 on x86-64 via godbolt):
mov rax, rdi
xor edx, edx
neg rax
div rdi
add rax, 1
ret
I've come up with another solution which was inspired by this question. From there we know that
(a1 + a2 + a3 + ... + an)/n =
(a1/n + a2/n + a3/n + ... + an/n) + (a1 % n + a2 % n + a3 % n + ... + an % n)/n
By choosing a1 = a2 = a3 = ... = an-1 = 1 and an = 264 - n we'll have
(a1 + a2 + a3 + ... + an)/n = (1 + 1 + 1 + ... + (264 - n))/n = 264/n
= [(n - 1)*1/n + (264 - n)/n] + [(n - 1)*0 + (264 - n) % n]/n
= (264 - n)/n + ((264 - n) % n)/n
264 - n is the 2's complement of n, which is -n
, or we can also write it as ~0 - n + 1
. So the final solution would be
uint64_t twoPow64div(uint64_t n)
{
return (-n)/n + (n + (-n) % n)/n + (n > 1ULL << 63);
}
The last part is to correct the result, because we deal with unsigned integers instead of signed ones like in the other question. Checked on my PC on the 32 and 64-bit version and the result matches with your solution
On MSVC however there's an intrinsic for 128-bit division, so you can use like this
uint64_t remainder;
return _udiv128(1, 0, n, &remainder);
which results in the cleanest output
mov edx, 1
xor eax, eax
div rcx
ret 0
Here's the demo
On most x86 compilers long double
also has 64 bits of precision, so you can use either of these
(uint64_t)(powl(2, 64)/n)
(uint64_t)(((long double)~0ULL + 1)/n)
(uint64_t)(18446744073709551616.0L/n)
although probably the performance would be worse. This can also be applied to any implementations where long double
has more than 63 bits of significand, like PowerPC or Sparc
There's a related question about calculating ((UINT_MAX + 1)/x)*x - 1
: Integer arithmetic: Add 1 to UINT_MAX and divide by n without overflow with also clever solutions. Based on that we have
264/n = (264 - n + n)/n = (264 - n)/n + 1 = (-n)/n + 1
which is essentially just another way to get Nate Eldredge's answer
Here's some demo for other compilers on godbolt
See also:
We use a 64-bit CPU
Which 64-bit CPU?
In general, if you multiply a number with N bits by another number that has M bits, the result will have up to N+M bits. For integer division it's similar - if a number with N bits is divided by a number with M bits the result will have N-M+1 bits.
Because multiplication is naturally "widening" (the result has more digits than either of the source numbers) and integer division is naturally "narrowing" (the result has less digits); some CPUs support "widening multiplication" and "narrowing division".
In other words, some 64-bit CPUs support dividing a 128-bit number by a 64-bit number to get a 64-bit result. For example, on 80x86 it's a single DIV
instruction.
Unfortunately, C doesn't support "widening multiplication" or "narrowing division". It only supports "result is same size as source operands".
Ironically (for unsigned 64-bit divisors on 64-bit 80x86) there is no other choice and the compiler must use the DIV
instruction that will divide a 128-bit number by a 64-bit number. This means that the C language forces you to use a 64-bit numerator, then the code generated by the compiler extends your 64 bit numerator to 128 bits and divides it by a 64 bit number to get a 64 bit result; and then you write extra code to work around the fact that the language prevented you from using a 128-bit numerator to begin with.
Hopefully you can see how this situation might be considered "less than ideal".
What I'd want is a way to trick the compiler into supporting "narrowing division". For example, maybe by abusing casts and hoping that the optimiser is smart enough, like this:
__uint128_t numerator = (__uint128_t)1 << 64;
if(n > 1) {
return (uint64_t)(numerator/n);
}
I tested this for the latest versions of GCC, CLANG and ICC (using https://godbolt.org/ ) and found that (for 64-bit 80x86) none of the compilers are smart enough to realise that a single DIV
instruction is all that is needed (they all generated code that does a call __udivti3
, which is an expensive function to get a 128 bit result). The compilers will only use DIV
when the (128-bit) numerator is 64 bits (and it will be preceded by an XOR RDX,RDX
to set the highest half of the 128-bit numerator to zeros).
In other words, it's likely that the only way to get ideal code (the DIV
instruction by itself on 64-bit 80x86) is to resort to inline assembly.
For example, the best code you'll get without inline assembly (from Nate Eldredge's answer) will be:
mov rax, rdi
xor edx, edx
neg rax
div rdi
add rax, 1
ret
...and the best code that's possible is:
mov edx, 1
xor rax, rax
div rdi
ret
Your way is pretty good. It might be better to write it like this:
return 18446744073709551615ul / n + ((n&(n-1)) ? 0:1);
The hope is to make sure the compiler notices that it can do a conditional move instead of a branch.
Compile and disassemble.
来源:https://stackoverflow.com/questions/55565537/how-to-compute-2%e2%81%b6%e2%81%b4-n-in-c