How to compute 2⁶⁴/n in C?

How to compute the integer division, 2⁶⁴/n? Assuming:

unsigned long is 64-bit
We use a 64-bit CPU
1 < n < 2⁶⁴

If we do 18446744073709551616ul / n, we get warning: integer constant is too large for its type at compile time. This is because we cannot express 2⁶⁴ in a 64-bit CPU. Another way is the following:

#define IS_POWER_OF_TWO(x) ((x & (x - 1)) == 0)

unsigned long q = 18446744073709551615ul / n;
if (IS_POWER_OF_TWO(n))
    return q + 1;
else
    return q;

Is there any faster (CPU cycle) or cleaner (coding) implementation?

phuclv's idea of using -n is clever, but can be made much simpler. As unsigned longs we have -n = 2⁶⁴-n, then (-n)/n = 2⁶⁴/n - 1, and we can simply add back the 1.

unsigned long foo(unsigned long n) {
  return (-n)/n + 1;
}

The generated code is just what you would expect (gcc 8.3 on x86-64 via godbolt):

    mov     rax, rdi
    xor     edx, edx
    neg     rax
    div     rdi
    add     rax, 1
    ret

I've come up with another solution which was inspired by this question. From there we know that

(a₁ + a₂ + a₃ + ... + a_n)/n =

(a₁/n + a₂/n + a₃/n + ... + a_n/n) + (a₁ % n + a₂ % n + a₃ % n + ... + a_n % n)/n

By choosing a₁ = a₂ = a₃ = ... = a_n-1 = 1 and a_n = 2⁶⁴ - n we'll have

(a₁ + a₂ + a₃ + ... + a_n)/n = (1 + 1 + 1 + ... + (2⁶⁴ - n))/n = 2⁶⁴/n

= [(n - 1)*1/n + (2⁶⁴ - n)/n] + [(n - 1)*0 + (2⁶⁴ - n) % n]/n

= (2⁶⁴ - n)/n + ((2⁶⁴ - n) % n)/n

2⁶⁴ - n is the 2's complement of n, which is -n, or we can also write it as ~0 - n + 1. So the final solution would be

uint64_t twoPow64div(uint64_t n)
{
    return (-n)/n + (n + (-n) % n)/n + (n > 1ULL << 63);
}

The last part is to correct the result, because we deal with unsigned integers instead of signed ones like in the other question. Checked on my PC on the 32 and 64-bit version and the result matches with your solution

On MSVC however there's an intrinsic for 128-bit division, so you can use like this

uint64_t remainder;
return _udiv128(1, 0, n, &remainder);

which results in the cleanest output

    mov     edx, 1
    xor     eax, eax
    div     rcx
    ret     0

Here's the demo

On most x86 compilers long double also has 64 bits of precision, so you can use either of these

(uint64_t)(powl(2, 64)/n)
(uint64_t)(((long double)~0ULL + 1)/n)
(uint64_t)(18446744073709551616.0L/n)

although probably the performance would be worse. This can also be applied to any implementations where long double has more than 63 bits of significand, like PowerPC or Sparc

There's a related question about calculating ((UINT_MAX + 1)/x)*x - 1: Integer arithmetic: Add 1 to UINT_MAX and divide by n without overflow with also clever solutions. Based on that we have

2⁶⁴/n = (2⁶⁴ - n + n)/n = (2⁶⁴ - n)/n + 1 = (-n)/n + 1

which is essentially just another way to get Nate Eldredge's answer

Here's some demo for other compilers on godbolt

See also:

We use a 64-bit CPU

Which 64-bit CPU?

In general, if you multiply a number with N bits by another number that has M bits, the result will have up to N+M bits. For integer division it's similar - if a number with N bits is divided by a number with M bits the result will have N-M+1 bits.

Because multiplication is naturally "widening" (the result has more digits than either of the source numbers) and integer division is naturally "narrowing" (the result has less digits); some CPUs support "widening multiplication" and "narrowing division".

In other words, some 64-bit CPUs support dividing a 128-bit number by a 64-bit number to get a 64-bit result. For example, on 80x86 it's a single DIV instruction.

Unfortunately, C doesn't support "widening multiplication" or "narrowing division". It only supports "result is same size as source operands".

Ironically (for unsigned 64-bit divisors on 64-bit 80x86) there is no other choice and the compiler must use the DIV instruction that will divide a 128-bit number by a 64-bit number. This means that the C language forces you to use a 64-bit numerator, then the code generated by the compiler extends your 64 bit numerator to 128 bits and divides it by a 64 bit number to get a 64 bit result; and then you write extra code to work around the fact that the language prevented you from using a 128-bit numerator to begin with.

Hopefully you can see how this situation might be considered "less than ideal".

What I'd want is a way to trick the compiler into supporting "narrowing division". For example, maybe by abusing casts and hoping that the optimiser is smart enough, like this:

  __uint128_t numerator = (__uint128_t)1 << 64;
  if(n > 1) {
      return (uint64_t)(numerator/n);
  }

I tested this for the latest versions of GCC, CLANG and ICC (using https://godbolt.org/ ) and found that (for 64-bit 80x86) none of the compilers are smart enough to realise that a single DIV instruction is all that is needed (they all generated code that does a call __udivti3, which is an expensive function to get a 128 bit result). The compilers will only use DIV when the (128-bit) numerator is 64 bits (and it will be preceded by an XOR RDX,RDX to set the highest half of the 128-bit numerator to zeros).

In other words, it's likely that the only way to get ideal code (the DIV instruction by itself on 64-bit 80x86) is to resort to inline assembly.

For example, the best code you'll get without inline assembly (from Nate Eldredge's answer) will be:

    mov     rax, rdi
    xor     edx, edx
    neg     rax
    div     rdi
    add     rax, 1
    ret

...and the best code that's possible is:

    mov     edx, 1
    xor     rax, rax
    div     rdi
    ret

Your way is pretty good. It might be better to write it like this:

return 18446744073709551615ul / n + ((n&(n-1)) ? 0:1);

The hope is to make sure the compiler notices that it can do a conditional move instead of a branch.

Compile and disassemble.

来源：https://stackoverflow.com/questions/55565537/how-to-compute-2%e2%81%b6%e2%81%b4-n-in-c

标签

integer-division