Hamming numbers and double precision

Question

I was playing around with generating Hamming numbers in Haskell, trying to improve on the obvious solution (pardon the naming of the functions):

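-- Merge two ascending lists, keeping a single copy of elements present in both.
mergeUniq :: Ord a => [a] -> [a] -> [a]
mergeUniq (x:xs) (y:ys) = case x `compare` y of
                               EQ -> x : mergeUniq xs ys
                               LT -> x : mergeUniq xs (y:ys)
                               GT -> y : mergeUniq (x:xs) ys
-- Not reached for the infinite lists used here; included so the function is total.
mergeUniq xs [] = xs
mergeUniq [] ys = ys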

powers :: [Integer]
powers = 1 : expand 2 `mergeUniq` expand 3 `mergeUniq` expand 5
  where
    expand factor = (factor *) <$> powers

I noticed that I can avoid the (slower) arbitrary-precision Integer if I represent each number as the triple of its 2-, 3- and 5-exponents, like data Power = Power { k2 :: !Int, k3 :: !Int, k5 :: !Int }, where the number is understood to be 2^k2 * 3^k3 * 5^k5. The comparison of two Powers then becomes

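-- Needs the RecordWildCards extension and an Eq instance for Power.
instance Ord Power where
  p1 `compare` p2 = toComp (p1 `divP` gcdP) `compare` toComp (p2 `divP` gcdP)
    where
    -- Division subtracts exponents; the GCD takes the componentwise minimum.
    divP a b = Power { k2 = k2 a - k2 b, k3 = k3 a - k3 b, k5 = k5 a - k5 b }
    gcdP = Power { k2 = min (k2 p1) (k2 p2), k3 = min (k3 p1) (k3 p2), k5 = min (k5 p1) (k5 p2) }
    -- Natural logarithm of the represented number, computed in Double.
    toComp :: Power -> Double
    toComp Power { .. } = fromIntegral k2 * log 2 + fromIntegral k3 * log 3 + fromIntegral k5 * log 5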

So, very roughly speaking, to compare p₁ = 2^i₁ * 3^j₁ * 5^k₁ and p₂ = 2^i₂ * 3^j₂ * 5^k₂ we compare the logarithms of p₁ and p₂, which presumably fit in a Double. But actually we do even better: we first compute their GCD (by finding the mins of the corresponding exponent pairs; only Int arithmetic so far!), divide p₁ and p₂ by the GCD (by subtracting the mins from the corresponding exponents; also only Int arithmetic), and compare the logarithms of the results.
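To make the reduction step concrete, here is a small sketch on plain exponent triples. The alias Exps and the helpers reduce, logValue and compareByLog are names made up for this illustration; they are not part of the code above.

```haskell
-- Exponent triple (i, j, k) standing for 2^i * 3^j * 5^k.
type Exps = (Int, Int, Int)

-- Divide both numbers by their GCD: subtract the componentwise minimum.
reduce :: Exps -> Exps -> (Exps, Exps)
reduce (i1, j1, k1) (i2, j2, k2) =
  ((i1 - mi, j1 - mj, k1 - mk), (i2 - mi, j2 - mj, k2 - mk))
  where
    mi = min i1 i2
    mj = min j1 j2
    mk = min k1 k2

-- Natural logarithm of 2^i * 3^j * 5^k, computed in Double.
logValue :: Exps -> Double
logValue (i, j, k) =
  fromIntegral i * log 2 + fromIntegral j * log 3 + fromIntegral k * log 5

-- Compare two numbers through the logarithms of their GCD-reduced forms.
compareByLog :: Exps -> Exps -> Ordering
compareByLog a b = logValue a' `compare` logValue b'
  where
    (a', b') = reduce a b
```

For instance, reduce (10, 4, 1) (8, 5, 3) gives ((2, 0, 0), (0, 1, 2)), so the comparison is between log 4 and log 75 instead of between the logarithms of 414720 and 7776000.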

But, given that we go through Doubles, there will eventually be a loss of precision. And this is the basis for my questions:

When will the finite precision of Doubles bite me? That is, how can I estimate the order of i, j, k for which comparisons of 2^i * 3^j * 5^k with numbers with "similar" exponents become unreliable?
How does dividing by the GCD (which presumably lowers the exponents considerably for this task) modify the answer to the previous question?

I did an experiment, comparing the numbers produced this way with the numbers produced via going through arbitrary precision arithmetic, and all Hamming numbers up to the 1'000'000'000th match exactly (which took me about 15 minutes and 600 megs of RAM to verify). But that's obviously not a proof.
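A cut-down sketch of that kind of check might look like the following. It is not the program I actually ran; hammingExact, hammingPower, toExact and agreeUpTo are names invented here, and the deriving clause on Power is an assumption added so that the snippet compiles on its own.

```haskell
{-# LANGUAGE RecordWildCards #-}

data Power = Power { k2 :: !Int, k3 :: !Int, k5 :: !Int }
  deriving (Eq, Show)

-- Ordering through Double logarithms of the GCD-reduced numbers.
instance Ord Power where
  p1 `compare` p2 = toComp (p1 `divP` gcdP) `compare` toComp (p2 `divP` gcdP)
    where
      divP a b = Power { k2 = k2 a - k2 b, k3 = k3 a - k3 b, k5 = k5 a - k5 b }
      gcdP = Power { k2 = min (k2 p1) (k2 p2)
                   , k3 = min (k3 p1) (k3 p2)
                   , k5 = min (k5 p1) (k5 p2) }
      toComp :: Power -> Double
      toComp Power { .. } =
        fromIntegral k2 * log 2 + fromIntegral k3 * log 3 + fromIntegral k5 * log 5

-- Merge two ascending lists, keeping one copy of shared elements.
mergeUniq :: Ord a => [a] -> [a] -> [a]
mergeUniq (x:xs) (y:ys) = case x `compare` y of
  EQ -> x : mergeUniq xs ys
  LT -> x : mergeUniq xs (y:ys)
  GT -> y : mergeUniq (x:xs) ys
mergeUniq xs [] = xs
mergeUniq [] ys = ys

-- Reference sequence in exact Integer arithmetic.
hammingExact :: [Integer]
hammingExact = 1 : expand 2 `mergeUniq` expand 3 `mergeUniq` expand 5
  where
    expand factor = (factor *) <$> hammingExact

-- The same sequence on exponent triples, ordered by the instance above.
hammingPower :: [Power]
hammingPower = Power 0 0 0 : times2 `mergeUniq` times3 `mergeUniq` times5
  where
    times2 = [p { k2 = k2 p + 1 } | p <- hammingPower]
    times3 = [p { k3 = k3 p + 1 } | p <- hammingPower]
    times5 = [p { k5 = k5 p + 1 } | p <- hammingPower]

-- Exact value of a triple, used only for checking.
toExact :: Power -> Integer
toExact Power { .. } = 2 ^ k2 * 3 ^ k3 * 5 ^ k5

-- Do the first n elements of both sequences coincide?
agreeUpTo :: Int -> Bool
agreeUpTo n = and (take n (zipWith (==) (map toExact hammingPower) hammingExact))
```

Evaluating agreeUpTo n for growing n reproduces the experiment on a small scale. A real run to a billion elements would need something more memory-conscious than these simple lazy lists.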

Answer 1:

Empirically, the comparisons stay reliable up to at least about the 10 trillionth Hamming number, and possibly higher.

Using your nice GCD trick won't help us here, because some neighboring Hamming numbers are bound to have no common factors between them.
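As a small-scale illustration of that remark, the sketch below lists neighboring Hamming numbers whose GCD is 1, so the reduction leaves their exponents untouched. The names hamming and coprimeNeighbours are invented for this example, and it only looks at the start of the sequence; it says nothing by itself about what happens at trillions of elements.

```haskell
-- Merge two ascending lists, keeping one copy of shared elements.
mergeUniq :: Ord a => [a] -> [a] -> [a]
mergeUniq (x:xs) (y:ys) = case compare x y of
  EQ -> x : mergeUniq xs ys
  LT -> x : mergeUniq xs (y:ys)
  GT -> y : mergeUniq (x:xs) ys
mergeUniq xs [] = xs
mergeUniq [] ys = ys

-- Hamming numbers in exact Integer arithmetic.
hamming :: [Integer]
hamming = 1 : map (2 *) hamming `mergeUniq` map (3 *) hamming `mergeUniq` map (5 *) hamming

-- Among the first n neighboring pairs, those with no common factor.
coprimeNeighbours :: Int -> [(Integer, Integer)]
coprimeNeighbours n =
  [ (a, b) | (a, b) <- take n (zip hamming (drop 1 hamming)), gcd a b == 1 ]
```

Pairs such as (15, 16) and (24, 25) show up among the first fifteen neighbors: one side is a pure power of 2 or of 5, the other has none of that prime.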

update: trying it online on ideone and elsewhere, we get

4T  5.81s 22.2MB  -- 16 digits used.... still good
                  --  (as evidenced by the `True` below), but really pushing it.
((True,44531.6794,7.275957614183426e-11),(16348,16503,873),"2.3509E+13405")
-- isTruly  max        min logval           nth-Hamming       approx.
--  Sorted   logval      difference          as i,j,k          value
--            in band      in band                             in decimal
10T   11.13s 26.4MB
((True,60439.6639,7.275957614183426e-11),(18187,23771,1971),"1.4182E+18194")
13T   14.44s 30.4MB    ...still good
((True,65963.6432,5.820766091346741e-11),(28648,21308,1526),"1.0845E+19857")

---- same code on tio:
10T   16.77s
35T   38.84s 
((True,91766.4800,5.820766091346741e-11),(13824,2133,32112),"2.9045E+27624")
70T   59.57s
((True,115619.1575,5.820766091346741e-11),(13125,13687,34799),"6.8310E+34804")

---- on home machine:
100T: 368.13s
((True,130216.1408,5.820766091346741e-11),(88324,876,17444),"9.2111E+39198")

140T: 466.69s
((True,145671.6480,5.820766091346741e-11),(9918,24002,42082),"3.4322E+43851")

170T: 383.26s         ---FAULTY---
((False,155411.2501,0.0),(77201,27980,14584),"2.80508E+46783")
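One way to read the "min logval difference" column is in units of the Double spacing (ulp) at the magnitude of the log values. The sketch below assumes that the column is a difference of two Double logarithms close to the "max logval" shown in the same row; ulp and gapInUlps are helper names introduced here, built on the standard RealFloat methods exponent, floatDigits and encodeFloat.

```haskell
-- Spacing between a normalized positive Double and the next larger one.
ulp :: Double -> Double
ulp x = encodeFloat 1 (exponent x - floatDigits x)

-- How many ulps (at magnitude x) a given gap spans.
gapInUlps :: Double -> Double -> Double
gapInUlps x gap = gap / ulp x
```

Under that assumption, the smallest gap is 10 ulps in the 4T and 10T rows, 4 ulps from 13T through 100T, 2 ulps at 140T, and it collapses to 0.0 at 170T, which is the row where the sortedness check fails.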

Answer 2:

I guess that you could use adaptive arbitrary precision to compute the log.

If you choose log base 2, then log2(2^i) is trivial. That eliminates 1 factor and log2 has the advantage of being easier to compute than natural logarithm (https://en.wikipedia.org/wiki/Binary_logarithm gives an algorithm for example, there is also Shanks...).

For log2(3) and log2(5), you would expand just enough terms to distinguish both operands. I don't know if it would lead to more operations than directly exponentiating 3^j and 5^k in large integer arithmetic and locating the high bit... But those could be pre-tabulated up to the required number of digits.
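A simpler relative of this idea, sketched below, is not the adaptive-precision logarithm itself: it compares base-2 logarithms in Double and falls back to exact Integer arithmetic only when the two values are closer than a caller-supplied margin. The names Exps, log2Value, exactValue and compareAdaptive are invented here, and a margin that is actually safe for a given range of exponents is an assumption that would still have to be justified.

```haskell
-- Exponent triple (i, j, k) standing for 2^i * 3^j * 5^k.
type Exps = (Int, Int, Int)

-- Base-2 logarithm in Double; the power of 2 contributes its exponent directly.
log2Value :: Exps -> Double
log2Value (i, j, k) =
  fromIntegral i + fromIntegral j * logBase 2 3 + fromIntegral k * logBase 2 5

-- Exact value, used only when the Double comparison is too close to call.
exactValue :: Exps -> Integer
exactValue (i, j, k) = 2 ^ i * 3 ^ j * 5 ^ k

-- Trust the Double difference only when it exceeds the margin (an assumed,
-- unverified error bound); otherwise decide with exact Integers.
compareAdaptive :: Double -> Exps -> Exps -> Ordering
compareAdaptive margin a b
  | abs d > margin = compare d 0
  | otherwise      = exactValue a `compare` exactValue b
  where
    d = log2Value a - log2Value b
```

Since neighboring values in the sequence are exactly the close ones, the fallback may fire often at large indices, so this trades speed for certainty.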

Source: https://stackoverflow.com/questions/60803224/hamming-numbers-and-double-precision

Tags

algorithm

haskell

floating-point

precision

hamming-numbers