Number type boundaries in Common LISP and Stack flowing over in GHCI

问题

First question ever here, and newbie in both Common LISP and Haskell, please be kind. I have a function in Common LISP - code below - which is intended to tell whether the area of a triangle is an integral number (integer?).

(defun area-int-p (a b c)
  (let* ((s (/ (+ a b c) 2))
         (area (sqrt (* s (- s a) (- s b) (- s c)))))
    (if (equal (ceiling area) (floor area))
        t
        nil)))

This is supposed to use Heron's formula to calculate the area of the triangle, given the size of the three sides, and decide whether it is an integer comparing the ceiling and the floor. We are told that the area of an equilateral triangle is never an integer. Therefore, to test whether the function is working, I ran it with the arguments 333. Here is what I got in return:

CL-USER> (area-int-p 333 333 333)
NIL

Perfect! It works. To test it even more, I ran it with the arguments 3333. This is what I got in return:

CL-USER> (area-int-p 3333 3333 3333)
T

Something is wrong, this is not supposed to happen! So, I try the following, hopefully equivalent Haskell function to see what happens:

areaIntP :: (Integral a) => a -> a -> a -> Bool
areaIntP a b c =
  let aa = fromIntegral a
      bb = fromIntegral b
      cc = fromIntegral c
      perimeter = aa + bb + cc
      s = perimeter/2
      area = sqrt(s * (s - aa) * (s - bb) * (s - cc))
  in  if ceiling area == floor area
      then True
      else False

This is what I get:

*Main> areaIntP 3333 3333 3333
False
*Main> areaIntP 333 333 333
False

Looks perfect. Encouraged by this, I use the below functions in Haskell to sum the perimeters of of isosceles triangles with the third side differing just one unit from the other sides, an integral area, and perimeter below 1,000,000,000.

toplamArtilar :: Integral a => a -> a -> a -> a
toplamArtilar altSinir ustSinir toplam =
  if ustSinir == altSinir
  then toplam
  else if areaIntP ustSinir ustSinir (ustSinir + 1) == True
       then toplamArtilar altSinir (ustSinir - 1) (toplam + (3 * ustSinir + 1))
       else toplamArtilar altSinir (ustSinir - 1) toplam

toplamEksiler :: Integral a => a -> a -> a -> a
toplamEksiler altSinir ustSinir toplam =
  if ustSinir == altSinir
  then toplam
  else if areaIntP ustSinir ustSinir (ustSinir - 1) == True
       then toplamEksiler altSinir (ustSinir - 1) (toplam + (3 * ustSinir - 1))
       else toplamEksiler altSinir (ustSinir - 1) toplam

sonuc altSinir ustSinir =
  toplamEksiler altSinir ustSinir (toplamArtilar altSinir ustSinir 0)

(ustSinir means upper limit, altSinir lower limit by the way.) Running sonuc with the arguments 2 and 333333333 however, my stack flows over. Runnning the equivalent functions in Common LISP the stack is OK, but area-int-p function is not reliable, probably because of the boundaries of the number type the interpreter deduces. After all this, my question is two-fold:

1) How do I get round the problem in the Common LISP function area-int-p?

2) How do I prevent the stack overflow with the Haskell functions above, either within Emacs or with GHCi run from the terminal?

Note for those who figure out what I am trying to achieve here: please don't tell me to use Java BigDecimal and BigInteger.

Edit after very good replies: I asked two questions in one, and received perfectly satisfying, newbie friendly answers and a note on style from very helpful people. Thank you.

回答1:

Let's define an intermediate Common Lisp function:

(defun area (a b c)
  (let ((s (/ (+ a b c) 2)))
    (sqrt (* s (- s a) (- s b) (- s c)))))

Your tests give:

CL-USER> (area 333 333 333)
48016.344

CL-USER> (area 3333 3333 3333)
4810290.0

In the second case, it should be clear that both the ceiling and floor are equal. This is not the case in Haskell where the second test, with 3333, returns:

4810290.040910754

Floating point

In Common Lisp, the value from which we take a square root is:

370222244442963/16

This is because computations are made with rational numbers. Up to this point, the precision is maximal. However, SQRT is free to return either a rational, when possible, or an approximate result. As a special case, the result can be an integer on some implementations, as Rainer Joswig pointed out in a comment. It makes sense because both integer and ratio are disjoint subtypes of the rational type. But as your problem shows, some square roots are irrational (e.g. √2), and in that case CL can return a float approximating the value (or a complex float).

The relevant section regarding floats and mathematical functions is 12.1.3.3 Rule of Float Substitutability. Long story short, the result is converted to a single-float when you compute the square root, which happens to loose some precision. In order to have a double, you have to be more explicit:

(defun area (a b c)
   (let ((s (/ (+ a b c) 2)))
     (sqrt (float (* s (- s a) (- s b) (- s c)) 0d0))))

I could also have used (coerce ... 'double-float), but here I chose to call the FLOAT conversion function. The optional second argument is a float prototype, ie. a value of the target type. Above, it is 0d0, a double float. You could also use 0l0 for long doubles or 0s0 for short. This parameter is useful if you want to have the same precision as an input float, but can be used with literals too, like in the example. The exact meaning of short, single, double or long float types is implementation-defined, but they shall respect some rules. Current implementations generally give more precision that the minimum required.

CL-USER> (area 3333 3333 3333)
4810290.040910754d0

Now, if I wanted to test if the result is integral, I would truncate the float and look if the second returned value, the remainder, is zero.

CL-USER> (zerop (nth-value 1 (truncate 4810290.040910754d0)))
NIL

Arbitrary-precision

Note that regardless of the implementation language (Haskell, CL or another one) the approach is going to give incorrect results for some inputs, given how floats are represented. Indeed, the same problem you had with CL could arise for some inputs with more precise floats, where the result would be very close to an integer. You might need another mathematical approach or something like MPFR for arbitrary precision floating point computations. SBCL ships with sb-mpfr:

CL-USER> (require :sb-mpfr)
("SB-MPFR" "SB-GMP")

CL-USER> (in-package :sb-mpfr)
#<PACKAGE "SB-MPFR">

And then:

SB-MPFR> (with-precision 256
           (sqrt (coerce 370222244442963/16 'mpfr-float)))
.4810290040910754427104204965311207243133723228299086361205561385039201180068712e+7
-1

回答2:

I will answer your second question, I'm not sure about the first. In Haskell, because it's a lazy language, when you use tail recursion with an accumulator parameter, an "accumulation of thunks" can take place. A thunk is an expression that is suspended and not yet evaluated. To take a much simpler example, summing all the numbers from 0 to n:

tri :: Int -> Int -> Int
tri 0 accum = accum
tri n accum = tri (n-1) (accum + n)

If we trace the evaluation, we can see what's going on:

tri 3 0
  = tri (3-1) (0+3)
  = tri 2 (0+3)
  = tri (2-1) ((0+3)+2)
  = tri 1 ((0+3)+2)
  = tri (1-1) (((0+3)+2)+1)
  = tri 0 (((0+3)+2)+1)
  = ((0+3)+2)+1     -- here is where ghc uses the C stack
  = (0+3)+2        (+1) on stack
  = 0+3            (+2) (+1) on stack
  = 0              (+3) (+2) (+1) on stack
  = 3              (+2) (+1) on stack
  = 5              (+1) on stack
  = 6

This is a simplification of course, but it's an intuition that can help you understand both stack overflows and space leaks caused by thunk buildup. GHC only evalues a thunk when it's needed. We ask whether the value of n is 0 each time through tri so there is no thunk buildup in that parameter, but nobody needs to know the value of accum until the very end, which might be a really huge thunk as you can see from the example. In evaluating that huge thunk the stack can overflow.

The solution is to make tri evaluate accum sooner. This is usually done using a BangPattern (but can be done with seq if you don't like extensions).

{-# LANGUAGE BangPatterns #-}
tri :: Int -> Int -> Int
tri 0 !accum = accum
tri n !accum = tri (n-1) (accum + n)

The ! before accum means "evaluate this parameter at the moment of pattern matching" (even though the pattern doesn't technically need to know its value). Then we get this evaluation trace:

tri 3 0
  = tri (3-1) (0+3)
  = tri 2 3          -- we evaluate 0+3 because of the bang pattern
  = tri (2-1) (3+2)
  = tri 1 5
  = tri (1-1) (5+1)
  = tri 0 6
  = 6

I hope this helps.