Haskell: Handling deadlocked self-referential lists

Question

Is there any useful reason why GHC allows the following to block forever:

list = 1 : tail list

It seems with a bit of sophistication in the list iterator/generator we should be able to do something more useful:

1. Return error "Infinitely blocking list"
2. Return [1,1]

Explaining 2: it seems possible that, when entering the generator to get element N, we could limit all self-references inside the generator to the list ending at N-1 (we notice the read of element N inside the scope that generates N, and return end-of-list instead). It's a sort of simple deadlock detection using scopes.

Clearly this isn't that useful for the toy example above, but it may allow for more useful/elegant finite, self-referential list definitions, for example:

primes = filter (\x -> none ((==0).mod x) primes) [2..]

Note that either change should only affect list generators that would currently block forever, so both seem to be backward-compatible language changes.

Ignoring the GHC-complexity required to make such a change for a moment, would this behavior break any existing language behavior that I am missing? Any other thoughts on the "elegance" of this change?

Also see the BFS example below, which could benefit as well. To me, this seems more functional/elegant than some other solutions, since I only need to define what a bfsList is, not how to generate it (i.e. I don't have to specify a terminating condition):

bfs :: (a -> Bool) -> (a -> [a]) -> [a] -> Maybe a
bfs predf expandf xs = find predf bfsList
    where bfsList = xs ++ concatMap expandf bfsList

Answer 1:

Here is a denotational perspective on how list = 1 : ⊥.

First, a little background. In Haskell, values are partially ordered by "definedness", where values involving ⊥ ("bottom") are less defined than ones without. So

⊥ is less defined than 1 : ⊥
1 : ⊥ is less defined than 1 : 2 : 3 : []

But it's a partial order, so

1 : ⊥ is not less defined than 2 : 3 : ⊥, nor is it more defined.

even though the second list is longer. 1 : ⊥ is only less defined than lists that start with 1. I highly recommend reading about denotational semantics of Haskell.
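To make the ordering a bit more concrete, here is a minimal sketch that uses undefined as a stand-in for ⊥. The name partial is only illustrative, and catching the exception is just one way to observe that the tail carries no information.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Illustrative name: the partial list 1 : ⊥, with `undefined` playing ⊥.
partial :: [Int]
partial = 1 : undefined

main :: IO ()
main = do
  -- The defined part of the value can be observed without trouble.
  print (head partial)    -- 1
  print (take 1 partial)  -- [1]
  -- Anything that needs the tail runs into ⊥.
  r <- try (evaluate (length partial)) :: IO (Either SomeException Int)
  putStrLn (either (const "the tail is bottom") show r)
```

A more defined list such as 1 : 2 : 3 : [] answers every question that partial answers, and more, which is what "less defined than" is meant to capture.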

Now to your question. Look at

list = 1 : tail list

as an equation to be solved instead of a "function declaration". We rewrite it like this:

list = ((1 :) . tail) list

Viewing it this way, we see that list is a fixed point

list = f list

where f = (1 :) . tail. In Haskell semantics, recursive values are solved by finding the least fixed point according to the above ordering.

The way to find this is very simple. If you start with ⊥ and then apply the function over and over, you will find an increasing chain of values. The point at which the chain stops changing will be the least fixed point (technically it will be the limit of the chain, since it might not ever stop changing).
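The chain can also be built directly in Haskell. This is only a sketch: undefined stands in for ⊥, approximations is an illustrative name, and fix from Data.Function is used as the library's way of tying the same knot that the recursive definition of list ties.

```haskell
import Data.Function (fix)

-- The function from the equation list = f list.
f :: [Int] -> [Int]
f = (1 :) . tail

-- Illustrative name: the chain ⊥, f ⊥, f (f ⊥), ...
approximations :: [[Int]]
approximations = iterate f undefined

main :: IO ()
main = do
  -- Skip ⊥ itself without forcing it, then inspect only the head of
  -- the next three approximations; their tails are never evaluated.
  print (map head (take 3 (drop 1 approximations)))  -- [1,1,1]
  -- The head of the least fixed point is defined; its tail is not.
  print (head (fix f))                               -- 1
```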

Starting with ⊥,

f ⊥ = ((1 :) . tail) ⊥ = 1 : tail ⊥

we see that ⊥ is not already a fixed point because we didn't get ⊥ out the other end. So let's try again with what we got out:

f (1 : tail ⊥) = ((1 :) . tail) (1 : tail ⊥)
               = 1 : tail (1 : tail ⊥)
               = 1 : tail ⊥

Oh look, it's a fixed point, we got the same thing out that we put in.

The important point here is that it's the least one. Your solution [1,1] = 1:1:[] is also a fixed point, so it solves the equation:

f (1:1:[]) = ((1 :) . tail) (1:1:[]) 
           = 1 : tail (1:1:[])
           = 1:1:[]

But of course, every list that starts with 1 is a solution, and it's unclear how we should choose between them. However, the one we found by iteration, 1:⊥, is less defined than all of them: it delivers no more information than the equation requires, and it is the one specified by the language.
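A quick check of that claim on a few fully defined lists, as a sketch (isFixedPoint and report are illustrative names, and the sample lists are arbitrary):

```haskell
-- The function from the equation list = f list.
f :: [Int] -> [Int]
f = (1 :) . tail

-- Illustrative helper: does applying f give back the same list?
isFixedPoint :: [Int] -> Bool
isFixedPoint xs = f xs == xs

main :: IO ()
main = mapM_ report [[1], [1, 1], [1, 2, 3], [2, 3]]
  where
    -- Prints True for the lists starting with 1 and False for [2,3].
    report xs = putStrLn (show xs ++ ": " ++ show (isFixedPoint xs))
```

Since [1], [1,1] and [1,2,3] all satisfy the equation equally well, picking any one of them would be an arbitrary choice.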

Answer 2:

Even though list loops forever under GHCi, a proper binary compiled with GHC does detect the loop and signals an error. If you compile and run:

list = 1 : tail list
main = print list

it terminates with the error message:

Loop: <<loop>>

It does the same thing with your primes example.
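When the runtime does notice such a loop, it reports it as an exception, which Control.Exception exposes as NonTermination. The sketch below assumes a compiled binary; detection is a best-effort runtime feature, so in GHCi or under other settings the program may simply hang instead.

```haskell
import Control.Exception (NonTermination (..), evaluate, try)

list :: [Int]
list = 1 : tail list

main :: IO ()
main = do
  -- length has to walk into the tail, which depends on itself.
  r <- try (evaluate (length list)) :: IO (Either NonTermination Int)
  case r of
    Left NonTermination -> putStrLn "runtime reported <<loop>>"
    Right n             -> print n
```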

As others have noted, GHC doesn't detect all possible loops. If it did, then it would solve the Halting Problem, and that would probably make Haskell much more popular.

The reason it returns an error (or "gets stuck") instead of returning [1,1] is because the expression:

list = 1 : tail list

has well defined semantics in the Haskell language. These semantics assign it a value, and this value is "bottom" (or "error" or the symbol _|_), just as surely as the value of head [1,2,3] is 1.

(Well, technically, the value of list is 1 : _|_ which is "almost bottom". This is what @Justin Li was talking about in his comment. I've tried to give an explanation of why it has this value below.)

You may not see the use of a program or expression that returns bottom, and you may not see the harm in assigning non-bottom semantics to such expressions on the grounds that doing so is "backwards compatible". Most people in the Haskell community (language designers, compiler developers, and experienced users) will disagree with you, though, so don't expect to make much progress with them.

As for the specific new semantics you are proposing, they are unclear. Why isn't the value of list equal to [1]? It seems to me that when I am entering the "generator" to get element n=1 (zero indexed, so the second element) and evaluate tail list, then the list ending at element n-1=0 is [1] which has tail equal to [], so I think I should get the following, right?

list = 1 : tail list
     = 1 : tail [1]   -- use list-so-far
     = 1 : []
     = [1]

Why the value is (almost) bottom

Here's why the value of list is (almost) bottom, according to the semantics of standard Haskell (but see note at the end).

For reference, the definition of tail is, effectively:

tail l = case l of _:xs -> xs
                   [] -> error "ack, you dummy!"

Let's try to "fully" evaluate list using Haskell semantics:

-- evaluating `list` using definition of `list`
list = 1 : tail list

-- evaluating `tail list` using definition of `tail`
list = 1 : case list of _:xs -> xs
                        ...
-- evaluating case construct requires matching `list` to
-- a pattern, this requires evaluation of `list` using its defn
list = 1 : case (1 : tail list) of _:xs -> xs
                                   ...
-- case pattern match succeeds
list = 1 : let xs = tail list in xs    -- just to be clear
     = 1 : tail list

-- awesome, now all we need to do is evaluate:
list = 1 : tail list
-- ummm, Houston, we have a problem

and that infinite loop at the end is why the expression is "almost bottom".

Note: There are actually several different sets of Haskell semantics, that is, different methods of calculating the values of Haskell expressions. The gold standard is the denotational semantics described in @luqui's answer. The ones I'm using above are, at best, a form of the "informal semantics" described in the Haskell report, but they're good enough to get the right answer.

Source: https://stackoverflow.com/questions/46478088/haskell-handling-deadlocked-self-referential-lists

Tags

list

haskell

recursion

deadlock

self-reference