Special runtime representation of [] type?

Question

Consider the simple definition of a length-indexed vector:

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

import Unsafe.Coerce (unsafeCoerce)

data Nat = Z | S Nat

infixr 5 :>
data Vec (n :: Nat) a where
  V0 :: Vec 'Z a
  (:>) :: a -> Vec n a -> Vec ('S n) a

Naturally I would at some point need the following function:

vec2list :: Vec n a -> [a]
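For reference, the straightforward way to write this function is structural recursion. The sketch below repeats the definitions so it stands alone; the name vec2listSafe is hypothetical and only used here to keep it apart from the unsafeCoerce variants discussed later.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

infixr 5 :>
data Vec (n :: Nat) a where
  V0   :: Vec 'Z a
  (:>) :: a -> Vec n a -> Vec ('S n) a

-- Walks the vector once and rebuilds it as an ordinary list.
-- This is O(n) and allocates a new list, but it does not depend
-- on how GHC lays out either type in memory.
vec2listSafe :: Vec n a -> [a]
vec2listSafe V0        = []
vec2listSafe (x :> xs) = x : vec2listSafe xs

main :: IO ()
main = putStrLn (vec2listSafe ('a' :> 'b' :> 'c' :> V0))  -- prints abc
```

This version behaves the same whether the module is interpreted or compiled, at the price of one traversal.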

However, this function is really just a fancy identity. I believe that the runtime representations of these two types are the same, so

vec2list :: Vec n a -> [a]  
vec2list = unsafeCoerce

should work. Alas, it does not:

>vec2list ('a' :> 'b' :> 'c' :> V0)
""

Every input simply returns the empty list. So I assume my understanding is lacking. To test it, I define the following:

data List a = Nil | Cons a (List a) deriving (Show) 

vec2list' :: Vec n a -> List a 
vec2list' = unsafeCoerce 

test1 = vec2list' ('a' :> 'b' :> 'c' :> V0)

data SomeVec a = forall n . SomeVec (Vec n a) 

list'2vec :: List a -> SomeVec a 
list'2vec x = SomeVec (unsafeCoerce x)

Surprisingly this works! It certainly isn't an issue with the GADT then (my initial thought).

I think that the List type is really identical at runtime to []. I try to test this too:

list2list :: [a] -> List a 
list2list = unsafeCoerce 

test2 = list2list "abc"

and it works! Based on this fact, I have to conclude that [a] and List a must have the same runtime representation. And yet, the following

list2list' :: List a -> [a] 
list2list' = unsafeCoerce 

test3 = list2list' (Cons 'a' (Cons 'b' (Cons 'c' Nil)))

does not work. list2list' again always returns the empty list. I believe that "having identical runtime representations" must be a symmetric relation, so this doesn't seem to make sense.

I began to think maybe there's something funny with "primitive" types - but I always believed that [] is only special syntactically, not semantically. It seems that's the case:

data Pair a b = Pair a b deriving (Show, Eq, Ord)  

tup2pair :: (a,b) -> Pair a b 
tup2pair = unsafeCoerce 

pair2tup :: Pair a b -> (a,b) 
pair2tup = unsafeCoerce

The first function works and the second does not, same as in the case of List and []. In this case, though, pair2tup segfaults instead of always returning the empty list.

It seems to be consistently asymmetric with respect to types which use "built-in" syntax. Back to the Vec example, the following

list2vec :: [a] -> SomeVec a 
list2vec x = SomeVec (unsafeCoerce x)

works just fine as well! The GADT really isn't special.
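For comparison, the same conversion can be written without unsafeCoerce by rebuilding the vector one element at a time. This is only a sketch: list2vecSafe and someVecToList are hypothetical names, and SomeVec is written in GADT syntax here, which should be equivalent to the forall form above.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

infixr 5 :>
data Vec (n :: Nat) a where
  V0   :: Vec 'Z a
  (:>) :: a -> Vec n a -> Vec ('S n) a

-- Existential wrapper hiding the length index, as in the question.
data SomeVec a where
  SomeVec :: Vec n a -> SomeVec a

-- Rebuilds the list as a Vec; the length index is chosen by the
-- recursion and then hidden behind SomeVec.
list2vecSafe :: [a] -> SomeVec a
list2vecSafe []       = SomeVec V0
list2vecSafe (x : xs) = case list2vecSafe xs of
  SomeVec v -> SomeVec (x :> v)

-- Helper to observe the result; the local signature is needed
-- because go recurses at a different length index each step.
someVecToList :: SomeVec a -> [a]
someVecToList (SomeVec v) = go v
  where
    go :: Vec m b -> [b]
    go V0        = []
    go (y :> ys) = y : go ys

main :: IO ()
main = putStrLn (someVecToList (list2vecSafe "abc"))  -- prints abc
```

Like any structural conversion, this costs a traversal, so it does not answer the zero-cost part of the question.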

The question is: how do the runtime representations of types which use "built-in" syntax differ from those that do not?

Alternatively, how does one write a zero-cost coercion from Vec n a to [a]? This doesn't answer the question but solves the problem.

Testing was done with GHC 7.10.3.

As noted by a commenter, this behaviour is only present when interpreting. When compiled, all functions work as expected. The question still applies, just to runtime representation when interpreting.

Answer 1:

To answer your main question, this thread appears to have the answer: start ghci with -fobject-code. Without the flag, the module is interpreted and vec2list returns the empty list:

$ ghci /tmp/vec.hs
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling Main             ( /tmp/vec.hs, interpreted )
Ok, modules loaded: Main.
*Main> print $ vec2list ('a' :> 'b' :> 'c' :> V0)
""

With -fobject-code:

$ ghci -fobject-code /tmp/vec.hs
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling Main             ( /tmp/vec.hs, /tmp/vec.o )
Ok, modules loaded: Main.
Prelude Main> print $ vec2list ('a' :> 'b' :> 'c' :> V0)
"abc"

The modules that contain [] and (,) are all compiled, which makes their runtime representation differ from that of isomorphic datatypes defined in interpreted modules. According to Simon Marlow on the thread I linked, interpreted modules add annotations for the debugger. I think this also explains why tup2pair works and pair2tup doesn't: missing annotations aren't a problem for interpreted modules, but the compiled modules choke on the extra annotations.

-fobject-code has some downsides: compilation takes longer, and only exported functions are brought into scope. In return, the code runs much faster.

Answer 2:

To answer only your alternative question, you could create a newtype with a non-exported constructor to give a list a type-level length and a zero-cost coercion to lists:

{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE DataKinds #-}

module Vec (Nat(..), Vec, v0, (>:>), vec2list) where

data Nat = Z | S Nat

newtype Vec (n :: Nat) a = Vec { unVec :: [a] }

v0 :: Vec 'Z a
v0 = Vec []

infixr 5 >:>
(>:>) :: a -> Vec n a -> Vec ('S n) a
a >:> (Vec as) = Vec (a : as)

vec2list :: Vec n a -> [a]
vec2list (Vec as) = as

As long as the Vec constructor is not in scope (so only v0 and >:> can be used to construct vectors) the invariant that the type-level number represents the length can't be violated.

(This approach definitely has my preference over unsafeCoerce, as anything with unsafeCoerce could break with every update of GHC or on different platforms.)
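One caveat, as far as I understand GHC's role inference: the parameter n of the newtype is never used on the right-hand side, so its role is inferred as phantom, and a client could then use coerce from Data.Coerce to change the length index even without the constructor in scope. The sketch below is a variant of the module above that pins the role to nominal with a role annotation (an addition of mine, not part of the original answer) and also writes vec2list with coerce, which is allowed inside the module because the constructor is in scope there.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RoleAnnotations #-}

module Vec (Nat(..), Vec, v0, (>:>), vec2list) where

import Data.Coerce (coerce)

data Nat = Z | S Nat

newtype Vec (n :: Nat) a = Vec [a]

-- Assumption being guarded against: with the inferred phantom role,
-- Vec n a and Vec m a would be coercible for any n and m.
-- A nominal role for n rules that out; a stays representational.
type role Vec nominal representational

v0 :: Vec 'Z a
v0 = Vec []

infixr 5 >:>
(>:>) :: a -> Vec n a -> Vec ('S n) a
a >:> Vec as = Vec (a : as)

-- Zero-cost: the newtype has the same representation as [a],
-- and the compiler checks the coercion instead of trusting it.
vec2list :: Vec n a -> [a]
vec2list = coerce
```

Whether the role annotation matters in practice depends on whether client code is trusted not to use coerce on Vec; it costs nothing at runtime either way.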

Source: https://stackoverflow.com/questions/36054141/special-runtime-representation-of-type

Tags

haskell

ghc