Is there a name for ADT with explicit subtyping?

天大地大妈咪最大 提交于 2019-12-04 10:39:59

In a comment, you say:

I'm not sure how to encode a subtype hierarchy using GADTs. If you think it is doable, would you mind providing an answer with an example as to how the type hierarchy given in my example may be encoded?

Therefore I give an answer to this question here. The key idea is to give a type-level function (in the host language, here Haskell) for computing the subtyping relation (of the target language's type system, here your custom EDSL). For simplicity, I will spell out the subtyping relation in full, but standard type-level programming can be used to reduce the repetition and raise the abstraction level as appropriate. First, the extensions needed:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}

Now the definition of the subtyping relation:

data Level = Primary | Unary | Mult | Add | Comp

type family Subtype a b where
    Subtype Primary a       = True
    Subtype Unary   Primary = False
    Subtype Unary   a       = True
    Subtype Mult    Primary = False
    Subtype Mult    Unary   = False
    Subtype Mult    a       = True
    Subtype Add     Add     = True
    Subtype Add     Comp    = True
    Subtype Add     a       = False
    Subtype Comp    Comp    = True
    Subtype Comp    a       = False

A closed type family is used to guarantee that the subtyping relation cannot be expanded by clients (your second property). Finally, the GADT for target language terms can use the subtyping relation as a constraint on its constructors.

data Expr a where
    ID      :: Subtype Primary a ~ True => Expr a
    Paren   :: Subtype Primary a ~ True => Expr b -> Expr a
    Negate  :: Subtype Unary   a ~ True => Expr Unary -> Expr a
    Times   :: Subtype Add     a ~ True => Expr Mult -> Expr Mult -> Expr a
    Plus    :: Subtype Add     a ~ True => Expr Add -> Expr Add -> Expr a
    Compose :: Subtype Comp    a ~ True => Expr Comp -> Expr Comp -> Expr a

Note that because the argument to Paren is polymorphic, you will need a type annotation on the contained term to express which "level" of the subtyping hierarchy you want that term to be treated as. I would expect you would need to do this in whatever language you are designing as well. In ghci, we can ask for the type of a sample term:

:t Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr Add)))
Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr Add)))
  :: (Subtype 'Comp a ~ 'True) => Expr a

This is more or less the type you would expect for this term, I think. You can also see that the expression hierarchy is strictly enforced, though I dare say the error message is not 100% clear (since it is written in host language terms and not target language terms):

:t Negate (Plus ID ID)

<interactive>:1:9:
    Couldn't match type ‘'False’ with ‘'True’
    Expected type: 'True
      Actual type: Subtype 'Add 'Unary
    In the first argument of ‘Negate’, namely ‘(Plus ID ID)’
    In the expression: Negate (Plus ID ID)

Haskell is particularly amenable to modeling your domain, perhaps because it can be described with a fairly simple mathematical model. Crucially, your first point implies that the subtype relation is a well-order. This makes your life very easy - this model would likely translate easily to any language whose type system is at least as strong as that of Haskell.

Start by defining a type (which will be lifted to a kind) to represent your variants:

data Variant = Primary | Unary | Mult | Add | Comp | Expr 

Next a non-recursive datatype to represent the nodes in your term language:

data ExprF (k :: Variant -> *) (x :: Variant) where 
  ID_F :: ExprF k 'Primary 
  Paren_F :: k 'Expr -> ExprF k 'Primary  
  Negate_F :: k 'Primary -> ExprF k 'Unary
  Mult_F :: k 'Mult -> k 'Unary -> ExprF k 'Mult 
  Add_F  :: k 'Add -> k 'Mult -> ExprF k 'Add 
  Comp_F :: k 'Add -> k 'Add -> ExprF k 'Comp 

Recursive occurrences of terms are represented by an additional parameter. Essentially this is just the typical polynomial functor representation (i.e. Fix) but with an index parameter.

Your expression type is then:

data Expr' (x :: Variant) where 
  Expr' :: (x <= y) => Expr x -> Expr' y 

data Expr (x :: Variant) where 
  MkExpr :: ExprF Expr' x -> Expr x 

The <= class has not been introduced yet, but it represents your subtype relation.

As mentioned previously, your subtype relation is a well-order, and by this virtue each element in the ordering can be assigned a unique natural number such that the typical ordering on naturals respects your subtype relation. Or in other words, there is an injection f : Variant -> Nat such that x is a subtype of y iff f x <= f y (or strict subtype iff f x < f y - such a representation gives you a lot of generality).

The required injection is just given by your grammar. Note that each production is only a "subtype" (i.e. has a right-hand side which should not introduce a constructor) of of productions above it.

data Nat = Z | S Nat 
infixr 0 $ 
type ($) f a = f a 

type family VariantIx (x :: Variant) :: Nat where 
  VariantIx 'Primary = 'Z 
  VariantIx 'Unary = 'S 'Z 
  VariantIx 'Mult = 'S $ 'S 'Z 
  VariantIx 'Add = 'S $ 'S $ 'S 'Z 
  VariantIx 'Comp = 'S $ 'S $ 'S $ 'S 'Z 
  VariantIx 'Expr = 'S $ 'S $ 'S $ 'S $ 'S 'Z 

You need an implicit subtype relation (which is <=) but it is often much easier to work with an explicit proof of the relation, so it is typical that the implicit version simply generates the explicit proof. To this end you write two declarations:

data family (:<=:) (x :: k) (y :: k) 
class (<=) (x :: k) (y :: k) where 
  isLTEQ :: x :<=: y 

The instances for naturals should fairly obvious:

data instance (:<=:) (x :: Nat) y where 
  LT_Z :: 'Z :<=: n 
  LT_S :: n :<=: m -> 'S n :<=: 'S m 
instance 'Z <= n where isLTEQ = LT_Z 
instance (n <= m) => 'S n <= 'S m where isLTEQ = LT_S isLTEQ 

and the instances for Variant define the order induced by VariantIx:

newtype instance (:<=:) (x :: Variant) y = IsSubtype (VariantIx x :<=: VariantIx y) 
instance (VariantIx x <= VariantIx y) => x <= y where isLTEQ = IsSubtype isLTEQ 

You probably want some smart constructors. If you are using a recent GHC you will have access to pattern synonyms, but it isn't necessary:

id_ = MkExpr ID_F 
pattern Id = MkExpr ID_F 

pattern Paren e = MkExpr (Paren_F (Expr' e)) 

pattern Neg e = MkExpr (Negate_F (Expr' e)) 

infixl 6 :+ 
pattern (:+) a b = MkExpr (Add_F (Expr' a) (Expr' b)) 

infixl 7 :* 
pattern (:*) a b = MkExpr (Mult_F (Expr' a) (Expr' b)) 

pattern Cmp a b = MkExpr (Comp_F (Expr' a) (Expr' b)) 

and some simple examples:

>Id :+ Id :+ Neg Id :* Id
Add_F (Add_F ID_F ID_F) (Mult_F (Negate_F ID_F) ID_F)
>Id :+ Id :* Neg (Id :* Id)

<interactive>:6:13:
    No instance for (('S $ 'S 'Z) <= 'Z) arising from a use of `Neg'

Note that you could also write your expression type in a slightly different way:

data ExprFlip (x :: Variant) where 
  MkExprFlip :: (x <= y) => ExprF ExprFlip x -> ExprFlip y 

This differs from the original in that the outermost type of an expression has the subtype relation applied to it - so e.g.

pattern Id' = MkExprFlip ID_F 

has type ExprFlip t while Id :: Expr 'Primary. I can't see any other way in which they differ, and I imagine that it would simply be a matter of preference, or which use cases are most common. The original presentation has the advantage the the output type is always monomorphic, which may make type inference better in some cases, but does not affect the construction of expressions.


To address your four points:

  1. This model relies on the semantics of the subtype relation by design.
  2. VariantIx and the Variant type are closed. Any additional instances for :<=: or <= for Variant or Nat will overlap with the existing ones (which are as general as possible) so while in principle they can defined, attempting to use them will produce type errors.
  3. Essentially you have a reflexive and transitive relation, and these properties are captured in the <= instance for Nat once and for all. Changing the subtype relation amounts only to changing Variant and VariantIx.
  4. The proofs of the subtype relation are constructed by type inference - by the <= class. Since all of the indices in the ExprF datatype are monomorphic, the type checker will always be able to compute the subtype relation for the indices.

Full code:

{-# LANGUAGE StandaloneDeriving, UndecidableInstances, PatternSynonyms
  , TypeOperators, KindSignatures, PolyKinds, DataKinds, GADTs, TypeFamilies
  , MultiParamTypeClasses, FlexibleContexts, FlexibleInstances #-} 

data Variant = Primary | Unary | Mult | Add | Comp | Expr 

data ExprF (k :: Variant -> *) (x :: Variant) where 
  ID_F :: ExprF k 'Primary 
  Paren_F :: k 'Expr -> ExprF k 'Primary  
  Negate_F :: k 'Primary -> ExprF k 'Unary
  Mult_F :: k 'Mult -> k 'Unary -> ExprF k 'Mult 
  Add_F  :: k 'Add -> k 'Mult -> ExprF k 'Add 
  Comp_F :: k 'Add -> k 'Add -> ExprF k 'Comp 

data Expr' (x :: Variant) where 
  Expr' :: (x <= y) => Expr x -> Expr' y 

data Expr (x :: Variant) where 
  MkExpr :: ExprF Expr' x -> Expr x 

data ExprFlip (x :: Variant) where 
  MkExprFlip :: (x <= y) => ExprF ExprFlip x -> ExprFlip y 

pattern Id' = MkExprFlip ID_F 

data Nat = Z | S Nat 
infixr 0 $ 
type ($) f a = f a 

type family VariantIx (x :: Variant) :: Nat where 
  VariantIx 'Primary = 'Z 
  VariantIx 'Unary = 'S 'Z 
  VariantIx 'Mult = 'S $ 'S 'Z 
  VariantIx 'Add = 'S $ 'S $ 'S 'Z 
  VariantIx 'Comp = 'S $ 'S $ 'S $ 'S 'Z 
  VariantIx 'Expr = 'S $ 'S $ 'S $ 'S $ 'S 'Z 

data family (:<=:) (x :: k) (y :: k) 
class (<=) (x :: k) (y :: k) where 
  isLTEQ :: x :<=: y 

data instance (:<=:) (x :: Nat) y where 
  LT_Z :: 'Z :<=: n 
  LT_S :: n :<=: m -> 'S n :<=: 'S m 
instance 'Z <= n where isLTEQ = LT_Z 
instance (n <= m) => 'S n <= 'S m where isLTEQ = LT_S isLTEQ 

newtype instance (:<=:) (x :: Variant) y = IsSubtype (VariantIx x :<=: VariantIx y) 
instance (VariantIx x <= VariantIx y) => x <= y where isLTEQ = IsSubtype isLTEQ 

id_ = MkExpr ID_F 

pattern Id = MkExpr ID_F 

pattern Paren e = MkExpr (Paren_F (Expr' e)) 

pattern Neg e = MkExpr (Negate_F (Expr' e)) 

infixl 6 :+ 
pattern (:+) a b = MkExpr (Add_F (Expr' a) (Expr' b)) 

infixl 7 :* 
pattern (:*) a b = MkExpr (Mult_F (Expr' a) (Expr' b)) 

pattern Cmp a b = MkExpr (Comp_F (Expr' a) (Expr' b)) 

instance Show (Expr' x) where
  showsPrec k (Expr' x) = showsPrec k x  

instance Show (Expr x) where 
  showsPrec k (MkExpr x) = showsPrec k x 

deriving instance (Show (k 'Mult), Show (k 'Add), Show (k 'Expr), Show (k 'Primary), Show (k 'Unary)) => Show (ExprF k x) 

There are two things that come to my mind, both for "real" subtyping systems (so not available in Haskell), although I am not completely sure whether any of them fits all of your requirements:

  1. Explicit untagged union types, as in Ceylon, which let you name the type A | B, which is a supertype of both A and B. Thus, you could just make Empty and Expr normal ADTs, and then declare a synonym type OptionalExpr = Empty | Expr.

  2. The way ADTs are modelled in Scala, as hierarchies of sealed traits and case classes:

    sealed trait OptionalExpr
    case object Empty extends OptionalExpr
    sealed trait Expr extends OptionalExpr
    case class IntExpr(i: Int) extends OptionaExpr
    case class AddExpr(lhs: Expr, op: AddOp, rhs: Expr) extends OptionalExpr
    

    This way, OptionalExpr and Expr are not extensible (since the traits are sealed) and behave mostly like ADTs in Haskell, but you can still access the "intermediate" types like in a normal inheritance hierarchy (unlike in Haskell, where you have only the constructors, which are not by themselves types).

Both cases require a form of pattern matching to access values, of course, since you have to recover in which "part of the union" you are.

Unless I misunderstand, polymorphic variants can do pretty much exactly this. However, "untagged union" isn't a great term to use (I imagine most people would think you were asking for C-style unions).

The example would look like this:

type empty = [`Empty]

type bin_op = Add | Sub

type expr = [`Int of int | `Add of expr * bin_op * expr]

type optional_expr = [empty | expr]

type weird_expr = [expr | `Wierd of expr | `Zonk of string]

Note that with OCaml's polymorphic variants the subtype relationship is defined structurally and not between named types.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!