Is there a name for ADT with explicit subtyping?

Question

I'm looking for a proper name for a data type that combines ADT with explicit subtyping.

In one of my applications, I use a structure similar to an ADT to represent parse trees, on which I perform recursive pattern matching. I would find it rather convenient if I could combine ADTs with subtyping, as demonstrated in the example below:

Note: the example is written in Haskell's syntax, but this is not Haskell code.

data Empty = Empty
data Expr = Int Int | Add Expr AddOp Expr
data OptionalExpr =
| Empty  // I want to make Empty a subtype of OptionalExpr
| Expr   // I want to make Expr  a subtype of OptionalExpr

In the example above, I first define two types, Empty and Expr, and then make both of them subtypes of OptionalExpr. I realize this kind of data type is uncommon. Apparently neither Haskell nor OCaml supports it, but I don't know about other functional languages.

I am looking for something that combines ADTs with explicit subtyping, not the structurally implied subtyping of polymorphic variants. There are a few justifications for this requirement:

First, we want all-or-none subtyping. Say we want A to be a subtype of B; then we will never want to include only some of the variants of A under B. Either A is a subtype of B, in which case B includes all the variants of A, or A is not a subtype of B, in which case B includes none of the variants of A. We don't allow any gray area in between.
Second, we don't want B to be open in any sense. We have in mind a very specific set of subtypes of B. We don't want something to become an instance of B just by implementing a typeclass or the like.
Third, say type A has a large number of variants. We want to make type B a supertype of A. Copying all the variants into B, as is required with polymorphic variants, is just too cumbersome and error-prone.
Fourth, we don't want to introduce new value constructors when all we want to express is a subtype. In the example above, we could have written OptionalExpr as an ADT with two value constructors, like this: data OptionalExpr = EmptyExpr | NonEmptyExpr Expr, or we could have used Maybe. In my application this is unacceptable, because the level of embedding can be quite deep, and it would be a nightmare to deconstruct a deeply embedded value like (L1 (L2 (L3 (L4 (L5 value_wanted))))).
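To make the fourth point concrete, here is a minimal sketch, in ordinary Haskell rather than the hypothetical syntax above, of the conventional encoding the question wants to avoid. The `AddOp` type with its `Plus`/`Minus` constructors and the helpers `eval` and `evalOptional` are assumptions added only so the sketch compiles.

```haskell
-- Hypothetical operator type; the question leaves AddOp undefined.
data AddOp = Plus | Minus deriving (Show, Eq)

data Expr = Int Int | Add Expr AddOp Expr deriving (Show, Eq)

-- The encoding the question rejects: one extra constructor per "subtype".
data OptionalExpr = EmptyExpr | NonEmptyExpr Expr deriving (Show, Eq)

-- Plain evaluator for Expr (hypothetical helper).
eval :: Expr -> Int
eval (Int n)         = n
eval (Add l Plus r)  = eval l + eval r
eval (Add l Minus r) = eval l - eval r

-- Every consumer of OptionalExpr must first match on the wrapper
-- constructor before it can reach the Expr underneath.
evalOptional :: OptionalExpr -> Maybe Int
evalOptional EmptyExpr        = Nothing
evalOptional (NonEmptyExpr e) = Just (eval e)
```

With one level this is harmless; the objection is that every further level of embedding adds another wrapper to peel.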

To give you some idea why such requirements exist, I show a more specific example below:

PrimaryExpr = ID | LeftParen Expr RightParen
UnaryExpr = PrimaryExpr | NegateOp PrimaryExpr    // -
MultExpr  = UnaryExpr | MultExpr MultOp UnaryExpr // *
AddExpr   = MultExpr  | AddExpr  AddOp MultExpr   // +
CompExpr  = AddExpr | AddExpr CompOp AddExpr
Expr = CompExpr

The example above expresses a subtype hierarchy, with ideas such as "an AddExpr is a CompExpr, but a CompExpr is not an AddExpr". For this specific example, some people have suggested that I replace UnaryExpr, MultExpr, AddExpr and so on with just Expr, that is, define all of them as a single type. That loses type constraints such as "a CompExpr is not an AddExpr", and because I'm doing recursive pattern matching on these types, I need the constraints of this hierarchy to be statically enforced.
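For comparison, here is a minimal sketch of how the hierarchy above is usually encoded in plain Haskell, assuming one injection constructor per "is-a" edge of the grammar. All constructor names (`Ident`, `UPrimary`, `MUnary`, `AMult`, `CAdd`, ...) and the helpers `identExpr` and `bareIdent` are hypothetical, and each level is simplified to a single operator. It keeps the static guarantees the question asks for, but shows the nesting the fourth point complains about.

```haskell
-- One injection constructor per "is-a" edge (hypothetical names);
-- each level has a single, unnamed operator for brevity.
data PrimaryExpr = Ident String | Paren CompExpr               deriving (Show, Eq)
data UnaryExpr   = UPrimary PrimaryExpr | Negate PrimaryExpr   deriving (Show, Eq)
data MultExpr    = MUnary UnaryExpr | Mult MultExpr UnaryExpr  deriving (Show, Eq)
data AddExpr     = AMult MultExpr | Add AddExpr MultExpr       deriving (Show, Eq)
data CompExpr    = CAdd AddExpr | Comp AddExpr AddExpr         deriving (Show, Eq)

-- The hierarchy is statically enforced: Comp only accepts AddExprs, so a
-- comparison can never be an operand of another comparison. The price is
-- that a lone identifier used as a full expression sits under four
-- injection constructors.
identExpr :: String -> CompExpr
identExpr name = CAdd (AMult (MUnary (UPrimary (Ident name))))

-- Recognising that identifier again means peeling every wrapper in turn.
bareIdent :: CompExpr -> Maybe String
bareIdent (CAdd (AMult (MUnary (UPrimary (Ident name))))) = Just name
bareIdent _                                               = Nothing
```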

Is there a name for this kind of data type I'm looking for in the literature? Or am I looking for something that doesn't even make sense? If you think this is the case, why am I looking for something nonsensical? Thanks for any pointers.

EDIT: even though I've written the above code snippets in Haskell's syntax, I am not writing my application in Haskell. I'm using my own language and my own data types, so I am not limited by Haskell's semantics. I am looking for a pointer to similar concepts in the literature, so that when I write a report for my project I don't appear to be reinventing something that already exists. I tried every Google keyword I could think of and nothing relevant came up, so I'm asking here.

Answer 1:

In a comment, you say:

I'm not sure how to encode a subtype hierarchy using GADTs. If you think it is doable, would you mind providing an answer with an example as to how the type hierarchy given in my example may be encoded?

So here is an answer to that question. The key idea is to give a type-level function (in the host language, here Haskell) that computes the subtyping relation (of the target language's type system, here your custom EDSL). For simplicity, I will spell out the subtyping relation in full, but standard type-level programming can be used to reduce the repetition and raise the abstraction level as appropriate. First, the extensions needed:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}

Now the definition of the subtyping relation:

data Level = Primary | Unary | Mult | Add | Comp

type family Subtype a b where
    Subtype Primary a       = True
    Subtype Unary   Primary = False
    Subtype Unary   a       = True
    Subtype Mult    Primary = False
    Subtype Mult    Unary   = False
    Subtype Mult    a       = True
    Subtype Add     Add     = True
    Subtype Add     Comp    = True
    Subtype Add     a       = False
    Subtype Comp    Comp    = True
    Subtype Comp    a       = False
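A quick way to check that the closed family computes the intended relation is to ask GHC to reduce a few instances at compile time. This sketch restates `Level` and `Subtype` with explicit kind signatures and ticked constructors; the witness names (`addBelowComp` and so on) are hypothetical. Each definition type-checks only if `Subtype` reduces to the stated result.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, KindSignatures, TypeOperators #-}
import Data.Type.Equality ((:~:)(Refl))

data Level = Primary | Unary | Mult | Add | Comp

type family Subtype (a :: Level) (b :: Level) :: Bool where
  Subtype 'Primary a        = 'True
  Subtype 'Unary   'Primary = 'False
  Subtype 'Unary   a        = 'True
  Subtype 'Mult    'Primary = 'False
  Subtype 'Mult    'Unary   = 'False
  Subtype 'Mult    a        = 'True
  Subtype 'Add     'Add     = 'True
  Subtype 'Add     'Comp    = 'True
  Subtype 'Add     a        = 'False
  Subtype 'Comp    'Comp    = 'True
  Subtype 'Comp    a        = 'False

-- Compile-time witnesses: Refl is accepted only if both sides reduce
-- to the same type-level Bool.
addBelowComp :: Subtype 'Add 'Comp :~: 'True
addBelowComp = Refl

addNotBelowUnary :: Subtype 'Add 'Unary :~: 'False
addNotBelowUnary = Refl

multBelowAdd :: Subtype 'Mult 'Add :~: 'True
multBelowAdd = Refl
```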

A closed type family is used to guarantee that the subtyping relation cannot be expanded by clients (your second property). Finally, the GADT for target language terms can use the subtyping relation as a constraint on its constructors.

data Expr a where
    ID      :: Subtype Primary a ~ True => Expr a
    Paren   :: Subtype Primary a ~ True => Expr b -> Expr a
    Negate  :: Subtype Unary   a ~ True => Expr Unary -> Expr a
    Times   :: Subtype Mult    a ~ True => Expr Mult -> Expr Mult -> Expr a  -- a product is a Mult-level term
    Plus    :: Subtype Add     a ~ True => Expr Add -> Expr Add -> Expr a
    Compose :: Subtype Comp    a ~ True => Expr Comp -> Expr Comp -> Expr a
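Since the question's main use case is recursive pattern matching, here is a sketch of a traversal over this GADT. It is self-contained, so it repeats the definitions above; in this copy `Times` is constrained by `Subtype 'Mult a`, so that a product is itself a `Mult`-level term and can be an operand of another product. The `render` function, the `example` term and the operator symbols it prints (in particular `<` for `Compose`) are illustrative assumptions.

```haskell
{-# LANGUAGE GADTs, DataKinds, TypeFamilies, KindSignatures, TypeOperators #-}

data Level = Primary | Unary | Mult | Add | Comp

type family Subtype (a :: Level) (b :: Level) :: Bool where
  Subtype 'Primary a        = 'True
  Subtype 'Unary   'Primary = 'False
  Subtype 'Unary   a        = 'True
  Subtype 'Mult    'Primary = 'False
  Subtype 'Mult    'Unary   = 'False
  Subtype 'Mult    a        = 'True
  Subtype 'Add     'Add     = 'True
  Subtype 'Add     'Comp    = 'True
  Subtype 'Add     a        = 'False
  Subtype 'Comp    'Comp    = 'True
  Subtype 'Comp    a        = 'False

data Expr (a :: Level) where
  ID      :: Subtype 'Primary a ~ 'True => Expr a
  Paren   :: Subtype 'Primary a ~ 'True => Expr b -> Expr a
  Negate  :: Subtype 'Unary   a ~ 'True => Expr 'Unary -> Expr a
  Times   :: Subtype 'Mult    a ~ 'True => Expr 'Mult -> Expr 'Mult -> Expr a
  Plus    :: Subtype 'Add     a ~ 'True => Expr 'Add -> Expr 'Add -> Expr a
  Compose :: Subtype 'Comp    a ~ 'True => Expr 'Comp -> Expr 'Comp -> Expr a

-- Recursive pattern matching works at every level with one signature;
-- matching a constructor brings its Subtype constraint into scope.
render :: Expr a -> String
render ID            = "x"
render (Paren e)     = "(" ++ render e ++ ")"
render (Negate e)    = "-" ++ render e
render (Times l r)   = render l ++ " * " ++ render r
render (Plus l r)    = render l ++ " + " ++ render r
render (Compose l r) = render l ++ " < " ++ render r

-- The sample term from the answer, pinned to the Comp level.
example :: Expr 'Comp
example = Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr 'Add)))
```

Here `render example` yields `"x * x < -(x + x)"`. Without the `Expr 'Comp` signature on `example`, GHC would be left with an unsolved `Subtype 'Comp a ~ 'True` constraint for an ambiguous `a`.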

Note that because the argument to Paren is polymorphic, you will need a type annotation on the contained term to express which "level" of the subtyping hierarchy you want that term to be treated as. I would expect you would need to do this in whatever language you are designing as well. In ghci, we can ask for the type of a sample term:

:t Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr Add)))
Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr Add)))
  :: (Subtype 'Comp a ~ 'True) => Expr a

This is more or less the type you would expect for this term, I think. You can also see that the expression hierarchy is strictly enforced, though I dare say the error message is not 100% clear (since it is written in host language terms and not target language terms):

:t Negate (Plus ID ID)

<interactive>:1:9:
    Couldn't match type ‘'False’ with ‘'True’
    Expected type: 'True
      Actual type: Subtype 'Add 'Unary
    In the first argument of ‘Negate’, namely ‘(Plus ID ID)’
    In the expression: Negate (Plus ID ID)

Answer 2:

Haskell is particularly amenable to modeling your domain, perhaps because it can be described with a fairly simple mathematical model. Crucially, your first point implies that the subtype relation is a well-order. This makes your life very easy - this model would likely translate easily to any language whose type system is at least as strong as that of Haskell.

Start by defining a type (which will be lifted to a kind) to represent your variants:

data Variant = Primary | Unary | Mult | Add | Comp | Expr

Next, a non-recursive datatype to represent the nodes in your term language:

data ExprF (k :: Variant -> *) (x :: Variant) where 
  ID_F :: ExprF k 'Primary 
  Paren_F :: k 'Expr -> ExprF k 'Primary  
  Negate_F :: k 'Primary -> ExprF k 'Unary
  Mult_F :: k 'Mult -> k 'Unary -> ExprF k 'Mult 
  Add_F  :: k 'Add -> k 'Mult -> ExprF k 'Add 
  Comp_F :: k 'Add -> k 'Add -> ExprF k 'Comp

Recursive occurrences of terms are represented by an additional parameter. Essentially this is just the typical polynomial functor representation (i.e. Fix) but with an index parameter.
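To illustrate the "Fix with an index parameter" idea on its own, here is a sketch that ties the knot of the answer's `ExprF` directly, with no subtyping wrapper in between. `IFix` and `render` are hypothetical names, and `Type` from `Data.Kind` replaces `*`. Because every recursive position must then hold a term at exactly the requested index, only `ID_F` and `Negate_F` over `ID_F` can be built as finite terms (for instance, `Mult_F` needs a `'Mult`-indexed operand that only `Mult_F` itself produces, and no constructor produces `'Expr`), which is why the answer inserts `Expr'` between layers.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, PolyKinds #-}
import Data.Kind (Type)

data Variant = Primary | Unary | Mult | Add | Comp | Expr

-- The answer's ExprF, with the kind spelled as Type instead of *.
data ExprF (k :: Variant -> Type) (x :: Variant) where
  ID_F     :: ExprF k 'Primary
  Paren_F  :: k 'Expr -> ExprF k 'Primary
  Negate_F :: k 'Primary -> ExprF k 'Unary
  Mult_F   :: k 'Mult -> k 'Unary -> ExprF k 'Mult
  Add_F    :: k 'Add -> k 'Mult -> ExprF k 'Add
  Comp_F   :: k 'Add -> k 'Add -> ExprF k 'Comp

-- Indexed analogue of Fix (hypothetical name): each recursive position is
-- filled by a whole term at exactly the index ExprF asks for.
newtype IFix (f :: (i -> Type) -> i -> Type) (x :: i) = IFix (f (IFix f) x)

-- A recursive traversal; the printed operator symbols are assumptions.
render :: IFix ExprF x -> String
render (IFix node) = case node of
  ID_F       -> "x"
  Paren_F e  -> "(" ++ render e ++ ")"
  Negate_F e -> "-" ++ render e
  Mult_F l r -> render l ++ " * " ++ render r
  Add_F l r  -> render l ++ " + " ++ render r
  Comp_F l r -> render l ++ " < " ++ render r
```

For example, `render (IFix (Negate_F (IFix ID_F)))` yields `"-x"`.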

Your expression type is then:

data Expr' (x :: Variant) where 
  Expr' :: (x <= y) => Expr x -> Expr' y 

data Expr (x :: Variant) where 
  MkExpr :: ExprF Expr' x -> Expr x

The <= class has not been introduced yet, but it represents your subtype relation.

As mentioned previously, your subtype relation is a well-order, and by this virtue each element in the ordering can be assigned a unique natural number such that the typical ordering on naturals respects your subtype relation. Or in other words, there is an injection f : Variant -> Nat such that x is a subtype of y iff f x <= f y (or strict subtype iff f x < f y - such a representation gives you a lot of generality).

The required injection is given directly by your grammar. Note that each production only includes productions above it as "subtype" alternatives (i.e. right-hand sides that do not introduce a constructor), so numbering the productions from top to bottom gives the injection.

data Nat = Z | S Nat 
infixr 0 $ 
type ($) f a = f a 

type family VariantIx (x :: Variant) :: Nat where 
  VariantIx 'Primary = 'Z 
  VariantIx 'Unary = 'S 'Z 
  VariantIx 'Mult = 'S $ 'S 'Z 
  VariantIx 'Add = 'S $ 'S $ 'S 'Z 
  VariantIx 'Comp = 'S $ 'S $ 'S $ 'S 'Z 
  VariantIx 'Expr = 'S $ 'S $ 'S $ 'S $ 'S 'Z
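Reading VariantIx as the injection f from the earlier paragraph (counting 'Z as 0 and each 'S as +1), the order it induces can be spelled out as:

$$
f(\mathit{Primary}) = 0,\quad f(\mathit{Unary}) = 1,\quad f(\mathit{Mult}) = 2,\quad f(\mathit{Add}) = 3,\quad f(\mathit{Comp}) = 4,\quad f(\mathit{Expr}) = 5
$$

$$
x <: y \iff f(x) \le f(y), \qquad \text{e.g. } \mathit{Mult} <: \mathit{Add} \text{ since } 2 \le 3, \text{ but } \neg(\mathit{Add} <: \mathit{Unary}) \text{ since } 3 > 1.
$$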

You need an implicit subtype relation (which is <=) but it is often much easier to work with an explicit proof of the relation, so it is typical that the implicit version simply generates the explicit proof. To this end you write two declarations:

data family (:<=:) (x :: k) (y :: k) 
class (<=) (x :: k) (y :: k) where 
  isLTEQ :: x :<=: y

The instances for naturals should be fairly obvious:

data instance (:<=:) (x :: Nat) y where 
  LT_Z :: 'Z :<=: n 
  LT_S :: n :<=: m -> 'S n :<=: 'S m 
instance 'Z <= n where isLTEQ = LT_Z 
instance (n <= m) => 'S n <= 'S m where isLTEQ = LT_S isLTEQ

and the instances for Variant define the order induced by VariantIx:

newtype instance (:<=:) (x :: Variant) y = IsSubtype (VariantIx x :<=: VariantIx y) 
instance (VariantIx x <= VariantIx y) => x <= y where isLTEQ = IsSubtype isLTEQ
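The same "implicit relation generates the explicit proof" pattern can be sketched without data families. This version assumes a plain GADT `LTE` for the proofs on naturals and a newtype `SubtypeOf` for the proofs on `Variant` (both hypothetical names standing in for the answer's `:<=:` instances), plus a hypothetical `proofDepth` helper that counts the successor steps in a proof. The type-level `$` is replaced by parentheses.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies,
             MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}

data Nat = Z | S Nat
data Variant = Primary | Unary | Mult | Add | Comp | Expr

type family VariantIx (x :: Variant) :: Nat where
  VariantIx 'Primary = 'Z
  VariantIx 'Unary   = 'S 'Z
  VariantIx 'Mult    = 'S ('S 'Z)
  VariantIx 'Add     = 'S ('S ('S 'Z))
  VariantIx 'Comp    = 'S ('S ('S ('S 'Z)))
  VariantIx 'Expr    = 'S ('S ('S ('S ('S 'Z))))

-- Explicit proof that n <= m.
data LTE (n :: Nat) (m :: Nat) where
  LTEZ :: LTE 'Z m                      -- zero is below everything
  LTES :: LTE n m -> LTE ('S n) ('S m)  -- strip one successor from each side

-- Implicit version: instance resolution assembles the proof term.
class IsLTE (n :: Nat) (m :: Nat) where
  lte :: LTE n m

instance IsLTE 'Z m where
  lte = LTEZ

instance IsLTE n m => IsLTE ('S n) ('S m) where
  lte = LTES lte

-- Subtyping on Variant is the order on naturals pulled back along VariantIx.
newtype SubtypeOf (x :: Variant) (y :: Variant) = IsSubtype (LTE (VariantIx x) (VariantIx y))

subtypeProof :: IsLTE (VariantIx x) (VariantIx y) => SubtypeOf x y
subtypeProof = IsSubtype lte

-- Number of LTES steps in a proof, i.e. the index of the smaller side.
proofDepth :: LTE n m -> Int
proofDepth LTEZ     = 0
proofDepth (LTES p) = 1 + proofDepth p
```

Asking for `subtypeProof :: SubtypeOf 'Add 'Unary` is rejected at compile time, because instance resolution reaches `IsLTE ('S ('S 'Z)) 'Z`, for which no instance exists.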

You probably want some smart constructors. If you are using a recent GHC you will have access to pattern synonyms, but it isn't necessary:

id_ = MkExpr ID_F 
pattern Id = MkExpr ID_F 

pattern Paren e = MkExpr (Paren_F (Expr' e)) 

pattern Neg e = MkExpr (Negate_F (Expr' e)) 

infixl 6 :+ 
pattern (:+) a b = MkExpr (Add_F (Expr' a) (Expr' b)) 

infixl 7 :* 
pattern (:*) a b = MkExpr (Mult_F (Expr' a) (Expr' b)) 

pattern Cmp a b = MkExpr (Comp_F (Expr' a) (Expr' b))

and some simple examples:

>Id :+ Id :+ Neg Id :* Id
Add_F (Add_F ID_F ID_F) (Mult_F (Negate_F ID_F) ID_F)
>Id :+ Id :* Neg (Id :* Id)

<interactive>:6:13:
    No instance for (('S $ 'S 'Z) <= 'Z) arising from a use of `Neg'

Note that you could also write your expression type in a slightly different way:

data ExprFlip (x :: Variant) where 
  MkExprFlip :: (x <= y) => ExprF ExprFlip x -> ExprFlip y

This differs from the original in that the outermost type of an expression has the subtype relation applied to it - so e.g.

pattern Id' = MkExprFlip ID_F

has type ExprFlip t while Id :: Expr 'Primary. I can't see any other way in which they differ, and I imagine that it would simply be a matter of preference, or of which use cases are most common. The original presentation has the advantage that the output type is always monomorphic, which may make type inference better in some cases, but does not affect the construction of expressions.

To address your four points:

This model relies on the semantics of the subtype relation by design.
VariantIx and the Variant type are closed. Any additional instances of :<=: or <= at kind Variant or Nat will overlap with the existing ones (which are as general as possible), so while in principle they can be defined, attempting to use them will produce type errors.
Essentially you have a reflexive and transitive relation, and these properties are captured in the <= instance for Nat once and for all. Changing the subtype relation amounts only to changing Variant and VariantIx.
The proofs of the subtype relation are constructed by type inference - by the <= class. Since all of the indices in the ExprF datatype are monomorphic, the type checker will always be able to compute the subtype relation for the indices.

Full code:

{-# LANGUAGE StandaloneDeriving, UndecidableInstances, PatternSynonyms
  , TypeOperators, KindSignatures, PolyKinds, DataKinds, GADTs, TypeFamilies
  , MultiParamTypeClasses, FlexibleContexts, FlexibleInstances #-} 

data Variant = Primary | Unary | Mult | Add | Comp | Expr 

data ExprF (k :: Variant -> *) (x :: Variant) where 
  ID_F :: ExprF k 'Primary 
  Paren_F :: k 'Expr -> ExprF k 'Primary  
  Negate_F :: k 'Primary -> ExprF k 'Unary
  Mult_F :: k 'Mult -> k 'Unary -> ExprF k 'Mult 
  Add_F  :: k 'Add -> k 'Mult -> ExprF k 'Add 
  Comp_F :: k 'Add -> k 'Add -> ExprF k 'Comp 

data Expr' (x :: Variant) where 
  Expr' :: (x <= y) => Expr x -> Expr' y 

data Expr (x :: Variant) where 
  MkExpr :: ExprF Expr' x -> Expr x 

data ExprFlip (x :: Variant) where 
  MkExprFlip :: (x <= y) => ExprF ExprFlip x -> ExprFlip y 

pattern Id' = MkExprFlip ID_F 

data Nat = Z | S Nat 
infixr 0 $ 
type ($) f a = f a 

type family VariantIx (x :: Variant) :: Nat where 
  VariantIx 'Primary = 'Z 
  VariantIx 'Unary = 'S 'Z 
  VariantIx 'Mult = 'S $ 'S 'Z 
  VariantIx 'Add = 'S $ 'S $ 'S 'Z 
  VariantIx 'Comp = 'S $ 'S $ 'S $ 'S 'Z 
  VariantIx 'Expr = 'S $ 'S $ 'S $ 'S $ 'S 'Z 

data family (:<=:) (x :: k) (y :: k) 
class (<=) (x :: k) (y :: k) where 
  isLTEQ :: x :<=: y 

data instance (:<=:) (x :: Nat) y where 
  LT_Z :: 'Z :<=: n 
  LT_S :: n :<=: m -> 'S n :<=: 'S m 
instance 'Z <= n where isLTEQ = LT_Z 
instance (n <= m) => 'S n <= 'S m where isLTEQ = LT_S isLTEQ 

newtype instance (:<=:) (x :: Variant) y = IsSubtype (VariantIx x :<=: VariantIx y) 
instance (VariantIx x <= VariantIx y) => x <= y where isLTEQ = IsSubtype isLTEQ 

id_ = MkExpr ID_F 

pattern Id = MkExpr ID_F 

pattern Paren e = MkExpr (Paren_F (Expr' e)) 

pattern Neg e = MkExpr (Negate_F (Expr' e)) 

infixl 6 :+ 
pattern (:+) a b = MkExpr (Add_F (Expr' a) (Expr' b)) 

infixl 7 :* 
pattern (:*) a b = MkExpr (Mult_F (Expr' a) (Expr' b)) 

pattern Cmp a b = MkExpr (Comp_F (Expr' a) (Expr' b)) 

instance Show (Expr' x) where
  showsPrec k (Expr' x) = showsPrec k x  

instance Show (Expr x) where 
  showsPrec k (MkExpr x) = showsPrec k x 

deriving instance (Show (k 'Mult), Show (k 'Add), Show (k 'Expr), Show (k 'Primary), Show (k 'Unary)) => Show (ExprF k x)

Answer 3:

There are two things that come to my mind, both for "real" subtyping systems (so not available in Haskell), although I am not completely sure whether any of them fits all of your requirements:

Explicit untagged union types, as in Ceylon, which let you name the type A | B, which is a supertype of both A and B. Thus, you could just make Empty and Expr normal ADTs, and then declare a synonym type OptionalExpr = Empty | Expr.
The way ADTs are modelled in Scala, as hierarchies of sealed traits and case classes:
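```
sealed trait AddOp               // assumed operator type, not defined in the question
case object Plus extends AddOp

sealed trait OptionalExpr
case object Empty extends OptionalExpr
sealed trait Expr extends OptionalExpr   // Expr is itself a subtype of OptionalExpr
case class IntExpr(i: Int) extends Expr  // the variants of Expr extend Expr
case class AddExpr(lhs: Expr, op: AddOp, rhs: Expr) extends Expr
```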
This way, OptionalExpr and Expr are not extensible (since the traits are sealed) and behave mostly like ADTs in Haskell, but you can still access the "intermediate" types like in a normal inheritance hierarchy (unlike in Haskell, where you have only the constructors, which are not by themselves types).

Both cases require a form of pattern matching to access values, of course, since you have to recover in which "part of the union" you are.

Answer 4:

Unless I misunderstand, polymorphic variants can do pretty much exactly this. However, "untagged union" isn't a great term to use (I imagine most people would think you were asking for C-style unions).

The example would look like this:

type empty = [`Empty]

type bin_op = Add | Sub

type expr = [`Int of int | `Add of expr * bin_op * expr]

type optional_expr = [empty | expr]

type weird_expr = [expr | `Weird of expr | `Zonk of string]

Note that with OCaml's polymorphic variants the subtype relationship is defined structurally and not between named types.

Source: https://stackoverflow.com/questions/36672346/is-there-a-name-for-adt-with-explicit-subtyping

Tags

haskell

functional-programming

ocaml