Question
I'm looking for a proper name for a data type that combines ADT with explicit subtyping.
In one of my applications, I use a structure similar to an ADT to represent parse trees, on which I perform recursive pattern matching. I would find it rather convenient if I could combine ADTs with subtyping, as demonstrated in the example below:
Note: the example is written in Haskell's syntax, but this is not Haskell code.
data Empty = Empty
data Expr = Int Int | Add Expr AddOp Expr
data OptionalExpr =
| Empty // I want to make Empty a subtype of OptionalExpr
| Expr // I want to make Expr a subtype of OptionalExpr
In the example above, I first define 2 types: Empty and Expr. Then I make these 2 types subtypes of OptionalExpr. I realize this kind of data type is uncommon. Apparently neither Haskell nor OCaml supports it, but I don't know about other functional languages.
I am looking for something that combines ADTs with explicit subtyping, not structurally implied subtyping as with polymorphic variants. There are a few justifications for this requirement:
- First, we want all-or-none subtyping. Say we want A to be a subtype of B; then we will never want to include only some of the variants of A under B. Either A is a subtype of B, in which case B includes all the variants of A, or A is not a subtype of B, in which case B includes none of the variants of A. We don't allow a gray area in between.
- Second, we don't want B to be open in any sense. We have in mind a very specific set of subtypes of B. We don't want something to become an instance of B just by implementing a typeclass or the like.
- Third, say type A has a large number of variants. We want to make type B a supertype of A. Copying all the variants into B, as is required with polymorphic variants, is just too cumbersome and error-prone.
- Fourth, we don't want to introduce new value-constructors when all we want to express is a subtype. In the example above, we could have written OptionalExpr as an ADT with 2 value-constructors, like this:
data OptionalExpr = EmptyExpr | NonEmptyExpr Expr
Alternatively, we could have used Maybe. But in my application this is unacceptable, because the level of embedding can be quite deep, and it would be a nightmare to deconstruct a deeply embedded value like (L1 (L2 (L3 (L4 (L5 value_wanted))))).
To give you some idea why such requirements exist, I show a more specific example below:
PrimaryExpr = ID | LeftParen Expr RightParen
UnaryExpr = PrimaryExpr | NegateOp PrimaryExpr // -
MultExpr = UnaryExpr | MultExpr MultOp UnaryExpr // *
AddExpr = MultExpr | AddExpr AddOp MultExpr // +
CompExpr = AddExpr | AddExpr CompOp AddExpr
Expr = CompExpr
The above example expresses a subtype hierarchy, and expresses ideas such as "an AddExpr is a CompExpr, but a CompExpr is not an AddExpr". For this specific example, some people have suggested to me that I can replace UnaryExpr, MultExpr, AddExpr and so on with just Expr, that is, define all of them as a single type. That loses type constraints such as "a CompExpr is not an AddExpr", and because I'm doing recursive pattern matching on these types, I need the constraints of this hierarchy to be statically enforced.
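For comparison, here is a minimal sketch of what the hierarchy above looks like in ordinary Haskell when every subtype edge becomes an explicit wrapper constructor (the approach rejected in the "Fourth" point). The wrapper names (UPrimary, MUnary, AMult, CAdd), the operator constructors (Negate, Mul, Plus, Less) and the helper bareIdent are invented for illustration, and the operators are fixed rather than parameterized to keep it short. The hierarchy is statically enforced (a CompExpr cannot appear where an AddExpr is expected), but even a bare identifier sits four wrappers deep.

```haskell
-- Hypothetical plain-Haskell encoding: one wrapper constructor per subtype edge.
data PrimaryExpr = Ident String | Parens Expr
  deriving (Show, Eq)

data UnaryExpr = UPrimary PrimaryExpr | Negate PrimaryExpr
  deriving (Show, Eq)

data MultExpr = MUnary UnaryExpr | Mul MultExpr UnaryExpr
  deriving (Show, Eq)

data AddExpr = AMult MultExpr | Plus AddExpr MultExpr
  deriving (Show, Eq)

data CompExpr = CAdd AddExpr | Less AddExpr AddExpr
  deriving (Show, Eq)

-- Mirrors the grammar line "Expr = CompExpr".
type Expr = CompExpr

-- Recognising a lone identifier means peeling CAdd, AMult, MUnary and UPrimary.
bareIdent :: Expr -> Maybe String
bareIdent (CAdd (AMult (MUnary (UPrimary (Ident name))))) = Just name
bareIdent _ = Nothing
```

Every new level in the grammar adds one more wrapper to patterns like the one in bareIdent, which is the deconstruction cost the question describes.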
Is there a name for this kind of data type I'm looking for in the literature? Or am I looking for something that doesn't even make sense? If you think this is the case, why am I looking for something nonsensical? Thanks for any pointers.
EDIT: even though I've written the above code snippets in Haskell's syntax, I am not writing my application in Haskell. I'm using my own language and my own data types, so I am not limited by Haskell's semantics. I am looking for a pointer to similar concepts in the literature, so that when I write a report for my project I don't appear to be presenting as new something that already exists. I tried all the Google keywords I could think of and nothing relevant came up, so I'm asking here.
Answer 1:
In a comment, you say:
I'm not sure how to encode a subtype hierarchy using GADTs. If you think it is doable, would you mind providing an answer with an example as to how the type hierarchy given in my example may be encoded?
Therefore I give an answer to this question here. The key idea is to give a type-level function (in the host language, here Haskell) for computing the subtyping relation (of the target language's type system, here your custom EDSL). For simplicity, I will spell out the subtyping relation in full, but standard type-level programming can be used to reduce the repetition and raise the abstraction level as appropriate. First, the extensions needed:
{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
Now the definition of the subtyping relation:
data Level = Primary | Unary | Mult | Add | Comp
type family Subtype a b where
  Subtype Primary a = True
  Subtype Unary Primary = False
  Subtype Unary a = True
  Subtype Mult Primary = False
  Subtype Mult Unary = False
  Subtype Mult a = True
  Subtype Add Add = True
  Subtype Add Comp = True
  Subtype Add a = False
  Subtype Comp Comp = True
  Subtype Comp a = False
A closed type family is used to guarantee that the subtyping relation cannot be expanded by clients (your second property). Finally, the GADT for target language terms can use the subtyping relation as a constraint on its constructors.
data Expr a where
  ID      :: Subtype Primary a ~ True => Expr a
  Paren   :: Subtype Primary a ~ True => Expr b -> Expr a
  Negate  :: Subtype Unary a ~ True => Expr Unary -> Expr a
  Times   :: Subtype Mult a ~ True => Expr Mult -> Expr Mult -> Expr a
  Plus    :: Subtype Add a ~ True => Expr Add -> Expr Add -> Expr a
  Compose :: Subtype Comp a ~ True => Expr Comp -> Expr Comp -> Expr a
Note that because the argument to Paren is polymorphic, you will need a type annotation on the contained term to express which "level" of the subtyping hierarchy you want that term to be treated as. I would expect you would need to do this in whatever language you are designing as well. In ghci, we can ask for the type of a sample term:
:t Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr Add)))
Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr Add)))
  :: (Subtype 'Comp a ~ 'True) => Expr a
This is more or less the type you would expect for this term, I think. You can also see that the expression hierarchy is strictly enforced, though I dare say the error message is not 100% clear (since it is written in host language terms and not target language terms):
:t Negate (Plus ID ID)
<interactive>:1:9:
    Couldn't match type ‘'False’ with ‘'True’
    Expected type: 'True
      Actual type: Subtype 'Add 'Unary
    In the first argument of ‘Negate’, namely ‘(Plus ID ID)’
    In the expression: Negate (Plus ID ID)
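Since the question is about recursive pattern matching, here is a minimal, self-contained sketch of a traversal over this encoding. It repeats the definitions above with explicit kind signatures and promotion ticks, adds the TypeOperators and KindSignatures pragmas, and constrains Times with Subtype 'Mult a so that products can nest (e.g. Times (Times ID ID) ID). The render function and the strings it produces ("x" for ID, "<" for Compose) are hypothetical choices for illustration. The point is that one function of type Expr a -> String covers every level of the hierarchy, while the constructors' constraints still reject ill-formed terms.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

data Level = Primary | Unary | Mult | Add | Comp

-- Closed type family: True iff the first level is a subtype of the second.
type family Subtype (a :: Level) (b :: Level) :: Bool where
  Subtype 'Primary a = 'True
  Subtype 'Unary 'Primary = 'False
  Subtype 'Unary a = 'True
  Subtype 'Mult 'Primary = 'False
  Subtype 'Mult 'Unary = 'False
  Subtype 'Mult a = 'True
  Subtype 'Add 'Add = 'True
  Subtype 'Add 'Comp = 'True
  Subtype 'Add a = 'False
  Subtype 'Comp 'Comp = 'True
  Subtype 'Comp a = 'False

-- Each constructor may be used at any level that is a supertype of its own.
data Expr (a :: Level) where
  ID      :: Subtype 'Primary a ~ 'True => Expr a
  Paren   :: Subtype 'Primary a ~ 'True => Expr b -> Expr a
  Negate  :: Subtype 'Unary a ~ 'True => Expr 'Unary -> Expr a
  Times   :: Subtype 'Mult a ~ 'True => Expr 'Mult -> Expr 'Mult -> Expr a
  Plus    :: Subtype 'Add a ~ 'True => Expr 'Add -> Expr 'Add -> Expr a
  Compose :: Subtype 'Comp a ~ 'True => Expr 'Comp -> Expr 'Comp -> Expr a

-- Hypothetical pretty-printer: a single recursive function for all levels,
-- since every level is an index of the same type Expr.
render :: Expr a -> String
render ID            = "x"
render (Paren e)     = "(" ++ render e ++ ")"
render (Negate e)    = "-" ++ render e
render (Times l r)   = render l ++ " * " ++ render r
render (Plus l r)    = render l ++ " + " ++ render r
render (Compose l r) = render l ++ " < " ++ render r
```

For example, rendering the sample term from the ghci session above at type Expr 'Comp gives "x * x < -(x + x)". Because Paren's argument index is existential, the matching clause recurses without knowing its level, which is also why the annotation on the contained term is needed when constructing it.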
Answer 2:
Haskell is particularly amenable to modeling your domain, perhaps because it can be described with a fairly simple mathematical model. Crucially, your first point implies that the subtype relation is a well-order. This makes your life very easy - this model would likely translate easily to any language whose type system is at least as strong as that of Haskell.
Start by defining a type (which will be lifted to a kind) to represent your variants:
data Variant = Primary | Unary | Mult | Add | Comp | Expr
Next, a non-recursive datatype to represent the nodes in your term language:
data ExprF (k :: Variant -> *) (x :: Variant) where
  ID_F     :: ExprF k 'Primary
  Paren_F  :: k 'Expr -> ExprF k 'Primary
  Negate_F :: k 'Primary -> ExprF k 'Unary
  Mult_F   :: k 'Mult -> k 'Unary -> ExprF k 'Mult
  Add_F    :: k 'Add -> k 'Mult -> ExprF k 'Add
  Comp_F   :: k 'Add -> k 'Add -> ExprF k 'Comp
Recursive occurrences of terms are represented by an additional parameter. Essentially this is just the typical polynomial functor representation (i.e. Fix) but with an index parameter.
Your expression type is then:
data Expr' (x :: Variant) where
  Expr' :: (x <= y) => Expr x -> Expr' y
data Expr (x :: Variant) where
  MkExpr :: ExprF Expr' x -> Expr x
The <= class has not been introduced yet, but it represents your subtype relation.
As mentioned previously, your subtype relation is a well-order, and by this virtue each element in the ordering can be assigned a unique natural number such that the typical ordering on naturals respects your subtype relation. In other words, there is an injection f : Variant -> Nat such that x is a subtype of y iff f x <= f y (or a strict subtype iff f x < f y; such a representation gives you a lot of generality).
The required injection is just given by your grammar. Note that each production is only a "subtype" (i.e. has a right-hand side which should not introduce a constructor) of productions above it.
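The order embedding can be illustrated at the value level before moving to types. The following is a minimal sketch, not part of the answer's type-level encoding: deriving Enum numbers the constructors 0, 1, 2, ... in declaration order, so fromEnum plays the role of the injection f (and matches the numbering of VariantIx), and subtyping becomes a comparison of integers. The names variantIx, isSubtypeOf and supertypes are hypothetical.

```haskell
-- Value-level model of the subtype order; constructors listed from most
-- specific to most general, exactly as in the grammar.
data Variant = Primary | Unary | Mult | Add | Comp | Expr
  deriving (Show, Eq, Enum, Bounded)

-- f : Variant -> Int; injective because distinct constructors get distinct indices.
variantIx :: Variant -> Int
variantIx = fromEnum

-- x is a subtype of y iff f x <= f y (reflexive and transitive for free).
isSubtypeOf :: Variant -> Variant -> Bool
isSubtypeOf x y = variantIx x <= variantIx y

-- All supertypes of a variant, itself included, in order.
supertypes :: Variant -> [Variant]
supertypes x = [y | y <- [minBound .. maxBound], x `isSubtypeOf` y]
```

For instance, supertypes Add is [Add, Comp, Expr]. The type-level version replaces Int with a promoted Nat and the Bool-valued comparison with a proof-producing class.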
data Nat = Z | S Nat
infixr 0 $
type ($) f a = f a
type family VariantIx (x :: Variant) :: Nat where
  VariantIx 'Primary = 'Z
  VariantIx 'Unary   = 'S 'Z
  VariantIx 'Mult    = 'S $ 'S 'Z
  VariantIx 'Add     = 'S $ 'S $ 'S 'Z
  VariantIx 'Comp    = 'S $ 'S $ 'S $ 'S 'Z
  VariantIx 'Expr    = 'S $ 'S $ 'S $ 'S $ 'S 'Z
You need an implicit subtype relation (which is <=), but it is often much easier to work with an explicit proof of the relation, so it is typical that the implicit version simply generates the explicit proof. To this end you write two declarations:
data family (:<=:) (x :: k) (y :: k)
class (<=) (x :: k) (y :: k) where
  isLTEQ :: x :<=: y
The instances for naturals should be fairly obvious:
data instance (:<=:) (x :: Nat) y where
  LT_Z :: 'Z :<=: n
  LT_S :: n :<=: m -> 'S n :<=: 'S m
instance 'Z <= n where isLTEQ = LT_Z
instance (n <= m) => 'S n <= 'S m where isLTEQ = LT_S isLTEQ
and the instances for Variant define the order induced by VariantIx:
newtype instance (:<=:) (x :: Variant) y = IsSubtype (VariantIx x :<=: VariantIx y)
instance (VariantIx x <= VariantIx y) => x <= y where isLTEQ = IsSubtype isLTEQ
You probably want some smart constructors. If you are using a recent GHC you will have access to pattern synonyms, but it isn't necessary:
id_ = MkExpr ID_F
pattern Id = MkExpr ID_F
pattern Paren e = MkExpr (Paren_F (Expr' e))
pattern Neg e = MkExpr (Negate_F (Expr' e))
infixl 6 :+
pattern (:+) a b = MkExpr (Add_F (Expr' a) (Expr' b))
infixl 7 :*
pattern (:*) a b = MkExpr (Mult_F (Expr' a) (Expr' b))
pattern Cmp a b = MkExpr (Comp_F (Expr' a) (Expr' b))
and some simple examples:
>Id :+ Id :+ Neg Id :* Id
Add_F (Add_F ID_F ID_F) (Mult_F (Negate_F ID_F) ID_F)
>Id :+ Id :* Neg (Id :* Id)
<interactive>:6:13:
    No instance for (('S $ 'S 'Z) <= 'Z) arising from a use of `Neg'
Note that you could also write your expression type in a slightly different way:
data ExprFlip (x :: Variant) where
  MkExprFlip :: (x <= y) => ExprF ExprFlip x -> ExprFlip y
This differs from the original in that the outermost type of an expression has the subtype relation applied to it, so e.g. pattern Id' = MkExprFlip ID_F has type ExprFlip t, while Id :: Expr 'Primary. I can't see any other way in which they differ, and I imagine that it would simply be a matter of preference, or of which use cases are most common. The original presentation has the advantage that the output type is always monomorphic, which may make type inference better in some cases, but this does not affect the construction of expressions.
To address your four points:
- This model relies on the semantics of the subtype relation by design.
- VariantIx and the Variant type are closed. Any additional instances for :<=: or <= for Variant or Nat will overlap with the existing ones (which are as general as possible), so while in principle they can be defined, attempting to use them will produce type errors.
- Essentially you have a reflexive and transitive relation, and these properties are captured in the <= instance for Nat once and for all. Changing the subtype relation amounts only to changing Variant and VariantIx.
- The proofs of the subtype relation are constructed by type inference, via the <= class. Since all of the indices in the ExprF datatype are monomorphic, the type checker will always be able to compute the subtype relation for the indices.
Full code:
{-# LANGUAGE StandaloneDeriving, UndecidableInstances, PatternSynonyms
, TypeOperators, KindSignatures, PolyKinds, DataKinds, GADTs, TypeFamilies
, MultiParamTypeClasses, FlexibleContexts, FlexibleInstances #-}
data Variant = Primary | Unary | Mult | Add | Comp | Expr
data ExprF (k :: Variant -> *) (x :: Variant) where
  ID_F     :: ExprF k 'Primary
  Paren_F  :: k 'Expr -> ExprF k 'Primary
  Negate_F :: k 'Primary -> ExprF k 'Unary
  Mult_F   :: k 'Mult -> k 'Unary -> ExprF k 'Mult
  Add_F    :: k 'Add -> k 'Mult -> ExprF k 'Add
  Comp_F   :: k 'Add -> k 'Add -> ExprF k 'Comp
data Expr' (x :: Variant) where
  Expr' :: (x <= y) => Expr x -> Expr' y
data Expr (x :: Variant) where
  MkExpr :: ExprF Expr' x -> Expr x
data ExprFlip (x :: Variant) where
  MkExprFlip :: (x <= y) => ExprF ExprFlip x -> ExprFlip y
pattern Id' = MkExprFlip ID_F
data Nat = Z | S Nat
infixr 0 $
type ($) f a = f a
type family VariantIx (x :: Variant) :: Nat where
  VariantIx 'Primary = 'Z
  VariantIx 'Unary   = 'S 'Z
  VariantIx 'Mult    = 'S $ 'S 'Z
  VariantIx 'Add     = 'S $ 'S $ 'S 'Z
  VariantIx 'Comp    = 'S $ 'S $ 'S $ 'S 'Z
  VariantIx 'Expr    = 'S $ 'S $ 'S $ 'S $ 'S 'Z
data family (:<=:) (x :: k) (y :: k)
class (<=) (x :: k) (y :: k) where
  isLTEQ :: x :<=: y
data instance (:<=:) (x :: Nat) y where
  LT_Z :: 'Z :<=: n
  LT_S :: n :<=: m -> 'S n :<=: 'S m
instance 'Z <= n where isLTEQ = LT_Z
instance (n <= m) => 'S n <= 'S m where isLTEQ = LT_S isLTEQ
newtype instance (:<=:) (x :: Variant) y = IsSubtype (VariantIx x :<=: VariantIx y)
instance (VariantIx x <= VariantIx y) => x <= y where isLTEQ = IsSubtype isLTEQ
id_ = MkExpr ID_F
pattern Id = MkExpr ID_F
pattern Paren e = MkExpr (Paren_F (Expr' e))
pattern Neg e = MkExpr (Negate_F (Expr' e))
infixl 6 :+
pattern (:+) a b = MkExpr (Add_F (Expr' a) (Expr' b))
infixl 7 :*
pattern (:*) a b = MkExpr (Mult_F (Expr' a) (Expr' b))
pattern Cmp a b = MkExpr (Comp_F (Expr' a) (Expr' b))
instance Show (Expr' x) where
  showsPrec k (Expr' x) = showsPrec k x
instance Show (Expr x) where
  showsPrec k (MkExpr x) = showsPrec k x
deriving instance (Show (k 'Mult), Show (k 'Add), Show (k 'Expr), Show (k 'Primary), Show (k 'Unary)) => Show (ExprF k x)
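As a usage sketch for this encoding, here is a recursive traversal over Expr. To stay short and self-contained it repeats only the core definitions above (no pattern synonyms, ExprFlip or Show instances), spells out VariantIx without the $ synonym, and uses Data.Kind.Type instead of *. The functions render and renderSub, the sample term, and the output strings are hypothetical. Each Expr' layer hides an existential index together with its <= proof, so the traversal alternates between the two types.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, PolyKinds, TypeFamilies,
             TypeOperators, MultiParamTypeClasses, FlexibleInstances,
             FlexibleContexts, UndecidableInstances #-}

import Data.Kind (Type)

-- Core definitions repeated from the answer (VariantIx spelled out without $).
data Variant = Primary | Unary | Mult | Add | Comp | Expr

data ExprF (k :: Variant -> Type) (x :: Variant) where
  ID_F     :: ExprF k 'Primary
  Paren_F  :: k 'Expr -> ExprF k 'Primary
  Negate_F :: k 'Primary -> ExprF k 'Unary
  Mult_F   :: k 'Mult -> k 'Unary -> ExprF k 'Mult
  Add_F    :: k 'Add -> k 'Mult -> ExprF k 'Add
  Comp_F   :: k 'Add -> k 'Add -> ExprF k 'Comp

data Expr' (x :: Variant) where
  Expr' :: (x <= y) => Expr x -> Expr' y

data Expr (x :: Variant) where
  MkExpr :: ExprF Expr' x -> Expr x

data Nat = Z | S Nat

type family VariantIx (x :: Variant) :: Nat where
  VariantIx 'Primary = 'Z
  VariantIx 'Unary   = 'S 'Z
  VariantIx 'Mult    = 'S ('S 'Z)
  VariantIx 'Add     = 'S ('S ('S 'Z))
  VariantIx 'Comp    = 'S ('S ('S ('S 'Z)))
  VariantIx 'Expr    = 'S ('S ('S ('S ('S 'Z))))

data family (:<=:) (x :: k) (y :: k)

class (<=) (x :: k) (y :: k) where
  isLTEQ :: x :<=: y

data instance (:<=:) (x :: Nat) y where
  LT_Z :: 'Z :<=: n
  LT_S :: n :<=: m -> 'S n :<=: 'S m

instance 'Z <= n where isLTEQ = LT_Z
instance (n <= m) => 'S n <= 'S m where isLTEQ = LT_S isLTEQ

newtype instance (:<=:) (x :: Variant) y = IsSubtype (VariantIx x :<=: VariantIx y)
instance (VariantIx x <= VariantIx y) => x <= y where isLTEQ = IsSubtype isLTEQ

-- Hypothetical pretty-printer: match on the node functor, and let renderSub
-- unwrap the Expr' layer (existential index plus subtype proof).
render :: Expr x -> String
render (MkExpr node) = case node of
  ID_F       -> "x"
  Paren_F e  -> "(" ++ renderSub e ++ ")"
  Negate_F e -> "-" ++ renderSub e
  Mult_F l r -> renderSub l ++ " * " ++ renderSub r
  Add_F l r  -> renderSub l ++ " + " ++ renderSub r
  Comp_F l r -> renderSub l ++ " < " ++ renderSub r

renderSub :: Expr' y -> String
renderSub (Expr' e) = render e

-- Hypothetical sample term for x + -x * x; every Expr' makes the type checker
-- prove that the inner level is a subtype of the slot's level.
sample :: Expr 'Add
sample = MkExpr (Add_F (Expr' idE) (Expr' prod))
  where
    idE  = MkExpr ID_F
    prod = MkExpr (Mult_F (Expr' (MkExpr (Negate_F (Expr' idE)))) (Expr' idE))
```

Here render sample yields "x + -x * x". The traversal never inspects the proofs; they only matter at construction time, which matches the answer's point that the subtype relation is discharged by type inference.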
Answer 3:
There are two things that come to my mind, both for "real" subtyping systems (so not available in Haskell), although I am not completely sure whether any of them fits all of your requirements:
- Explicit untagged union types, as in Ceylon, which let you name the type A | B, a supertype of both A and B. Thus, you could just make Empty and Expr normal ADTs, and then declare a synonym type OptionalExpr = Empty | Expr.
- The way ADTs are modelled in Scala, as hierarchies of sealed traits and case classes:
sealed trait OptionalExpr
case object Empty extends OptionalExpr
sealed trait Expr extends OptionalExpr
case class IntExpr(i: Int) extends Expr
case class AddExpr(lhs: Expr, op: AddOp, rhs: Expr) extends Expr
This way, OptionalExpr and Expr are not extensible (since the traits are sealed) and behave mostly like ADTs in Haskell, but you can still access the "intermediate" types like in a normal inheritance hierarchy (unlike in Haskell, where you have only the constructors, which are not by themselves types).
Both cases require a form of pattern matching to access values, of course, since you have to recover in which "part of the union" you are.
Answer 4:
Unless I misunderstand, polymorphic variants can do pretty much exactly this. However, "untagged union" isn't a great term to use (I imagine most people would think you were asking for C-style unions).
The example would look like this:
type empty = [`Empty]
type bin_op = Add | Sub
type expr = [`Int of int | `Add of expr * bin_op * expr]
type optional_expr = [empty | expr]
type weird_expr = [expr | `Wierd of expr | `Zonk of string]
Note that with OCaml's polymorphic variants the subtype relationship is defined structurally and not between named types.
Source: https://stackoverflow.com/questions/36672346/is-there-a-name-for-adt-with-explicit-subtyping