Typed abstract syntax and DSL design in Haskell

I'm designing a DSL in Haskell and I would like to have an assignment operation. Something like this (the code below is just for explaining my problem in a limited context, I d

3 Answers

闹比i

2021-02-06 16:10
You should know that your goals are quite lofty. I don't think you will get very far treating your variables exactly as strings. I'd do something slightly more annoying to use, but more practical. Define a monad for your DSL, which I'll call M:
```
newtype M a = ...

data Exp a where
    ... as before ...

data Var a  -- a typed variable

assign :: Var a -> Exp a -> M ()
declare :: String -> a -> M (Var a)
```
I'm not sure why you have Exp a for assignment and just a for declaration, but I reproduced that here. The String in declare is just for cosmetics, if you need it for code generation or error reporting or something -- the identity of the variable should really not be tied to that name. So it's usually used as
```
myFunc = do
    foobar <- declare "foobar" 42
```
which is the annoying redundant bit. Haskell doesn't really have a good way around this (though depending on what you're doing with your DSL, you may not need the string at all).

As for the implementation, maybe something like
```
data Stmt = forall a. Assign (Var a) (Exp a)
          | forall a. Declare (Var a) a

data Var a = Var String Integer  -- string is auxiliary from before, integer
                                 -- stores real identity.
```
For M, we need a unique supply of names and a list of statements to output.
```
-- Deriving these instances needs the GeneralizedNewtypeDeriving extension.
newtype M a = M { runM :: WriterT [Stmt] (StateT Integer Identity) a }
    deriving (Functor, Applicative, Monad)
```
Then the operations are usually fairly trivial.
```
assign v a = M $ tell [Assign v a]

declare name a = M $ do
    ident <- lift get
    lift . put $! ident + 1
    let var = Var name ident
    tell [Declare var a]
    return var
```
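To see the pieces above working together, here is a minimal self-contained sketch. The three Exp constructors are an assumption standing in for the question's expression type, and runProgram and describe are hypothetical helpers added only so the collected statements can be inspected. It uses State Integer (the same thing as StateT Integer Identity) and relies on mtl's MonadState instance for WriterT, so get and put need no lift.

```haskell
{-# LANGUAGE ExistentialQuantification, GADTs, GeneralizedNewtypeDeriving #-}

import Control.Monad.State.Strict (State, get, put, runState)
import Control.Monad.Writer.Strict (WriterT, runWriterT, tell)

-- Assumed stand-in for the question's expression type.
data Exp a where
  EInt  :: Int -> Exp Int
  EBool :: Bool -> Exp Bool
  EAdd  :: Exp Int -> Exp Int -> Exp Int

-- The String is cosmetic; the Integer is the variable's real identity.
data Var a = Var String Integer

data Stmt
  = forall a. Assign (Var a) (Exp a)
  | forall a. Declare (Var a) a

-- Writer collects the statements, State supplies fresh identities.
newtype M a = M { runM :: WriterT [Stmt] (State Integer) a }
  deriving (Functor, Applicative, Monad)

declare :: String -> a -> M (Var a)
declare name a = M $ do
  ident <- get
  put $! ident + 1
  let var = Var name ident
  tell [Declare var a]
  return var

assign :: Var a -> Exp a -> M ()
assign v e = M $ tell [Assign v e]

-- Hypothetical helper: run a DSL program, starting the supply at 0.
runProgram :: M a -> (a, [Stmt])
runProgram m = fst (runState (runWriterT (runM m)) 0)

-- Hypothetical helper: show which variable each statement touches.
describe :: Stmt -> String
describe (Declare (Var n i) _) = "declare " ++ n ++ "#" ++ show i
describe (Assign  (Var n i) _) = "assign " ++ n ++ "#" ++ show i

example :: M ()
example = do
  foo <- declare "foo" (0 :: Int)
  bar <- declare "bar" False
  assign foo (EAdd (EInt 1) (EInt 2))
  assign bar (EBool True)
```

Because assign takes a Var a and an Exp a, something like assign foo (EBool True) is rejected by the type checker, which is the typing guarantee this design gives you.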
I've made a fairly large DSL for code generation in another language using a fairly similar design, and it scales well. I find it a good idea to stay "near the ground", just doing solid modeling without using too many fancy type-level magical features, and accepting minor linguistic annoyances. That way Haskell's main strength -- its ability to abstract -- can still be used for code in your DSL.

One drawback is that everything needs to be defined within a do block, which can be a hindrance to good organization as the amount of code grows. I'll steal declare to show a way around that:
```
declare :: String -> M a -> M a
```
used like
```
foo = declare "foo" $ do
    -- actual function body
```
then your M can have as a component of its state a cache from names to variables, and the first time you use a declaration with a certain name you render it and put it in a variable (this will require a bit more sophisticated monoid than [Stmt] as the target of your Writer). Later times you just look up the variable. It does have a rather floppy dependence on uniqueness of names, unfortunately; an explicit model of namespaces can help with that but never eliminate it entirely.
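A rough sketch of the caching idea, under heavy simplifying assumptions: it is monomorphic (the body returns its rendered lines as [String], and a hypothetical Ref replaces the typed Var), it uses plain State from mtl instead of a Writer with a richer monoid, and the names declareOnce, Env, Ref, helper and caller are all made up for illustration.

```haskell
import Control.Monad.State.Strict (State, gets, modify', runState)
import qualified Data.Map.Strict as Map

-- Identity of a rendered definition (hypothetical stand-in for a typed Var).
newtype Ref = Ref Integer deriving (Eq, Show)

data Env = Env
  { cache  :: Map.Map String Ref    -- names that have already been rendered
  , output :: [(String, [String])]  -- rendered definitions, newest first
  , nextId :: Integer               -- supply of fresh identities
  }

type M = State Env

emptyEnv :: Env
emptyEnv = Env Map.empty [] 0

-- Render the body the first time a name is seen; afterwards only look it up.
declareOnce :: String -> M [String] -> M Ref
declareOnce name body = do
  hit <- gets (Map.lookup name . cache)
  case hit of
    Just ref -> return ref
    Nothing  -> do
      n <- gets nextId
      let ref = Ref n
      -- Register the name before rendering so nested uses find it.
      modify' $ \e -> e { cache = Map.insert name ref (cache e), nextId = n + 1 }
      code <- body
      modify' $ \e -> e { output = (name, code) : output e }
      return ref

-- Definitions can now live at the top level, outside any one do block.
helper :: M Ref
helper = declareOnce "helper" (return ["return 1"])

caller :: M Ref
caller = declareOnce "caller" $ do
  h  <- helper   -- first use: renders helper
  h' <- helper   -- second use: cache hit, nothing is rendered again
  return ["call " ++ show h, "call " ++ show h']
```

Running runState caller emptyEnv renders helper exactly once even though caller uses it twice. The cache is keyed on the name alone, so the dependence on unique names mentioned above is plainly visible here.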
梦毁少年i

2021-02-06 16:18
Given that my work focuses on related issues of scope and type safety being encoded at the type-level, I stumbled upon this old-ish question whilst googling around and thought I'd give it a try.

This post provides, I think, an answer quite close to the original specification. The whole thing is surprisingly short once you have the right setup.

First, I'll start with a sample program to give you an idea of what the end result looks like:
```
program :: Program
program = Program
  $  Declare (Var :: Name "foo") (Of :: Type Int)
  :> Assign  (The (Var :: Name "foo")) (EInt 1)
  :> Declare (Var :: Name "bar") (Of :: Type Bool)
  :> increment (The (Var :: Name "foo"))
  :> Assign  (The (Var :: Name "bar")) (ENot $ EBool True)
  :> Done
```
Scoping

In order to ensure that we may only assign values to variables which have been declared before, we need a notion of scope.

GHC.TypeLits provides us with type-level strings (called Symbol), so we can very well use strings as variable names if we want. And because we want to ensure type safety, each variable declaration comes with a type annotation which we will store together with the variable name. Our type of scopes is therefore: [(Symbol, *)].

We can use a type family to test whether a given Symbol is in scope and return its associated type if that is the case:
```
type family HasSymbol (g :: [(Symbol,*)]) (s :: Symbol) :: Maybe * where
  HasSymbol '[]            s = 'Nothing
  HasSymbol ('(s, a) ': g) s = 'Just a
  HasSymbol ('(t, a) ': g) s = HasSymbol g s
```
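To check how the lookup reduces, one can ask GHC to prove a few equalities at compile time. This is a small sketch, assuming a recent GHC: the Scope synonym and the three proof names are made up for illustration, and the kind * is spelled Type (from Data.Kind) here. Each Refl only type checks if the type family reduces to the stated result.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}

import Data.Kind (Type)
import Data.Type.Equality ((:~:)(Refl))
import GHC.TypeLits (Symbol)

type family HasSymbol (g :: [(Symbol, Type)]) (s :: Symbol) :: Maybe Type where
  HasSymbol '[]            s = 'Nothing
  HasSymbol ('(s, a) ': g) s = 'Just a
  HasSymbol ('(t, a) ': g) s = HasSymbol g s

-- Hypothetical scope with two declared variables.
type Scope = ('[ '("foo", Int), '("bar", Bool) ] :: [(Symbol, Type)])

-- The head of the scope matches the second equation directly.
fooIsInt :: HasSymbol Scope "foo" :~: 'Just Int
fooIsInt = Refl

-- "bar" is apart from "foo", so the third equation recurses into the tail.
barIsBool :: HasSymbol Scope "bar" :~: 'Just Bool
barIsBool = Refl

-- An undeclared name falls through to the empty-scope equation.
bazMissing :: HasSymbol Scope "baz" :~: 'Nothing
bazMissing = Refl
```

The equations of a closed type family are tried in order, which is why the overlapping second and third equations are accepted.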
From this definition we can define a notion of variable: a variable of type a in scope g is a symbol s such that HasSymbol g s returns 'Just a. This is what the ScopedSymbol data type represents by using an existential quantification to store the s.
```
data ScopedSymbol (g :: [(Symbol,*)]) (a :: *) = forall s.
  (HasSymbol g s ~ 'Just a) => The (Name s)

data Name (s :: Symbol) = Var
```
Here I am purposefully abusing notations all over the place: The is the constructor for the type ScopedSymbol and Name is a Proxy type with a nicer name and constructor. This allows us to write such niceties as:
```
example :: ScopedSymbol ('("foo", Int) ': '("bar", Bool) ': '[]) Bool
example = The (Var :: Name "bar")
```
Statements

Now that we have a notion of scope and of well-typed variables in that scope, we can start considering the effects Statements should have. Given that new variables can be declared in a Statement, we need to find a way to propagate this information in the scope. The key insight is to have two indices: an input and an output scope.

To Declare a new variable together with its type will expand the current scope with the pair of the variable name and the corresponding type.

Assignments on the other hand do not modify the scope. They merely associate a ScopedSymbol to an expression of the corresponding type.
```
data Statement (g :: [(Symbol, *)]) (h :: [(Symbol,*)]) where
  Declare :: Name s -> Type a -> Statement g ('(s, a) ': g)
  Assign  :: ScopedSymbol g a -> Exp g a -> Statement g g

data Type (a :: *) = Of
```
Once again we have introduced a proxy type to have a nicer user-level syntax.
```
example' :: Statement '[] ('("foo", Int) ': '[])
example' = Declare (Var :: Name "foo") (Of :: Type Int)

example'' :: Statement ('("foo", Int) ': '[]) ('("foo", Int) ': '[])
example'' = Assign (The (Var :: Name "foo")) (EInt 1)
```
Statements can be chained in a scope-preserving way by defining the following GADT of type-aligned sequences:
```
infixr 5 :>
data Statements (g :: [(Symbol, *)]) (h :: [(Symbol,*)]) where
  Done :: Statements g g
  (:>) :: Statement g h -> Statements h i -> Statements g i
```
Expressions

Expressions are mostly unchanged from your original definition except that they are now scoped and a new constructor EVar lets us dereference a previously-declared variable (using ScopedSymbol) giving us an expression of the appropriate type.
```
data Exp (g :: [(Symbol,*)]) (t :: *) where
  EVar    :: ScopedSymbol g a -> Exp g a
  EBool   :: Bool -> Exp g Bool
  EInt    :: Int  -> Exp g Int
  EAdd    :: Exp g Int -> Exp g Int -> Exp g Int
  ENot    :: Exp g Bool -> Exp g Bool
```
Programs

A Program is quite simply a sequence of statements starting in the empty scope. We use, once more, an existential quantification to hide the scope we end up with.
```
data Program = forall h. Program (Statements '[] h)
```
It is obviously possible to write subroutines in Haskell and use them in your programs. In the example, I have the very simple increment which can be defined like so:
```
increment :: ScopedSymbol g Int -> Statement g g
increment v = Assign v (EAdd (EVar v) (EInt 1))
```
I have uploaded the whole code snippet together with the right LANGUAGE pragmas and the examples listed here in a self-contained gist. I haven't however included any comments there.
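For readers without the gist at hand, here is a sketch that assembles the definitions from this answer into one module. The LANGUAGE pragmas are my best guess at what is needed and are not copied from the gist, the kind * is written K.Type (Data.Kind imported qualified so it does not clash with the Type proxy), and statementCount and programLength are hypothetical additions that show how to consume the type-aligned sequence.

```haskell
{-# LANGUAGE DataKinds, ExistentialQuantification, GADTs, KindSignatures,
             TypeFamilies, TypeOperators #-}

import qualified Data.Kind as K
import GHC.TypeLits (Symbol)

type family HasSymbol (g :: [(Symbol, K.Type)]) (s :: Symbol) :: Maybe K.Type where
  HasSymbol '[]            s = 'Nothing
  HasSymbol ('(s, a) ': g) s = 'Just a
  HasSymbol ('(t, a) ': g) s = HasSymbol g s

data Name (s :: Symbol) = Var   -- proxy for a variable name
data Type (a :: K.Type) = Of    -- proxy for a type annotation

data ScopedSymbol (g :: [(Symbol, K.Type)]) (a :: K.Type) = forall s.
  (HasSymbol g s ~ 'Just a) => The (Name s)

data Exp (g :: [(Symbol, K.Type)]) (t :: K.Type) where
  EVar  :: ScopedSymbol g a -> Exp g a
  EBool :: Bool -> Exp g Bool
  EInt  :: Int -> Exp g Int
  EAdd  :: Exp g Int -> Exp g Int -> Exp g Int
  ENot  :: Exp g Bool -> Exp g Bool

data Statement (g :: [(Symbol, K.Type)]) (h :: [(Symbol, K.Type)]) where
  Declare :: Name s -> Type a -> Statement g ('(s, a) ': g)
  Assign  :: ScopedSymbol g a -> Exp g a -> Statement g g

infixr 5 :>
data Statements (g :: [(Symbol, K.Type)]) (h :: [(Symbol, K.Type)]) where
  Done :: Statements g g
  (:>) :: Statement g h -> Statements h i -> Statements g i

data Program = forall h. Program (Statements '[] h)

increment :: ScopedSymbol g Int -> Statement g g
increment v = Assign v (EAdd (EVar v) (EInt 1))

program :: Program
program = Program
  $  Declare (Var :: Name "foo") (Of :: Type Int)
  :> Assign  (The (Var :: Name "foo")) (EInt 1)
  :> Declare (Var :: Name "bar") (Of :: Type Bool)
  :> increment (The (Var :: Name "foo"))
  :> Assign  (The (Var :: Name "bar")) (ENot $ EBool True)
  :> Done

-- Hypothetical consumer: walk the sequence, ignoring the scope indices.
statementCount :: Statements g h -> Int
statementCount Done        = 0
statementCount (_ :> rest) = 1 + statementCount rest

programLength :: Program -> Int
programLength (Program body) = statementCount body
```

Moving the first Assign above the first Declare, or assigning EBool True to "foo", should make the module fail to type check, since HasSymbol then no longer reduces to the required 'Just type.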
太阳男子

2021-02-06 16:23
After seeing all the code by @Cactus and the Haskell suggestions by @luqui, I've managed to get a solution close to what I want in Idris. The complete code is available at the following gist:

(https://gist.github.com/rodrigogribeiro/33356c62e36bff54831d)

Some little things I still need to fix in that solution:
1. I don't know (yet) whether Idris supports integer literal overloading, which would be quite useful for building my DSL.
2. I've tried to define a prefix operator for program variables in the DSL syntax, but it didn't work the way I'd like. I've got a solution (in the gist above) that uses a keyword (use) for variable access.
I'll raise these minor points with the folks in the Idris channel on Freenode to see whether they are possible.