Statically enforcing that two objects were created from the same (Int) “seed”

问题

In a library I'm working on, I have an API similar to the following:

data Collection a = Collection Seed {-etc...-}
type Seed = Int

newCollection :: Seed -> IO (Collection a)
newCollection = undefined

insert :: a -> Collection a -> IO () -- ...and other mutable set-like functions
insert = undefined

mergeCollections :: Collection a -> Collection a -> IO (Collection a)
mergeCollections (Collection s0 {-etc...-}) (Collection s1 {-etc...-}) 
  | s0 /= s1  = error "This is invalid; how can we make it statically unreachable?"
  | otherwise = undefined

I'd like to be able to enforce that the user cannot call mergeCollections on Collections created with different Seed values.

I thought of trying to tag Collection with a type-level natural: I think this would mean that the Seed would have to be statically known at compile time, but my users might be getting it from an environment variable or user input, so I don't think that would work.

I also hoped I might be able to do something like:

newtype Seed u = Seed Int
newSeed :: Int -> Seed u
newCollection :: Seed u -> IO (Collection u a)
mergeCollections :: Collection u a -> Collection u a -> IO (Collection u a)

Where somehow a Seed is tagged with a unique type in some way, such that the type system could track that both arguments to merge were created from the seed returned by the same invocation of newSeed. To be clear in this (hand-wavy) scheme a and b here would somehow not unify: let a = newSeed 1; b = newSeed 1;.

Is something like this possible?

Examples

Here are some examples of ways I can imagine users creating Seeds and Collections. Users would like to use the other operations (inserting, merging, etc) as freely as they could with any other IO mutable collection:

We only need one seed for all Collections (dynamically) created during the program, but the user must be able to specify in some way how the seed should be determined from the environment at runtime.

One or more static keys gathered from environment vars (or command line args):

main = do
   s1 <- getEnv "SEED1"
   s2 <- getEnv "SEED2"
   -- ... many Collections may be created dynamically from these seeds
   -- and dynamically merged later

回答1:

Probably not in a convenient way. For handling seeds that are known only at runtime, you can use existential types; but then you cannot statically check that two of these existentially wrapped collections match up. The much simpler solution is simply this:

merge :: Collection a -> Collection a -> IO (Maybe (Collection a))

On the other hand, if it is okay to force all operations to be done "together", in a sense, then you can do something like what the ST monad does: group all the operations together, then supply an operation for "running" all the operations that only works if the operations don't leak collections by demanding they be perfectly polymorphic over a phantom variable, hence that the return type doesn't mention the phantom variable. (Tikhon Jelvis also suggests this in his comments.) Here's how that might look:

{-# LANGUAGE Rank2Types #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Collection (Collection, COp, newCollection, merge, inspect, runCOp) where

import Control.Monad.Reader

type Seed = Int
data Collection s a = Collection Seed
newtype COp s a = COp (Seed -> a) deriving (Functor, Applicative, Monad, MonadReader Seed)

newCollection :: COp s (Collection s a)
newCollection = Collection <$> ask

merge :: Collection s a -> Collection s a -> COp s (Collection s a)
merge l r = return (whatever l r) where
  whatever = const

-- just an example; substitute whatever functions you want to have for
-- consuming Collections
inspect :: Collection s a -> COp s Int
inspect (Collection seed) = return seed

runCOp :: (forall s. COp s a) -> Seed -> a
runCOp (COp f) = f

Note in particular that the COp and Collection constructors are not exported. Consequently we need never fear that a Collection will escape its COp; runCOp newCollection is not well-typed (and any other operation that tries to "leak" the collection to the outside world will have the same property). Therefore it will not be possible to pass a Collection constructed with one seed to a merge operating in the context of another seed.

回答2:

I believe this is impossible with the constraint that the seeds come from runtime values, like user input. The typechecker as a tool can only reject invalid programs if we can determine the program is invalid at compiletime. Supposing there is a type such that the typechecker is able to reject programs based on user input, we could deduce that the typechecker is doing some sort of time travel or is able to wholly simulate our deterministic universe. The best you can do as a library author is to smuggle your types into something like ExceptT, which documents the seed precondition and exports awareness for it.

来源：https://stackoverflow.com/questions/38926169/statically-enforcing-that-two-objects-were-created-from-the-same-int-seed

标签

haskell

types

ghc