How to handle data structure migration in a compositional way in Haskell?

Question

I am trying to implement (de)serialization of data structures in Haskell in a way that:

Takes care of the evolving schema of the data structure,
Allows safe reading of past versions, provided some "patching" code exists,
Does not need to keep old versions of the data type definitions around (as is the case with safecopy).

I have implemented this mechanism in the past in a way that I find unsatisfying: there is much repetition in the code, as it needs to traverse the whole structure even if only a leaf changes, and the code to handle the various versions is not composable.

What I am looking for is something that would work like a stream of patches in a VCS: At each version change, I need only to write the code to handle the specific change (e.g. some field is transformed from Text to Int, there is a new field, some field is deleted...) and given some serialized chunk of bytes at a known version, the load function applies all patches to retrieve a valid data structure.

I have tried to write some code along those lines but cannot get something that is composable in the way I am looking for (and I have not even tackled the issue of sum types...). Here is my attempt:

data Versioned (v :: Nat) a where
  (:$:) ::             (a -> b) -> Versioned v a -> Versioned v b
  (:*:) :: Versioned v (a -> b) -> Versioned v a -> Versioned v b
  Atom  :: Get a                                 -> Versioned v a
  Cast  :: Versioned v' a                        -> Versioned v a

The idea was to reify an applicative structure in such a way it becomes possible to apply minimal changes.
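
To make the reified structure concrete, here is a minimal sketch of an interpreter that turns a Versioned value back into a Get. It assumes the binary package (Data.Binary.Get) rather than whichever Get the original code uses, and the Pair type, the byte atom, the fixity declaration and the two example readers are hypothetical names introduced only for illustration.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

import Data.Binary.Get (Get, getWord8, runGet)
import qualified Data.ByteString.Lazy as BL
import GHC.TypeLits (Nat)

-- The reified applicative structure from the question.
data Versioned (v :: Nat) a where
  (:$:) :: (a -> b) -> Versioned v a -> Versioned v b
  (:*:) :: Versioned v (a -> b) -> Versioned v a -> Versioned v b
  Atom  :: Get a -> Versioned v a
  Cast  :: Versioned v' a -> Versioned v a

-- Assumption: same fixity as <$> and <*>, so chains read like applicative code.
infixl 4 :$:, :*:

-- Interpret the tree as an ordinary Get action.
doGet :: Versioned v a -> Get a
doGet (f :$: x) = f <$> doGet x
doGet (f :*: x) = doGet f <*> doGet x
doGet (Atom g)  = g
doGet (Cast x)  = doGet x

-- Run a reified reader against a lazy ByteString.
runVersioned :: Versioned v a -> BL.ByteString -> a
runVersioned = runGet . doGet

-- Hypothetical example type: two small integers, one byte each on the wire.
data Pair = Pair Int Int deriving (Eq, Show)

byte :: Versioned v Int
byte = Atom (fromIntegral <$> getWord8)

-- Current layout.
pairV2 :: Versioned 2 Pair
pairV2 = Pair :$: byte :*: byte

-- Older layout: the second field was stored halved, so it is doubled on read.
pairV1 :: Versioned 1 Pair
pairV1 = Pair :$: byte :*: ((* 2) :$: byte)
```

Because the tree is plain data, an older reader differs from the current one only in the subtree that changed, which is the property the rest of the question tries to exploit.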

This seems doable only with some form of generic deserialization mechanism: deserialize the bytes to a generic form, then apply a chain of transformations to reach a shape that matches the current version.
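
The generic-form idea can be sketched as follows. This is only an illustration under assumptions: the Value type, the Person record, the field names "age" and "email", and the two patches are all hypothetical, and the first pass that decodes bytes into a Value is left out.

```haskell
import qualified Data.Map.Strict as M
import Text.Read (readMaybe)

-- Hypothetical schema-free representation produced by a first decoding pass.
data Value
  = VInt Int
  | VText String
  | VRecord (M.Map String Value)
  deriving (Eq, Show)

-- A patch upgrades a value from version n to version n + 1, or fails.
type Patch = Value -> Either String Value

-- Apply a patch to one named field of a record, leaving the rest untouched.
atField :: String -> Patch -> Patch
atField key f (VRecord m) =
  case M.lookup key m of
    Nothing -> Left ("missing field: " ++ key)
    Just v  -> do
      v' <- f v
      Right (VRecord (M.insert key v' m))
atField _ _ _ = Left "expected a record"

-- Version 1 -> 2: the "age" field changes from text to an integer.
ageToInt :: Patch
ageToInt = atField "age" conv
  where
    conv (VText t) = maybe (Left ("not a number: " ++ t)) (Right . VInt) (readMaybe t)
    conv _         = Left "expected text in field age"

-- Version 2 -> 3: a new "email" field appears with a default value.
addEmail :: Patch
addEmail (VRecord m) = Right (VRecord (M.insert "email" (VText "") m))
addEmail _           = Left "expected a record"

-- Patch i (1-based) upgrades version i to version i + 1.
patches :: [Patch]
patches = [ageToInt, addEmail]

-- Bring a value stored at version `from` up to the current version.
upgrade :: Int -> Value -> Either String Value
upgrade from v = foldl (>>=) (Right v) (drop (from - 1) patches)

-- The current data type and its only hand-written decoder.
data Person = Person { age :: Int, email :: String } deriving (Eq, Show)

fromValue :: Value -> Either String Person
fromValue (VRecord m) =
  case (M.lookup "age" m, M.lookup "email" m) of
    (Just (VInt a), Just (VText e)) -> Right (Person a e)
    _                               -> Left "value does not match the current schema"
fromValue _ = Left "expected a record"
```

With this shape, only the current Person type exists in the code base, and each schema change costs one small patch. The price is that the patches work on an untyped Value, so mistakes surface as Left results at run time rather than as type errors.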

Any hint towards a solution would be most helpful.

2017-02-13

My problem can be split into two sub-problems:

1. How to ensure statically that there exist deserialization functions for each version, up to some (statically) known version?
2. How to handle migration of the data structure in a safe and minimally invasive way?

Problem 1 yields the following (non-compiling) code:

  -- | A class instantiating a serializer/deserializer for some version
  class Versionable (v :: Nat) a where
    reader :: Proxy v -> Get a
    writer :: Proxy v -> a -> Put

  -- | Current version is a "global" constraint
  type family CurrentVersion :: Nat

  class VersionUpTo (v :: Nat) a

  instance (Versionable 1 a) => VersionUpTo 1 a
  instance (Versionable v  a, VersionUpTo (v - 1) a) => VersionUpTo v a

  load :: (VersionUpTo CurrentVersion a) => ByteString -> Either String [a]
  load = runGet loadGetter
    where
      loadGetter = sequence $ repeat $ do
        v <- getInt32be
        case v of
          1 -> reader (Proxy :: Proxy 1)
          2 -> reader (Proxy :: Proxy 2)
          3 -> reader (Proxy :: Proxy 3)

The problem is, of course, that the set of values of v to dispatch on depends on CurrentVersion, which raises the following issue:

How to write a generic load function that will read the version from the underlying byte stream and dispatch to the correct reader function, without resorting to explicitly enumerating all cases?

Even if CurrentVersion is statically known at the call site of load, it is not known at the definition site, hence it is not possible to enumerate all valid cases. It seems the only option would be to somehow generate the cases using Template Haskell (TH)...
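
One way to avoid enumerating the cases without Template Haskell might be to let instance resolution build the dispatch chain. The sketch below is not the code from the question: it swaps GHC's Nat for a hypothetical Peano kind N (so the base and recursive instances do not overlap), uses the binary package, fixes the current version with a plain type synonym, and reads a single value instead of a list. The names N, KnownN, ReadUpTo, readerFor, Current, loadOne and Obj are all assumptions made for illustration.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances, FlexibleContexts #-}
{-# LANGUAGE ScopedTypeVariables, UndecidableInstances #-}

import Data.Binary.Get (Get, getWord32be, getWord8, runGetOrFail)
import qualified Data.ByteString.Lazy as BL
import Data.Proxy (Proxy (..))

-- Peano version numbers: 'Z stands for version 1, 'S n for the next one.
data N = Z | S N

class KnownN (n :: N) where
  nVal :: Proxy n -> Int
instance KnownN 'Z where
  nVal _ = 1
instance KnownN n => KnownN ('S n) where
  nVal _ = 1 + nVal (Proxy :: Proxy n)

-- A reader for one specific version of a.
class Versionable (v :: N) a where
  reader :: Proxy v -> Get a

-- Readers exist for every version from 1 up to v; lookup happens at run time.
class ReadUpTo (v :: N) a where
  readerFor :: Proxy v -> Int -> Maybe (Get a)

instance Versionable 'Z a => ReadUpTo 'Z a where
  readerFor p n
    | n == 1    = Just (reader p)
    | otherwise = Nothing

instance (KnownN v, Versionable ('S v) a, ReadUpTo v a) => ReadUpTo ('S v) a where
  readerFor p n
    | n == nVal p = Just (reader p)
    | otherwise   = readerFor (Proxy :: Proxy v) n

-- The "global" current version (here: version 2).
type Current = 'S 'Z

-- Read a 32-bit big-endian version tag, then dispatch to the matching reader.
loadOne :: forall a. ReadUpTo Current a => BL.ByteString -> Either String a
loadOne bytes =
  case runGetOrFail getter bytes of
    Left (_, _, err) -> Left err
    Right (_, _, x)  -> Right x
  where
    getter :: Get a
    getter = do
      v <- fromIntegral <$> getWord32be
      case readerFor (Proxy :: Proxy Current) v of
        Just g  -> g
        Nothing -> fail ("unsupported version " ++ show (v :: Int))

-- Hypothetical payload: one byte in version 1, four bytes in version 2.
newtype Obj = Obj Int deriving (Eq, Show)

instance Versionable 'Z Obj where
  reader _ = Obj . fromIntegral <$> getWord8
instance Versionable ('S 'Z) Obj where
  reader _ = Obj . fromIntegral <$> getWord32be
```

Deleting either Versionable instance for Obj makes any use of loadOne at type Obj fail to compile, which is the static guarantee asked for in problem 1, while the version tag itself is still matched at run time.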

Problem 2 is orthogonal to problem 1. Here the problem is that a data structure of type T evolves over time but we need to take care of old representations: we should be able to deserialize any version v of T up to CurrentVersion. This is easily done by defining a Versionable n T instance for each target version, but this introduces a lot of redundancy, given that the changes between versions n and n+1 are usually limited to one part of the structure.

I think that the metaphor of a stream of patches does not work because it actually goes backward: The starting point is the current data structure and we need to adapt the past representations to the current version. Here are 3 versions of the same object:

 instance Versionable 1 Obj3 where
   reader _ = doGet $ Obj3 :$: (fromInt :$: getint) :*: (fromText :$: Atom get)

 instance Versionable 2 Obj3 where
   reader _ = doGet $ Obj3 :$: Atom get :*: (fromText :$: Atom get)

 instance Versionable 3 Obj3 where
   reader _          = doGet $ Obj3 :$: Atom get :*: getf2 F2
   writer _ Obj3{..} = put f31 >> put f32

There is some regularity as we see each past version is an adaptation of the current version.

Hence the idea of representing reader as a reified Applicative (or maybe Monadic) functor to which surgical updates can be applied to cope with older versions. But then I am stuck on how to select some node deep in the tree of current deserialisers and apply a change to it in a type-safe way...

2017-02-13

Point 2 above seems to lead to nothing but convoluted code, involving a lot of type-level wizardry for a minor benefit. Consider the aforementioned 3 versions of Obj3. Ideally, I would like to find a way to write:

 get03 = Obj3 :$: Atom get :*: getf2 F2

 instance Versionable 2 Obj3 where
   reader _ = doGet $ _replaceAt (0,1) (fromText :$: Atom get) get03

 instance Versionable 3 Obj3 where
   reader _          = doGet $ get03
   writer _ Obj3{..} = put f31 >> put f32

where _replaceAt :: (Int, Int) -> Versioned a -> Versioned b -> Versioned b means that we want to replace the subtree at index (x,y) in the deserializer for b, a subtree whose type is Versioned a, with the second argument. It seems doable to express that in a type-safe way, but this requires exposing the structure of Obj3 as a type T in Versioned T.
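
A simpler alternative to an index-based _replaceAt, sketched here under assumptions, is to keep one reader per field in an ordinary record and to express each older version as a record update on the next newer one. This is a different technique from the Versioned tree above: the field name plays the role of the index, and the type checker ensures the replacement reader has the right type. The Int field types, the wire formats and the names Obj3Readers, assemble and readersV1 to readersV3 are hypothetical, and the binary package is assumed.

```haskell
import Data.Binary.Get (Get, getWord16be, getWord8, runGet)
import qualified Data.ByteString.Lazy as BL

-- Hypothetical current shape of the object (the real field types are not shown).
data Obj3 = Obj3 { f31 :: Int, f32 :: Int } deriving (Eq, Show)

-- One reader per field: this record is the "tree of deserialisers".
data Obj3Readers = Obj3Readers
  { readF31 :: Get Int
  , readF32 :: Get Int
  }

-- Combine the per-field readers into a reader for the whole object.
assemble :: Obj3Readers -> Get Obj3
assemble r = Obj3 <$> readF31 r <*> readF32 r

-- Version 3 (current): both fields are 16-bit big-endian integers.
readersV3 :: Obj3Readers
readersV3 = Obj3Readers
  { readF31 = fromIntegral <$> getWord16be
  , readF32 = fromIntegral <$> getWord16be
  }

-- Version 2 differs from version 3 only in the second field (one byte).
readersV2 :: Obj3Readers
readersV2 = readersV3 { readF32 = fromIntegral <$> getWord8 }

-- Version 1 differs from version 2 only in the first field (one byte).
readersV1 :: Obj3Readers
readersV1 = readersV2 { readF31 = fromIntegral <$> getWord8 }

-- Decode a lazy ByteString with the readers of a given version.
decodeWith :: Obj3Readers -> BL.ByteString -> Obj3
decodeWith = runGet . assemble
```

Each older version is written as a one-line patch against its successor, so the chain of patches runs backward from the current version as described earlier. Nested structures would need nested reader records, and sum types are not covered by this sketch.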

Source: https://stackoverflow.com/questions/42179682/how-to-handle-data-structure-migration-in-a-compositional-way-in-haskell

Tags

haskell

serialization

types