Updating record when referenced by multiple data structures

独厮守ぢ 2020-12-10 06:50

Suppose I have a record, e.g. Person, and I want to be able to look this person up through multiple data structures. Maybe there's an index by name, another in

7 answers
  •  忘掉有多难
    2020-12-10 07:27

    The "update all the index structures" approach doesn't have to be needless ceremony, if you model your concept of a "collection of people with efficient lookup operations" as a unitary thing in itself, rather than a bunch of independent collections that you're "manually" trying to keep in sync with each other.

    Say you've got a Person type. Then you have a collection of Person objects that you want to be indexed by the types Name and Zip. You could use things like Map Name Person and Map Zip Person, but that doesn't really express your meaning. You don't have two groups of people, one keyed by Name and the other keyed by Zip. You have one group of people, which can be looked up by either Name or Zip, so the code you write and data structures you use should reflect that.

    Let's call the collection type People. For your index lookup you'll end up with something like findByName :: People -> Name -> Person and findByZip :: People -> Zip -> Person.

    You've also got functions of type Person -> Person that can "update" Person records. So you can use findByName to pull out a Person from a People, then apply an update function to get a new Person. Now what? You'll have to construct a new People with the original Person replaced with a new Person. The "update" functions can't handle this, since they're only concerned with Person values, and don't know anything about your People store (there could even be many People stores). So you'll need a function like updatePeople :: Person -> Person -> People -> People, and you'll end up writing a lot of code like this:

    let p = findByName name people
        p' = update p
    in updatePeople p p' people
    

    That's a bit boilerplatey. Looks like a job for updateByName :: Name -> (Person -> Person) -> People -> People.

    With that, where in an OO language you might write something like people.findByName(name).changeSomething(args) you can now write updateByName name (changeSomething args) people. Not so different!
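
To make that concrete, here is a minimal sketch of findByName, updatePeople and updateByName over one possible representation of People (a record of two maps). The age field, peopleFromList, birthday and setZip are hypothetical names added for illustration. Unlike the signatures above, findByName returns a Maybe here so that a missing name is not a runtime error; that is an assumption of this sketch, not something the scheme requires.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Hypothetical sample types; the 'age' field exists only to have
-- something to update.
newtype Name = Name String deriving (Eq, Ord, Show)
newtype Zip  = Zip Int     deriving (Eq, Ord, Show)

data Person = Person
  { name    :: Name
  , zipCode :: Zip
  , age     :: Int
  } deriving (Eq, Show)

-- One possible representation: one map per index.
data People = People
  { byName :: Map Name Person
  , byZip  :: Map Zip Person
  } deriving (Eq, Show)

-- Hypothetical helper: build the store, filling every index.
peopleFromList :: [Person] -> People
peopleFromList ps = People
  { byName = Map.fromList [(name p, p) | p <- ps]
  , byZip  = Map.fromList [(zipCode p, p) | p <- ps]
  }

-- Total lookup: Nothing when the name is not indexed.
findByName :: People -> Name -> Maybe Person
findByName ps n = Map.lookup n (byName ps)

-- Replace the old person with the new one in every index.
updatePeople :: Person -> Person -> People -> People
updatePeople old new ps = ps
  { byName = Map.insert (name new) new (Map.delete (name old) (byName ps))
  , byZip  = Map.insert (zipCode new) new (Map.delete (zipCode old) (byZip ps))
  }

-- Find, update, and write back in one step.
-- The store is returned unchanged if the name is unknown.
updateByName :: Name -> (Person -> Person) -> People -> People
updateByName n f ps = case findByName ps n of
  Nothing -> ps
  Just p  -> updatePeople p (f p) ps

-- Hypothetical Person -> Person update functions.
birthday :: Person -> Person
birthday p = p { age = age p + 1 }

setZip :: Zip -> Person -> Person
setZip z p = p { zipCode = z }
```

With these definitions, updateByName (Name "Alice") (setZip (Zip 99999)) people moves Alice's entry in the zip index as well as updating her entry in the name index, and the caller never touches either map directly.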

    Note that I haven't talked at all about how any of these data structures or operations are actually implemented. I'm thinking purely about the concepts you have and the operations that make sense on them. That means a scheme like this will work regardless of how you're implementing them; you can even (and probably should) hide the implementation details behind a module barrier. You may well implement People as a record of multiple collections mapping different things to your Person records, but from the "outside" you can just think of it as a single collection that supports multiple different types of lookup/update operations, and you don't have to worry about keeping multiple indexes in sync. It's only within the implementation of the People type and its operations that you have to worry about that, which gives you a place to solve it once and well, rather than having to do it correctly on every operation.
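
A rough sketch of what such a module barrier could look like is below. The export list is shown in a comment so the file also loads on its own; emptyPeople, insertPerson and removePerson are hypothetical names, and the one-person-per-key policy is an assumption of this sketch. The point is that People is exported without its constructor, so the only code that can ever get the indexes out of sync is the code inside this module.

```haskell
-- In a real project this would be its own module, with a header like:
--
--   module People
--     ( People                     -- abstract: constructor NOT exported
--     , emptyPeople, insertPerson
--     , findByName, findByZip
--     ) where

import qualified Data.Map as Map
import Data.Map (Map)
import Data.Maybe (catMaybes)

newtype Name = Name String deriving (Eq, Ord, Show)
newtype Zip  = Zip Int     deriving (Eq, Ord, Show)

data Person = Person
  { name    :: Name
  , zipCode :: Zip
  } deriving (Eq, Show)

-- Internal representation; invisible to clients once the constructor
-- is left out of the export list.
data People = People
  { byName :: Map Name Person
  , byZip  :: Map Zip Person
  }

emptyPeople :: People
emptyPeople = People Map.empty Map.empty

-- Internal helper: drop a person from every index.
removePerson :: Person -> People -> People
removePerson p (People ns zs) =
  People (Map.delete (name p) ns) (Map.delete (zipCode p) zs)

-- The only way to add a person from outside. Anyone already stored under
-- the same name or the same zip is removed from *all* indexes first, so
-- no index is left holding a stale entry.
insertPerson :: Person -> People -> People
insertPerson p ps =
    People (Map.insert (name p) p ns) (Map.insert (zipCode p) p zs)
  where
    collisions   = catMaybes [ Map.lookup (name p) (byName ps)
                             , Map.lookup (zipCode p) (byZip ps) ]
    People ns zs = foldr removePerson ps collisions

findByName :: People -> Name -> Maybe Person
findByName ps n = Map.lookup n (byName ps)

findByZip :: People -> Zip -> Maybe Person
findByZip ps z = Map.lookup z (byZip ps)
```

Because client code can only go through insertPerson and the finders, the sync logic lives in exactly one place, and the representation can later be changed without touching any callers.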

    You can take this sort of thing further. With some extra assumptions (such as the knowledge that your Name, Zip, and any other indexes are all implemented with the same pattern, just on different fields of Person/People) you can probably use type classes and/or Template Haskell to avoid having to implement findByName, findByZip, findByFavouriteSpoon, etc. separately (although having separate implementations gives you more opportunity to use different indexing strategies depending on the types involved, and may help with optimizing the updates so that, e.g., you only have to update the indexes that could possibly be invalidated). You can use type classes and type families to implement a findBy that uses the type of whatever index key it is invoked on to determine which index to use, whether you have separate implementations or a single generic one (although this means that you can't have multiple indexes with the same type).

    Here's an example I knocked up when I should've been working, providing type-class-based findBy and updateBy operations:

    {-# LANGUAGE FlexibleContexts, MultiParamTypeClasses, TypeFamilies #-}
    
    import Data.Map (Map, (!), delete, insert)
    
    
    -- sample data declarations
    newtype Name = Name String
        deriving (Eq, Ord, Show)
    
    newtype Zip = Zip Int
        deriving (Eq, Ord, Show)
    
    data Person = Person
      { name    :: Name
      , zipCode :: Zip
      }
    
    -- you probably wouldn't export the constructor here
    data People = People
      { byName :: Map Name Person
      , byZip  :: Map Zip Person
      }
    
    
    -- class for stores that can be indexed by key
    class FindBy key store where
        type Result key store
        findBy :: key -> store -> Result key store
        updateBy :: key -> (Result key store -> Result key store) -> store -> store
    
    
    -- helper functions
    -- this stuff would be hidden
    updateIndex
        :: Ord a
        => (Person -> a) -> Person -> Person -> Map a Person -> Map a Person
    updateIndex f p p' = insert (f p') p' . delete (f p)
    
    -- this function has some per-index stuff;
    -- note that if you add a new index to People you get a compile error here
    -- telling you to account for it
    -- also note that we put the *same* person in every map; sharing should mean
    -- that we're not duplicating the objects, so no wasted memory
    replacePerson :: Person -> Person -> People -> People
    replacePerson p p' ps = ps { byName = byName', byZip = byZip' }
      where
        byName' = updateIndex name    p p' $ byName ps
        byZip'  = updateIndex zipCode p p' $ byZip  ps
    
    -- a "default" definition for updateBy in terms of findBy when the store happens
    -- to be People and the result happens to be Person
    updatePeopleBy
        :: (FindBy key People, Result key People ~ Person)
        => key -> (Person -> Person) -> People -> People
    updatePeopleBy k f ps =
        let p = findBy k ps
        in replacePerson p (f p) ps
    
    
    -- this is basically the "declaration" of all the indexes that can be used
    -- externally
    instance FindBy Name People where
        type Result Name People = Person
        findBy n ps = byName ps ! n
        updateBy = updatePeopleBy
    
    instance FindBy Zip People where
        type Result Zip People = Person
        findBy z ps = byZip ps ! z
        updateBy = updatePeopleBy
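
One caveat with the code above: (!) from Data.Map calls error when the key is absent, so findBy and updateBy crash on an unknown Name or Zip. Below is a hedged variant of the same class in which findBy returns a Maybe and updateBy leaves the store untouched when the key is missing. It is a sketch of one way to make the lookups total, not a change the original code needs; samplePeople and moved are made-up values for the usage example.

```haskell
{-# LANGUAGE FlexibleContexts, MultiParamTypeClasses, TypeFamilies #-}

import qualified Data.Map as Map
import Data.Map (Map)

newtype Name = Name String deriving (Eq, Ord, Show)
newtype Zip  = Zip Int     deriving (Eq, Ord, Show)

data Person = Person
  { name    :: Name
  , zipCode :: Zip
  } deriving (Eq, Show)

data People = People
  { byName :: Map Name Person
  , byZip  :: Map Zip Person
  } deriving (Eq, Show)

-- Same shape as the class above, except that findBy reports a missing
-- key with Nothing instead of throwing.
class FindBy key store where
    type Result key store
    findBy   :: key -> store -> Maybe (Result key store)
    updateBy :: key -> (Result key store -> Result key store) -> store -> store

-- Swap the old person for the new one in every index.
replacePerson :: Person -> Person -> People -> People
replacePerson p p' ps = ps
  { byName = Map.insert (name p') p' (Map.delete (name p) (byName ps))
  , byZip  = Map.insert (zipCode p') p' (Map.delete (zipCode p) (byZip ps))
  }

-- Shared updateBy: a no-op when the key is not found.
updatePeopleBy
    :: (FindBy key People, Result key People ~ Person)
    => key -> (Person -> Person) -> People -> People
updatePeopleBy k f ps =
    maybe ps (\p -> replacePerson p (f p) ps) (findBy k ps)

instance FindBy Name People where
    type Result Name People = Person
    findBy n = Map.lookup n . byName
    updateBy = updatePeopleBy

instance FindBy Zip People where
    type Result Zip People = Person
    findBy z = Map.lookup z . byZip
    updateBy = updatePeopleBy

-- Usage: made-up sample data.
samplePeople :: People
samplePeople = People
  { byName = Map.fromList [(name p, p) | p <- ps]
  , byZip  = Map.fromList [(zipCode p, p) | p <- ps]
  }
  where
    ps = [ Person (Name "Alice") (Zip 10001)
         , Person (Name "Bob")   (Zip 10002) ]

-- Look Alice up by name and change her zip; both indexes follow.
moved :: People
moved = updateBy (Name "Alice") (\p -> p { zipCode = Zip 20002 }) samplePeople
```

The key's type still selects the index, so findBy (Zip 20002) moved and findBy (Name "Alice") moved both find the updated record, while a lookup under Alice's old zip gives Nothing.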
    
