Cache Invalidation — Is there a General Solution?

情深已故 2020-11-30 16:36

\"There are only two hard problems in Computer Science: cache invalidation and naming things.\"

Phil Karlton

Is there a general solution or approach to cache invalidation?

9 Answers
  • 2020-11-30 17:43

    The problem with cache invalidation is that data changes without our knowing about it. So, in some cases, a solution is possible if some other component does know about the change and can notify us. In the given example, the getData function could hook into the file system, which does know about all changes to files, regardless of which process changes a file, and this component in turn could notify the component that transforms the data.

    I don't think there is any general magic fix to make the problem go away. But in many practical cases there may very well be opportunities to transform a "polling"-based approach into an "interrupt"-based one, which can make the problem simply go away.
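    The polling-to-notification idea above can be sketched in Python. This is a minimal toy, not from the original question: the names ChangeNotifier and TransformCache are illustrative, and a real implementation would hook into an actual change source such as inotify or a file-watcher library.

    ```python
    # Sketch: turning a polling-based cache into a notification-based one.
    # ChangeNotifier and TransformCache are hypothetical names.

    class ChangeNotifier:
        """Stands in for a component that *knows* when the source changes
        (e.g. a file-system watcher); it pushes events to subscribers."""
        def __init__(self):
            self._subscribers = []

        def subscribe(self, callback):
            self._subscribers.append(callback)

        def notify(self, key):
            for cb in self._subscribers:
                cb(key)


    class TransformCache:
        """Caches transformed data; drops entries when notified of a change."""
        def __init__(self, notifier, load, transform):
            self._cache = {}
            self._load = load
            self._transform = transform
            notifier.subscribe(self._invalidate)   # interrupt-style, no polling

        def _invalidate(self, key):
            self._cache.pop(key, None)

        def get_data(self, key):
            if key not in self._cache:
                self._cache[key] = self._transform(self._load(key))
            return self._cache[key]
    ```

    The cache never guesses about staleness; it serves whatever it has until the notifier tells it otherwise, which is exactly the trade the answer describes: correctness now depends entirely on the notifier seeing every change.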

  • 2020-11-30 17:43

    IMHO, Functional Reactive Programming (FRP) is in a sense a general way to solve cache invalidation.

    Here is why: stale data in FRP terminology is called a glitch. One of FRP's goals is to guarantee absence of glitches.

    FRP is explained in more detail in this 'Essence of FRP' talk and in this SO answer.

    In the talk, the Cells represent cached Objects/Entities, and a Cell is refreshed when one of its dependencies is refreshed.

    FRP hides the plumbing code associated with the dependency graph and makes sure that there are no stale Cells.
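    The Cell idea can be sketched as a toy push-based dependency graph. This is an assumption-laden miniature, not a real FRP library: a change to any source Cell immediately re-derives every dependent Cell, so readers never observe a glitch.

    ```python
    # Toy push-based "Cell" model: dependents are recomputed eagerly on change.

    class Cell:
        def __init__(self, value=None):
            self._value = value
            self._dependents = []

        def get(self):
            return self._value

        def set(self, value):
            self._value = value
            for dep in self._dependents:
                dep.refresh()          # push the change through the graph


    class ComputedCell(Cell):
        def __init__(self, compute, *deps):
            super().__init__()
            self._compute = compute
            self._deps = deps
            for d in deps:
                d._dependents.append(self)
            self.refresh()             # derive the initial value

        def refresh(self):
            self._value = self._compute(*(d.get() for d in self._deps))
            for dep in self._dependents:
                dep.refresh()          # propagate to cells derived from this one
    ```

    A real FRP system would also handle diamond-shaped graphs, batching, and unsubscription; the sketch only shows why a Cell can never be stale: its value is recomputed before any reader can see the old one.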


    Another way (different from FRP) that I can think of is wrapping the computed value (of type b) in a writer monad, Writer (Set uuid) b (Haskell notation), where the Set uuid contains the identifiers of all the mutable values on which the computed value b depends. Here, a uuid is a unique identifier for a mutable value/variable (say, a row in a database) from which b was computed.

    Combine this idea with combinators that operate on this kind of writer monad, and it might lead to a general cache invalidation scheme, provided you only ever use these combinators to compute a new b. Such a combinator (say, a special version of filter) takes writer monads and (uuid, a) pairs as inputs, where a is a mutable datum/variable identified by uuid.

    Then, every time you mutate some "original" data (uuid, a) (say, the normalized data in a database from which b was computed), you broadcast the mutation to all caches. Each cache invalidates any value b that depends on the mutated value, because the Set uuid in the writer monad wrapping b tells it whether b depends on that uuid.

    Of course, this only pays off if you read much more often than you write.
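    The Writer-monad scheme can be sketched in Python rather than Haskell: a Tracked value carries the set of uuids it was derived from, a combinator records the uuids of all its inputs, and a broadcast mutation drops every cached value whose dependency set contains the mutated uuid. The names Tracked, tracked_filter, and CacheStore are illustrative.

    ```python
    # Sketch of the Writer (Set uuid) b idea: computed values carry their
    # dependency set, so invalidation is a set-membership test.

    class Tracked:
        """A computed value b together with the Set(uuid) it depends on."""
        def __init__(self, value, deps):
            self.value = value
            self.deps = frozenset(deps)


    def tracked_filter(pred, rows):
        """Combinator over (uuid, a) pairs. The result records the uuids of
        *all* inputs, not just the kept ones: mutating a filtered-out row
        could make it pass the predicate, changing the output."""
        kept = [a for (_uuid, a) in rows if pred(a)]
        return Tracked(kept, {u for (u, _a) in rows})


    class CacheStore:
        def __init__(self):
            self._cache = {}          # key -> Tracked

        def put(self, key, tracked):
            self._cache[key] = tracked

        def get(self, key):
            t = self._cache.get(key)
            return t.value if t else None

        def broadcast_mutation(self, uuid):
            """Drop every cached b whose dependency set contains this uuid."""
            stale = [k for k, t in self._cache.items() if uuid in t.deps]
            for k in stale:
                del self._cache[k]
    ```

    Note the design point buried in tracked_filter's docstring: the dependency set must be conservative (all inputs, not just surviving ones), otherwise a mutation could silently change the result without invalidating the cache.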


    A third, practical, approach is to use materialized views in databases and use them as caches. AFAIK they also aim to solve the invalidation problem. This, of course, limits the operations that connect the mutable data to the derived data.

  • 2020-11-30 17:43

    Caching is hard because you need to consider: 1) the cache may span multiple nodes, which need to reach consensus on its contents; 2) invalidation timing (how long values may live); 3) race conditions when multiple gets/sets happen concurrently.

    This is good reading: https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/
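    Point 3 above can be sketched with a version check (compare-and-swap): a slow writer cannot overwrite a newer value with stale data. This is a single-process toy with hypothetical names; real distributed caches need the same check done atomically server-side (e.g. memcached's cas command, or Redis's WATCH/MULTI).

    ```python
    # Sketch: versioned compare-and-set to avoid the get/set race condition.

    import threading


    class VersionedCache:
        def __init__(self):
            self._lock = threading.Lock()
            self._data = {}            # key -> (version, value)

        def get_with_version(self, key):
            with self._lock:
                return self._data.get(key, (0, None))

        def compare_and_set(self, key, expected_version, value):
            """Write only if nothing was written since we read; else fail and
            force the caller to re-read before retrying."""
            with self._lock:
                version, _ = self._data.get(key, (0, None))
                if version != expected_version:
                    return False       # lost the race
                self._data[key] = (version + 1, value)
                return True
    ```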
