Reentrant caching of “referentially transparent” IO calls

Assume we have an IO action such as

lookupStuff :: InputType -> IO OutputType

which could be something simple such as DNS lookup, or some web-service call against a time-invariant data.

Let's assume that:

The operation never throws any exception and/or never diverges
If it wasn't for the IO monad, the function would be pure, i.e. the result is always the same for equal input parameters
The action is reentrant, i.e. it can be called from multiple threads at the same time safely.
The lookupStuff operation is quite (time-)expensive.

The problem I'm facing is how to properly (and w/o using any unsafe*IO* cheat) implement a reentrant cache, that can be called from multiple threads, and coalesces multiple queries for the same input-parameters into a single request.

I guess I'm after something similiar as GHC's blackhole-concept for pure computations but in the IO "calculation" context.

What is the idiomatic Haskell/GHC solution for the stated problem?

Yeah, basically reimplement the logic. Although it seems similar to what GHC is already doing, that's GHC's choice. Haskell can be implemented on VMs that work very differently, so in that sense it isn't already done for you.

But yeah, just use an MVar (Map InputType OutputType) or even an IORef (Map InputType OutputType) (make sure to modify with atomicModifyIORef), and just store the cache in there. If this imperative solution seems wrong, it's the "if not for the IO, this function would be pure" constraint. If it were just an arbitrary IO action, then the idea that you would have to keep state in order to know what to execute or not seems perfectly natural. The problem is that Haskell does not have a type for "pure IO" (which, if it depends on a database, it is just behaving pure under certain conditions, which is not the same as being a hereditarily pure).

import qualified Data.Map as Map
import Control.Concurrent.MVar

-- takes an IO function and returns a cached version
cache :: (Ord a) => (a -> IO b) -> IO (a -> IO b)
cache f = do
    r <- newMVar Map.empty
    return $ \x -> do
        cacheMap <- takeMVar r
        case Map.lookup x cacheMap of
            Just y -> do 
                putMVar r cacheMap
                return y
            Nothing -> do
                y <- f x
                putMVar (Map.insert x y cacheMap)
                return y

Yeah it's ugly on the inside. But on the outside, look at that! It's just like the type of a pure memoization function, except for it has IO stained all over it.

Here's some code implementing more or less what I was after in my original question:

import           Control.Concurrent
import           Control.Exception
import           Data.Either
import           Data.Map           (Map)
import qualified Data.Map           as Map
import           Prelude            hiding (catch)

-- |Memoizing wrapper for 'IO' actions
memoizeIO :: Ord a => (a -> IO b) -> IO (a -> IO b)
memoizeIO action = do
  cache <- newMVar Map.empty
  return $ memolup cache action

  where
    -- Lookup helper
    memolup :: Ord a => MVar (Map a (Async b)) -> (a -> IO b) -> a -> IO b
    memolup cache action' args = wait' =<< modifyMVar cache lup
      where
        lup tab = case Map.lookup args tab of
          Just ares' ->
            return (tab, ares')
          Nothing    -> do
            ares' <- async $ action' args
            return (Map.insert args ares' tab, ares')

The code above builds upon Simon Marlow's Async abstraction as described in Tutorial: Parallel and Concurrent Programming in Haskell:

-- |Opaque type representing asynchronous results.
data Async a = Async ThreadId (MVar (Either SomeException a))

-- |Construct 'Async' result. Can be waited on with 'wait'.
async :: IO a -> IO (Async a)
async io = do
  var <- newEmptyMVar
  tid <- forkIO ((do r <- io; putMVar var (Right r))
                 `catch` \e -> putMVar var (Left e))
  return $ Async tid var

-- |Extract value from asynchronous result. May block if result is not
-- available yet. Exceptions are returned as 'Left' values.
wait :: Async a -> IO (Either SomeException a)
wait (Async _ m) = readMVar m

-- |Version of 'wait' that raises exception.
wait' :: Async a -> IO a
wait' a = either throw return =<< wait a

-- |Cancels asynchronous computation if not yet completed (non-blocking).
cancel :: Async a -> IO ()
cancel (Async t _) = throwTo t ThreadKilled

来源：https://stackoverflow.com/questions/5484847/reentrant-caching-of-referentially-transparent-io-calls

标签

haskell

concurrency

memoization