I used to ask a similar question once. Now I\'ll be more specific. The purpose is to learn a Haskell idiom to write iterative algorithms with monadic results. In particular,
Some things to consider:
For raw all-out performance, write a custom State monad, like so:
import System.Random.Mersenne.Pure64
data R a = R !a {-# UNPACK #-}!PureMT
-- | The RMonad is just a specific instance of the State monad where the
--   state is just the PureMT PRNG state.
--
-- * Specialized to a known state type
--
newtype RMonad a = S { runState :: PureMT -> R a }
instance Monad RMonad where
    {-# INLINE return #-}
    return a = S $ \s -> R a s
    {-# INLINE (>>=) #-}
    m >>= k  = S $ \s -> case runState m s of
                                R a s' -> runState (k a) s'
    {-# INLINE (>>) #-}
    m >>  k  = S $ \s -> case runState m s of
                                R _ s' -> runState k s'
-- | Run function for the Rmonad.
runRmonad :: RMonad a -> PureMT -> R a
runRmonad (S m) s = m s
evalRmonad :: RMonad a -> PureMT -> a
evalRmonad r s = case runRmonad r s of R x _ -> x
-- An example of random iteration step: one-dimensional random walk.
randStep :: (Num a) => a -> RMonad a
randStep x = S $ \s -> case randomInt s of
                    (n, s') | n < 0     -> R (x+1) s'
                            | otherwise -> R (x-1) s'
Like so: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=27414#a27414
Which runs in constant space (modulo the [Double] you build up), and is some 8x faster than your original.
The use of a specialized state monad with local defintion outperforms the Control.Monad.Strict significantly as well.
Here's what the heap looks like, with the same paramters as you:

Note that it is about 10x faster, and uses 1/5th the space. The big red thing is your list of doubles being allocated.
Inspired by your question, I captured the PureMT pattern in a new package: monad-mersenne-random, and now your program becomes this:
The other change I made was to worker/wrapper transform iterateM, enabling it to be inlined:
 {-# INLINE iterateM #-}
 iterateM n f x = go n x
     where
         go 0 !x = return x
         go n !x = f x >>= go (n-1)
Overall, this brings your code from, with K=500, N=30k
So that is, 220x faster.
The heap is a bit better too, now that iterateM unboxes.
