How do I make this algorithm lazier without repeating myself?

问题

(Inspired by my answer to this question.)

Consider this code (it's supposed to find the largest element that's less than or equal to a given input):

data TreeMap v = Leaf | Node Integer v (TreeMap v) (TreeMap v) deriving (Show, Read, Eq, Ord)

closestLess :: Integer -> TreeMap v -> Maybe (Integer, v)
closestLess i = precise Nothing where
  precise :: Maybe (Integer, v) -> TreeMap v -> Maybe (Integer, v)
  precise closestSoFar Leaf = closestSoFar
  precise closestSoFar (Node k v l r) = case i `compare` k of
    LT -> precise closestSoFar l
    EQ -> Just (k, v)
    GT -> precise (Just (k, v)) r

This isn't very lazy. Once the GT case is entered, we know for sure that the final return value will be Just something rather than Nothing, but the Just still isn't available until the end. I'd like to make this lazier so that the Just is available as soon as the GT case is entered. My test case for this is that I want Data.Maybe.isJust $ closestLess 5 (Node 3 () Leaf undefined) to evaluate to True rather than bottoming. Here's one way I can think to do this:

data TreeMap v = Leaf | Node Integer v (TreeMap v) (TreeMap v) deriving (Show, Read, Eq, Ord)

closestLess :: Integer -> TreeMap v -> Maybe (Integer, v)
closestLess _ Leaf = Nothing
closestLess i (Node k v l r) = case i `compare` k of
  LT -> closestLess i l
  EQ -> Just (k, v)
  GT -> Just (precise (k, v) r)
  where
    precise :: (Integer, v) -> TreeMap v -> (Integer, v)
    precise closestSoFar Leaf = closestSoFar
    precise closestSoFar (Node k v l r) = case i `compare` k of
      LT -> precise closestSoFar l
      EQ -> (k, v)
      GT -> precise (k, v) r

However, I'm now repeating myself: the core logic is now in both closestLess and in precise. How can I write this so that it's lazy but without repeating myself?

回答1:

Rather than use explicit wrappers, you can leverage the type system. Note that the version of precise that uses Maybe for your first code snippet:

precise :: Maybe (Integer, v) -> TreeMap v -> Maybe (Integer, v)
precise closestSoFar Leaf = closestSoFar
precise closestSoFar (Node k v l r) = case i `compare` k of
  LT -> precise closestSoFar l
  EQ -> Just (k, v)
  GT -> precise (Just (k, v)) r

is almost exactly the same algorithm as the version of precise without Maybe from your second code snippet, which could be written in the Identity functor as:

precise :: Identity (Integer, v) -> TreeMap v -> Identity (Integer, v)
precise closestSoFar Leaf = closestSoFar
precise closestSoFar (Node k v l r) = case i `compare` k of
  LT -> precise closestSoFar l
  EQ -> Identity (k, v)
  GT -> precise (Identity (k, v)) r

These can be unified into a version polymorphic in the Applicative:

precise :: (Applicative f) => f (Integer, v) -> TreeMap v -> f (Integer, v)
precise closestSoFar Leaf = closestSoFar
precise closestSoFar (Node k v l r) = case i `compare` k of
  LT -> precise closestSoFar l
  EQ -> pure (k, v)
  GT -> precise (pure (k, v)) r

By itself, that doesn't accomplish much, but if we know that the GT branch will always return a value, we can force it to run in the Identity functor, regardless of the starting functor. That is, we can start in the Maybe functor but recurse into the Identity functor in the GT branch:

closestLess :: Integer -> TreeMap v -> Maybe (Integer, v)
closestLess i = precise Nothing
  where
    precise :: (Applicative t) => t (Integer, v) -> TreeMap v -> t (Integer, v)
    precise closestSoFar Leaf = closestSoFar
    precise closestSoFar (Node k v l r) = case i `compare` k of
      LT -> precise closestSoFar l
      EQ -> pure (k, v)
      GT -> pure . runIdentity $ precise (Identity (k, v)) r

This works fine with your test case:

> isJust $ closestLess 5 (Node 3 () Leaf undefined)
True

and is a nice example of polymorphic recursion.

Another nice thing about this approach from a performance point of view is that the -ddump-simpl shows that there are no wrappers or dictionaries. It's all been erased at the type level with specialized functions for the two functors:

closestLess
  = \ @ v i eta ->
      letrec {
        $sprecise
        $sprecise
          = \ @ v1 closestSoFar ds ->
              case ds of {
                Leaf -> closestSoFar;
                Node k v2 l r ->
                  case compareInteger i k of {
                    LT -> $sprecise closestSoFar l;
                    EQ -> (k, v2) `cast` <Co:5>;
                    GT -> $sprecise ((k, v2) `cast` <Co:5>) r
                  }
              }; } in
      letrec {
        $sprecise1
        $sprecise1
          = \ @ v1 closestSoFar ds ->
              case ds of {
                Leaf -> closestSoFar;
                Node k v2 l r ->
                  case compareInteger i k of {
                    LT -> $sprecise1 closestSoFar l;
                    EQ -> Just (k, v2);
                    GT -> Just (($sprecise ((k, v2) `cast` <Co:5>) r) `cast` <Co:4>)
                  }
              }; } in
      $sprecise1 Nothing eta

回答2:

Starting from my non-lazy implementation, I first refactored precise to receive Just as an argument, and generalized its type accordingly:

data TreeMap v = Leaf | Node Integer v (TreeMap v) (TreeMap v) deriving (Show, Read, Eq, Ord)

closestLess :: Integer -> TreeMap v -> Maybe (Integer, v)
closestLess i = precise Just Nothing where
  precise :: ((Integer, v) -> t) -> t -> TreeMap v -> t
  precise _ closestSoFar Leaf = closestSoFar
  precise wrap closestSoFar (Node k v l r) = case i `compare` k of
    LT -> precise wrap closestSoFar l
    EQ -> wrap (k, v)
    GT -> precise wrap (wrap (k, v)) r

Then, I changed it to do wrap early and call itself with id in the GT case:

data TreeMap v = Leaf | Node Integer v (TreeMap v) (TreeMap v) deriving (Show, Read, Eq, Ord)

closestLess :: Integer -> TreeMap v -> Maybe (Integer, v)
closestLess i = precise Just Nothing where
  precise :: ((Integer, v) -> t) -> t -> TreeMap v -> t
  precise _ closestSoFar Leaf = closestSoFar
  precise wrap closestSoFar (Node k v l r) = case i `compare` k of
    LT -> precise wrap closestSoFar l
    EQ -> wrap (k, v)
    GT -> wrap (precise id (k, v) r)

This still works exactly as before, except for the benefit of the added laziness.

回答3:

I think the CPS version you answered with yourself is the best but for completeness here are a few more ideas. (EDIT: Buhr's answer is now the most performant.)

The first idea is to get rid of the "closestSoFar" accumulator, and instead let the GT case handle all the logic of picking the rightmost value smallest than the argument. In this form, the GT case can directly return a Just:

closestLess1 :: Integer -> TreeMap v -> Maybe (Integer, v)
closestLess1 _ Leaf = Nothing
closestLess1 i (Node k v l r) =
  case i `compare` k of
    LT -> closestLess1 i l
    EQ -> Just (k, v)
    GT -> Just (fromMaybe (k, v) (closestLess1 i r))

This is simpler, but takes a bit more space on the stack when you hit a lot of GT cases. Technically you could even use that fromMaybe in the accumulator form (i.e., replacing the fromJust implicit in luqui's answer), but that would be a redundant, unreachable branch.

The other idea that there's really two "phases" of the algorithm, one before and one after you hit a GT, so you parameterize it by a boolean to represent these two phases, and use dependent types to encode the invariant that there will always be a result in the second phase.

data SBool (b :: Bool) where
  STrue :: SBool 'True
  SFalse :: SBool 'False

type family MaybeUnless (b :: Bool) a where
  MaybeUnless 'True a = a
  MaybeUnless 'False a = Maybe a

ret :: SBool b -> a -> MaybeUnless b a
ret SFalse = Just
ret STrue = id

closestLess2 :: Integer -> TreeMap v -> Maybe (Integer, v)
closestLess2 i = precise SFalse Nothing where
  precise :: SBool b -> MaybeUnless b (Integer, v) -> TreeMap v -> MaybeUnless b (Integer, v)
  precise _ closestSoFar Leaf = closestSoFar
  precise b closestSoFar (Node k v l r) = case i `compare` k of
    LT -> precise b closestSoFar l
    EQ -> ret b (k, v)
    GT -> ret b (precise STrue (k, v) r)

回答4:

How about

GT -> let Just v = precise (Just (k,v) r) in Just v

回答5:

Not only do we always know Just, after its first discovery, we also always know Nothing until then. That's actually two different "logics".

So, we go left first of all, so make that explicit:

data TreeMap v = Leaf | Node Integer v (TreeMap v) (TreeMap v) 
                 deriving (Show, Read, Eq, Ord)

closestLess :: Integer 
            -> TreeMap v 
            -> Maybe (Integer, v)
closestLess i = goLeft 
  where
  goLeft :: TreeMap v -> Maybe (Integer, v)
  goLeft n@(Node k v l _) = case i `compare` k of
          LT -> goLeft l
          _  -> Just (precise (k, v) n)
  goLeft Leaf = Nothing

  -- no more maybe if we're here
  precise :: (Integer, v) -> TreeMap v -> (Integer, v)
  precise closestSoFar Leaf           = closestSoFar
  precise closestSoFar (Node k v l r) = case i `compare` k of
        LT -> precise closestSoFar l
        EQ -> (k, v)
        GT -> precise (k, v) r

The price is we repeat at most one step at most once.

来源：https://stackoverflow.com/questions/59337499/how-do-i-make-this-algorithm-lazier-without-repeating-myself

标签

haskell

lazy-evaluation

dry