Data.STM.LinkedList implementation

问题

I'm looking at Data.STM.LinkedList implementation for a high performance linked list. Looking at the documentation, the length function run in O(n) - why is that ? Was there any real issue to implement it in O(1) ?

Here is the source code https://hackage.haskell.org/package/stm-linkedlist-0.1.0.0/docs/src/Data-STM-LinkedList-Internal.html#length

Is it possible implement it in O(1) ? i'm new to Haskell so I'm not sure if holding some metadata about the list is problematic.

Thanks!

回答1:

To a first approximation, Haskell is a sufficiently expressive language that any algorithm implemented in another general purpose language can also be implemented in Haskell while preserving the asymptotic performance characteristics. (This is a pretty low bar. Most general-purpose languages are this expressive.)

In particular, though Haskell most naturally supports immutable data structures, it has sufficient support for mutable data that mutable data structures and their algorithms can usually be fairly directly translated into Haskell code. There may be some overhead (often substantial overhead), and mutable data structures may be significantly more awkward to use than their immutable cousins, but it's still possible.

As a practical matter, though, matching the actual (as opposed to asymptotic) performance of a C++ implementation of a mutable data structure is likely to prove extremely difficult if not impossible. It may be reasonable to get within 2-3 times the performance of C++, and getting within 5-10 times is pretty easy (see below). However, if you need to match C++ performance, you would be probably better off writing the high performance mutating code in C++ and using the FFI (foreign function interface) to interface to that code.

Anyway, a "moderate performance" doubly linked-list with O(1) length is certainly possible, and there's no fundamental difficulty with maintaining mutable list-wide metadata. The reason that stm-linkedlist does not provide an O(1) length is probably the same reason that C++ guaranteed only O(n) std::list<>::size performance before C++11. Namely, many practical uses of doubly-linked lists don't ever need to call length/size, and providing O(1) performance comes with an additional bookkeeping cost.

As a proof of concept, the following data types are sufficient to implement a fully mutable doubly-linked list with an O(1) length function. Here, types and identifiers ending in underscores are for internal use only. The list is strict in its pointers (so no infinite lists!) but lazy in its values.

data List a = List
  { headNode_ :: !(IORef (Node_ a))
  , length_ :: !(IORef Int) }
data Node_ a = Node_
  { prev_ :: !(IORef (Node_ a))
  , next_ :: !(IORef (Node_ a))
  , value_ :: a }

The List type contains a pointer (i.e., IORef) to an incomplete headNode that points to the start and end of the list (or to itself for an empty list) but has an undefined value field. That makes this an unsafe node value, so it should never be directly accessible to the end-user. The List also contains a pointer to the list length value.

An additional type Node (no underscore) is used to decorate a node pointer with its corresponding list (like the "iterator" from the comments), to make the list metadata available to functions that need it:

data Node a = Node
  { node_ :: !(IORef (Node_ a))
  , list_ :: !(List a) }

Note that List and Node are the user-facing data types for working with lists.

You create an empty list like so:

empty :: IO (List a)
empty = mdo
  n <- newIORef (Node_ n n undefined)
  List n <$> newIORef 0

Insertion before and after a given node works as follows. Here's where the unsafe head node representation pays off, since the algorithm can treat insertion at the beginning and end of the list as special cases of insertion between the head node and an actual list node.

insertBefore :: a -> Node a -> IO (Node a)
insertBefore x Node{node_=rnode2, list_} = do
  Node_{prev_=rnode1} <- readIORef rnode2
  insertBetween_ x list_ rnode1 rnode2

insertAfter :: a -> Node a -> IO (Node a)
insertAfter x Node{node_=rnode1, list_} = do
  Node_{next_=rnode2} <- readIORef rnode1
  insertBetween_ x list_ rnode1 rnode2

insertBetween_ :: a -> List a -> IORef (Node_ a) -> IORef (Node_ a) -> IO (Node a)
insertBetween_ x l rnode1 rnode2 = do
  modifyIORef' (length_ l) succ
  newnode <- newIORef (Node_ rnode1 rnode2 x)
  modifyIORef' rnode1 (\n -> n{next_=newnode})
  modifyIORef' rnode2 (\n -> n{prev_=newnode})
  return $ Node newnode l

Since a user isn't allowed to "have" a head node, we need additional user-facing functions to insert at the beginning and end of a list:

prepend :: a -> List a -> IO (Node a)
prepend x l = insertAfter x (Node (headNode_ l) l)

append :: a -> List a -> IO (Node a)
append x l = insertBefore x (Node (headNode_ l) l)

Observe that all insertions go through insertBetween_ which is responsible for increasing the length value.

Deletion is straightforward and uniform whether it's an internal node or one at the start or end. All deletions go through this delete function which is responsible for decreasing the length value.

delete :: Node a -> IO ()
delete Node{node_,list_} = do
  modifyIORef' (length_ list_) pred
  Node_{next_, prev_} <- readIORef node_
  modifyIORef' prev_ (\n -> n{next_=next_})
  modifyIORef' next_ (\n -> n{prev_=prev_})

Deletion of the head node would be a disaster, but users aren't allowed to have such a Node, so we're safe.

If a user has a Node, she can move back and forth through the list:

prev :: Node a -> IO (Maybe (Node a))
prev Node{node_, list_} = do
  Node_{prev_} <- readIORef node_
  return $ maybeNode_ prev_ list_

next :: Node a -> IO (Maybe (Node a))
next Node{node_, list_} = do
  Node_{next_} <- readIORef node_
  return $ maybeNode_ next_ list_

maybeNode_ :: IORef (Node_ a) -> List a -> Maybe (Node a)
maybeNode_ n l =
  if n == headNode_ l
  then Nothing
  else Just (Node n l)

Note that we must take care never to give the user the head node, so maybeNode_ here checks for it and returns Nothing instead.

To get started, the user can get the start or end of a List using the following functions (which use prev or next on the forbidden head node):

start :: List a -> IO (Maybe (Node a))
start l = next $ Node (headNode_ l) l

end :: List a -> IO (Maybe (Node a))
end l = prev $ Node (headNode_ l) l

All that's missing are a few miscellaneous query functions:

value :: Node a -> IO a
value = fmap value_ . readIORef . node_

null :: List a -> IO Bool
null l = (==0) <$> length l

length :: List a -> IO Int
length = readIORef . length_

some utilities to convert to plain lists:

toList :: List a -> IO [a]
toList = toList_ next_

toListRev :: List a -> IO [a]
toListRev = toList_ prev_

toList_ :: (Node_ a -> IORef (Node_ a)) -> List a -> IO [a]
toList_ dir l = go =<< readIORef h
  where h = headNode_ l
        go n = do
          if dir n == h then return []
            else do
            n' <- readIORef (dir n)
            (value_ n':) <$> go n'

and a Show instance for debugging:

instance (Show a) => Show (List a) where
  showsPrec d lst = showParen (d > 10) $ showString "fromList " . showsPrec 11 (unsafePerformIO $ toList lst)

WARNING: This Show instance is unsafe if the list is mutated before the generated string is fully evaluated, so it should only be used for debugging (and probably removed from a production version).

Also, while it's not strictly necessary since we can delete and re-insert, no self-respecting mutable structure would be complete without in-place modification of elements:

modify :: (a -> a) -> Node a -> IO ()
modify f Node{node_} = modifyIORef' node_ (\n -> n { value_ = f (value_ n) })

Here's the full code. (See the definition ex1 for example usage.) You're welcome to use it as a starting point for your own implementation. It's untested and unbenchmarked, except that a couple of quick tests show that it's probably about 5-10x slower than a C++ implementation.

{-# LANGUAGE NamedFieldPuns, RecursiveDo #-}

module LinkedList
  ( List, Node
  , value, null, length
  , empty, prepend, append, insertBefore, insertAfter, delete, modify
  , prev, next, start, end
  , toList, toListRev
  ) where

import System.IO.Unsafe
import Control.Monad
import Prelude hiding (null, length)

import Data.IORef

data List a = List
  { headNode_ :: !(IORef (Node_ a))
  , length_ :: !(IORef Int) }
data Node a = Node
  { node_ :: !(IORef (Node_ a))
  , list_ :: !(List a) }
data Node_ a = Node_
  { prev_ :: !(IORef (Node_ a))
  , next_ :: !(IORef (Node_ a))
  , value_ :: a }

-- unsafe show instance: remove from production version
instance (Show a) => Show (List a) where
  showsPrec d lst = showParen (d > 10) $ showString "fromList " . showsPrec 11 (unsafePerformIO $ toList lst)

value :: Node a -> IO a
value = fmap value_ . readIORef . node_

null :: List a -> IO Bool
null l = (==0) <$> length l

length :: List a -> IO Int
length = readIORef . length_

empty :: IO (List a)
empty = mdo
  n <- newIORef (Node_ n n undefined)
  List n <$> newIORef 0

prepend :: a -> List a -> IO (Node a)
prepend x l = insertAfter x (Node (headNode_ l) l)

append :: a -> List a -> IO (Node a)
append x l = insertBefore x (Node (headNode_ l) l)

insertBefore :: a -> Node a -> IO (Node a)
insertBefore x Node{node_=rnode2, list_} = do
  Node_{prev_=rnode1} <- readIORef rnode2
  insertBetween_ x list_ rnode1 rnode2

insertAfter :: a -> Node a -> IO (Node a)
insertAfter x Node{node_=rnode1, list_} = do
  Node_{next_=rnode2} <- readIORef rnode1
  insertBetween_ x list_ rnode1 rnode2

insertBetween_ :: a -> List a -> IORef (Node_ a) -> IORef (Node_ a) -> IO (Node a)
insertBetween_ x l rnode1 rnode2 = do
  modifyIORef' (length_ l) succ
  newnode <- newIORef (Node_ rnode1 rnode2 x)
  modifyIORef' rnode1 (\n -> n{next_=newnode})
  modifyIORef' rnode2 (\n -> n{prev_=newnode})
  return $ Node newnode l

delete :: Node a -> IO ()
delete Node{node_,list_} = do
  modifyIORef' (length_ list_) pred
  Node_{next_, prev_} <- readIORef node_
  modifyIORef' prev_ (\n -> n{next_=next_})
  modifyIORef' next_ (\n -> n{prev_=prev_})

modify :: (a -> a) -> Node a -> IO ()
modify f Node{node_} = modifyIORef' node_ (\n -> n { value_ = f (value_ n) })

prev :: Node a -> IO (Maybe (Node a))
prev Node{node_, list_} = do
  Node_{prev_} <- readIORef node_
  return $ maybeNode_ prev_ list_

next :: Node a -> IO (Maybe (Node a))
next Node{node_, list_} = do
  Node_{next_} <- readIORef node_
  return $ maybeNode_ next_ list_

maybeNode_ :: IORef (Node_ a) -> List a -> Maybe (Node a)
maybeNode_ n l =
  if n == headNode_ l
  then Nothing
  else Just (Node n l)

start :: List a -> IO (Maybe (Node a))
start l = next $ Node (headNode_ l) l

end :: List a -> IO (Maybe (Node a))
end l = prev $ Node (headNode_ l) l

toList :: List a -> IO [a]
toList = toList_ next_

toListRev :: List a -> IO [a]
toListRev = toList_ prev_

toList_ :: (Node_ a -> IORef (Node_ a)) -> List a -> IO [a]
toList_ dir l = go =<< readIORef h
  where h = headNode_ l
        go n = do
          if dir n == h then return []
            else do
            n' <- readIORef (dir n)
            (value_ n':) <$> go n'

ex1 :: IO (List Int)
ex1 = do
  t <- empty
  mapM_ (flip prepend t) [10,9..1]
  mapM_ (flip append t) [11..20]
  return t

来源：https://stackoverflow.com/questions/61550527/data-stm-linkedlist-implementation

标签

haskell

doubly-linked-list