How does one implement hash tables in a functional language?

backend · unresolved · 3 answers · 1718 views
Asked by 清歌不尽, 2020-12-28 14:37

Is there any way to implement hash tables efficiently in a purely functional language? It seems like any change to the hash table would require creating a copy of the original table.

3 Answers
  •  粉色の甜心
     2020-12-28 15:35

    The existing answers all have good points to share, and I thought I would just add one more piece of data to the equation: comparing performance of a few different associative data structures.

    The test consists of sequentially inserting n integer keys, then looking each one up and summing the values. This test isn't incredibly rigorous, and it shouldn't be taken as such; it's just an indication of what to expect.

    First, in Java, using HashMap (the unsynchronized Map implementation):

    import java.util.Map;
    import java.util.HashMap;
    
    class HashTest {
        public static void main(String[] args) {
            // Typed as Map<Integer, Integer> so get() returns Integer, not Object
            Map<Integer, Integer> map = new HashMap<Integer, Integer>();
            int n = Integer.parseInt(args[0]);
            for (int i = 0; i < n; i++) {
                map.put(i, i);
            }
    
            int sum = 0;
            for (int i = 0; i < n; i++) {
                sum += map.get(i);
            }
    
            System.out.println(sum);
        }
    }
    

    Then a Haskell implementation using the recent hash table work done by Gregory Collins (it's in the hashtables package). This can be either pure (through the ST monad) or impure (through IO); I'm using the IO version here:

    {-# LANGUAGE ScopedTypeVariables, BangPatterns #-}
    module Main where
    
    import Control.Monad
    import qualified Data.HashTable.IO as HashTable
    import System.Environment
    
    main :: IO ()
    main = do
      n <- read `fmap` head `fmap` getArgs
      ht :: HashTable.BasicHashTable Int Int <- HashTable.new
      mapM_ (\v -> HashTable.insert ht v v) [0 .. n - 1]
      x <- foldM (\ !s i -> HashTable.lookup ht i >>=
                   maybe undefined (return . (s +)))
           (0 :: Int) [0 .. n - 1]
      print x
    

    Lastly, one using the immutable HashMap implementation from Hackage (the hashmap package):

    module Main where
    
    import Data.List (foldl')
    import qualified Data.HashMap as HashMap
    import System.Environment
    
    main :: IO ()
    main = do
      n <- read `fmap` head `fmap` getArgs
      let
        hashmap = 
            foldl' (\ht v -> HashMap.insert v v ht) 
               HashMap.empty [0 :: Int .. n - 1]
      let x = foldl' (\ s i -> hashmap HashMap.! i + s) 0 [0 .. n - 1]
      print x
    
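    What makes an immutable map like this usable at all is structural sharing: an insert path-copies only the nodes on the route to the changed entry and shares everything else with the old version, so nothing close to a full-table copy happens. A minimal sketch of that idea, using a persistent (unbalanced) binary search tree with hypothetical names rather than the actual structure of the hashmap package:

```java
// Persistent (immutable) binary search tree over int keys/values.
// insert() copies only the nodes on the path to the new key and shares
// every other node with the old version -- no full-table copy.
final class PersistentTree {
    final int key, value;
    final PersistentTree left, right;

    PersistentTree(int key, int value, PersistentTree left, PersistentTree right) {
        this.key = key; this.value = value; this.left = left; this.right = right;
    }

    // Returns a NEW tree; the old version stays fully usable.
    static PersistentTree insert(PersistentTree root, int key, int value) {
        if (root == null) return new PersistentTree(key, value, null, null);
        if (key < root.key)
            return new PersistentTree(root.key, root.value,
                                      insert(root.left, key, value), root.right);
        if (key > root.key)
            return new PersistentTree(root.key, root.value,
                                      root.left, insert(root.right, key, value));
        return new PersistentTree(key, value, root.left, root.right); // replace
    }

    static Integer lookup(PersistentTree root, int key) {
        if (root == null) return null;
        if (key < root.key) return lookup(root.left, key);
        if (key > root.key) return lookup(root.right, key);
        return root.value;
    }

    public static void main(String[] args) {
        PersistentTree v1 = null;
        for (int i = 0; i < 100; i++) v1 = insert(v1, i, i);
        PersistentTree v2 = insert(v1, 50, -1); // derive a new version
        System.out.println(lookup(v1, 50));     // old version is unchanged: 50
        System.out.println(lookup(v2, 50));     // new version sees -1
    }
}
```

    The hashmap package hashes keys into a tree of this general shape (with better balance), which is why an update there costs O(log n) node copies rather than O(n).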

    Examining the performance for n=10,000,000, I find the following total running times:

    • Java HashMap -- 24.387s
    • Haskell HashTable -- 7.705s, 41% time in GC
    • Haskell HashMap -- 9.368s, 62% time in GC

    Knocking it down to n=1,000,000, we get:

    • Java HashMap -- 0.700s
    • Haskell HashTable -- 0.723s
    • Haskell HashMap -- 0.789s

    This is interesting for two reasons:

    1. The performance is generally pretty close (except where Java diverges above 1M entries)
    2. A huge amount of time is spent in garbage collection (killing Java in the case of n=10,000,000).

    This would seem to indicate that languages like Haskell and Java, which box the map's keys and values, take a big hit from that boxing. Languages that either do not need to box, or can unbox, the keys and values would likely see several times better performance.
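    To illustrate the unboxed case, here is a minimal sketch of an open-addressing table with primitive int keys and values (a hypothetical toy, not a drop-in HashMap replacement): every probe reads a flat int[] array, so lookups touch no boxed objects and inserts allocate nothing.

```java
// Sketch: open-addressing hash table with unboxed int keys and values.
// Linear probing; capacity is a power of two; -1 marks an empty slot,
// so keys must be non-negative in this toy version.
class IntIntMap {
    private final int[] keys;
    private final int[] values;
    private final int mask;

    IntIntMap(int capacity) {          // capacity must be a power of two
        keys = new int[capacity];
        values = new int[capacity];
        java.util.Arrays.fill(keys, -1);
        mask = capacity - 1;
    }

    void put(int key, int value) {
        int i = key & mask;            // trivial hash; real tables mix the bits
        while (keys[i] != -1 && keys[i] != key)
            i = (i + 1) & mask;        // linear probing
        keys[i] = key;
        values[i] = value;
    }

    int get(int key) {
        int i = key & mask;
        while (keys[i] != key) {
            if (keys[i] == -1) throw new java.util.NoSuchElementException();
            i = (i + 1) & mask;
        }
        return values[i];
    }

    public static void main(String[] args) {
        int n = 1 << 20;
        IntIntMap map = new IntIntMap(1 << 21); // keep load factor <= 0.5
        for (int i = 0; i < n; i++) map.put(i, i);
        long sum = 0;
        for (int i = 0; i < n; i++) sum += map.get(i);
        System.out.println(sum);       // sum of 0 .. n-1
    }
}
```

    This is the style of layout that primitive-specialized collection libraries use internally; the benchmark above would need the same insert-then-sum driver around it for a fair comparison.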

    Clearly these implementations are not the fastest, but I would say that using Java as a baseline, they are at least acceptable/usable for many purposes (though perhaps someone more familiar with Java wisdom could say whether HashMap is considered reasonable).

    I would note that the Haskell HashMap takes up a lot of space compared to the HashTable.

    The Haskell programs were compiled with GHC 7.0.3 and -O2 -threaded, and run with only the +RTS -s flag for runtime GC statistics. Java was compiled with OpenJDK 1.7.
