Is there any way to implement hash tables efficiently in a purely functional language? It seems like any change to the hash table would require creating a copy of the origi
The existing answers all have good points to share, and I thought I would just add one more piece of data to the equation: comparing performance of a few different associative data structures.
The test consists of sequentially inserting the integers 0 to n-1, then looking each one up and summing the values. This test isn't incredibly rigorous and shouldn't be taken as such; it is just an indication of what to expect.
First, in Java, using HashMap, the unsynchronized Map implementation:
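    import java.util.Map;
    import java.util.HashMap;

    class HashTest {
        public static void main (String[] args)
        {
            // Type parameters are required: with a raw Map, get() returns Object
            // and "sum += map.get (i)" does not compile.
            Map<Integer, Integer> map = new HashMap<Integer, Integer> ();
            int n = Integer.parseInt (args [0]);
            for (int i = 0; i < n; i++)
            {
                map.put (i, i);
            }

            // long, so the sum does not wrap around for large n
            long sum = 0;
            for (int i = 0; i < n; i++)
            {
                sum += map.get (i);
            }

            System.out.println ("" + sum);
        }
    }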
Then a Haskell implementation using the recent hashtable work done by Gregory Collins (it's in the hashtables package). This can be either pure (through the ST monad) or impure (through IO); I'm using the IO version here:
    {-# LANGUAGE ScopedTypeVariables, BangPatterns #-}
    module Main where

    import Control.Monad
    import qualified Data.HashTable.IO as HashTable
    import System.Environment

    main :: IO ()
    main = do
      n <- read `fmap` head `fmap` getArgs
      ht :: HashTable.BasicHashTable Int Int <- HashTable.new
      mapM_ (\v -> HashTable.insert ht v v) [0 .. n - 1]
      x <- foldM (\ !s i -> HashTable.lookup ht i >>=
                              maybe undefined (return . (s +)))
                 (0 :: Int) [0 .. n - 1]
      print x
Lastly, one using the immutable HashMap implementation from Hackage (from the hashmap package):
    module Main where

    import Data.List (foldl')
    import qualified Data.HashMap as HashMap
    import System.Environment

    main :: IO ()
    main = do
      n <- read `fmap` head `fmap` getArgs
      let
        hashmap =
          foldl' (\ht v -> HashMap.insert v v ht)
                 HashMap.empty [0 :: Int .. n - 1]
      let x = foldl' (\ s i -> hashmap HashMap.! i + s) 0 [0 .. n - 1]
      print x
Examining the performance for n=10,000,000, I find the total running time is the following:
Knocking it down to n=1,000,000, we get:
This is interesting for two reasons:
This would seem to indicate that languages like Haskell and Java, which box the map's keys, take a big hit from this boxing. Languages that either do not need to box, or can unbox the keys and values, would likely see performance a couple of times better.
Clearly these implementations are not the fastest, but I would say that using Java as a baseline, they are at least acceptable/usable for many purposes (though perhaps someone more familiar with Java wisdom could say whether HashMap is considered reasonable).
I would note that the Haskell HashMap takes up a lot of space compared to the HashTable.
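To make the boxing point concrete, here is a minimal sketch of an int-to-int map that keeps keys and values in primitive arrays, so no Integer objects are allocated. The names IntIntMap and IntIntMapTest are hypothetical and not part of any library; the open-addressing scheme with linear probing is simply one easy way to write it, and no timings are claimed for it. It runs the same insert-then-sum workload as the HashTest program above.

```java
// Hypothetical sketch: int -> int hash map backed by primitive arrays (no boxing).
// Open addressing with linear probing; the table length is always a power of two.
final class IntIntMap {
    private int[] keys = new int[16];
    private int[] values = new int[16];
    private boolean[] used = new boolean[16];
    private int size = 0;

    // Mix the key's bits, then mask down to a valid index for the given length.
    private static int slot(int key, int length) {
        int h = key * 0x9E3779B9;
        return (h ^ (h >>> 16)) & (length - 1);
    }

    void put(int key, int value) {
        if (2 * (size + 1) > keys.length) grow();   // keep the load factor <= 0.5
        int i = slot(key, keys.length);
        while (used[i] && keys[i] != key) {
            i = (i + 1) & (keys.length - 1);        // probe the next slot
        }
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    // Returns the stored value, or defaultValue if the key is absent.
    int getOrDefault(int key, int defaultValue) {
        int i = slot(key, keys.length);
        while (used[i]) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & (keys.length - 1);
        }
        return defaultValue;
    }

    int size() { return size; }

    // Double the table and re-insert every entry.
    private void grow() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        int newLength = oldKeys.length * 2;
        keys = new int[newLength];
        values = new int[newLength];
        used = new boolean[newLength];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) put(oldKeys[i], oldValues[i]);
        }
    }
}

final class IntIntMapTest {
    // Same workload as HashTest: insert 0..n-1, then sum all lookups.
    static long insertAndSum(int n) {
        IntIntMap map = new IntIntMap();
        for (int i = 0; i < n; i++) map.put(i, i);
        long sum = 0;
        for (int i = 0; i < n; i++) sum += map.getOrDefault(i, 0);
        return sum;
    }

    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]);
        System.out.println(insertAndSum(n));
    }
}
```

Running it as java IntIntMapTest 1000000 performs the same test without allocating a single Integer, which gives a rough way to see how much of the HashMap cost comes from boxing rather than from hashing itself.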
The Haskell programs were compiled with GHC 7.0.3 and -O2 -threaded, and run with only the +RTS -s flag for runtime GC statistics. Java was compiled with OpenJDK 1.7.
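The JVM does not print a GC summary like +RTS -s by default. As a rough sketch of how comparable figures could be collected from inside the program, the standard java.lang.management beans report accumulated collection time. The class name GcTimedHashTest is hypothetical, this is not how the timings in this answer were taken, and the reported GC time is only approximate (concurrent collectors in particular may report elapsed rather than paused time).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the HashMap workload, with wall-clock and GC time reported.
final class GcTimedHashTest {
    // Total milliseconds the JVM reports having spent in all collectors so far.
    static long gcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();   // -1 if this collector does not report it
            if (t > 0) total += t;
        }
        return total;
    }

    // Insert 0..n-1 into a HashMap, then sum all lookups.
    static long insertAndSum(int n) {
        Map<Integer, Integer> map = new HashMap<Integer, Integer>();
        for (int i = 0; i < n; i++) map.put(i, i);
        long sum = 0;
        for (int i = 0; i < n; i++) sum += map.get(i);
        return sum;
    }

    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]);
        long gcBefore = gcMillis();
        long start = System.nanoTime();
        long sum = insertAndSum(n);
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long gcSpent = gcMillis() - gcBefore;
        System.out.println("sum = " + sum);
        System.out.println("wall = " + wallMillis + " ms, GC = " + gcSpent + " ms");
    }
}
```

Comparing the two printed figures gives a share of time spent in collection that can be set next to the GC percentage GHC reports, keeping in mind that the two runtimes account for GC time differently.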