I\'ve written a small Haskell program to print the MD5 checksums of all files in the current directory (searched recursively). Basically a Haskell version of md5deep>
NOTE: I've edited my code slightly to reflect the advice in Duncan Coutts's answer. Even after this edit his answer is obviously much better than mine, and doesn't seem to run out of memory in the same way.
Here's my quick attempt at an Iteratee-based version. When I run it on a directory with about 2,000 small (30-80K) files it's about 30 times faster than your version here and seems to use a bit less memory.
For some reason it still seems to run out of memory on very large files—I don't really understand Iteratee well enough yet to be able to tell why easily.
module Main where
import Control.Monad.State
import Data.Digest.Pure.MD5
import Data.List (sort)
import Data.Word (Word8)
import System.Directory
import System.FilePath ((>))
import qualified Data.ByteString.Lazy as BS
import qualified Data.Iteratee as I
import qualified Data.Iteratee.WrappedByteString as IW
evalIteratee path = evalStateT (I.fileDriver iteratee path) md5InitialContext
iteratee :: I.IterateeG IW.WrappedByteString Word8 (StateT MD5Context IO) MD5Digest
iteratee = I.IterateeG chunk
where
chunk s@(I.EOF Nothing) =
get >>= \ctx -> return $ I.Done (md5Finalize ctx) s
chunk (I.Chunk c) = do
modify $ \ctx -> md5Update ctx $ BS.fromChunks $ (:[]) $ IW.unWrap c
return $ I.Cont (I.IterateeG chunk) Nothing
fileLine :: FilePath -> MD5Digest -> String
fileLine path c = show c ++ " " ++ path
main = mapM_ (\path -> putStrLn . fileLine path =<< evalIteratee path)
=<< getRecursiveContents "."
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topdir = do
names <- getDirectoryContents topdir
let properNames = filter (`notElem` [".", ".."]) names
paths <- concatForM properNames $ \name -> do
let path = topdir > name
isDirectory <- doesDirectoryExist path
if isDirectory
then getRecursiveContents path
else do
isFile <- doesFileExist path
if isFile
then return [path]
else return []
return (sort paths)
concatForM :: (Monad m) => [a1] -> (a1 -> m [a]) -> m [a]
concatForM xs f = liftM concat (forM xs f)
Note that you'll need the iteratee package and TomMD's pureMD5. (And my apologies if I've done something horrifying here—I'm a beginner with this stuff.)