Haskell lazy I/O and closing files

攒了一身酷 2020-12-07 23:35

I've written a small Haskell program to print the MD5 checksums of all files in the current directory (searched recursively). Basically, it is a Haskell version of md5deep.

7 Answers
  •  被撕碎了的回忆
    2020-12-07 23:48

    NOTE: I've edited my code slightly to reflect the advice in Duncan Coutts's answer. Even after this edit his answer is obviously much better than mine, and doesn't seem to run out of memory in the same way.


    Here's my quick attempt at an Iteratee-based version. When I run it on a directory with about 2,000 small (30-80K) files it's about 30 times faster than your version here and seems to use a bit less memory.

    For some reason it still seems to run out of memory on very large files—I don't really understand Iteratee well enough yet to be able to tell why easily.

    module Main where
    
    import Control.Monad.State
    import Data.Digest.Pure.MD5
    import Data.List (sort)
    import Data.Word (Word8) 
    import System.Directory 
    import System.FilePath ((</>))
    import qualified Data.ByteString.Lazy as BS
    
    import qualified Data.Iteratee as I
    import qualified Data.Iteratee.WrappedByteString as IW
    
    evalIteratee path = evalStateT (I.fileDriver iteratee path) md5InitialContext
    
    iteratee :: I.IterateeG IW.WrappedByteString Word8 (StateT MD5Context IO) MD5Digest
    iteratee = I.IterateeG chunk
      where
        chunk s@(I.EOF Nothing) =
          get >>= \ctx -> return $ I.Done (md5Finalize ctx) s
        chunk (I.Chunk c) = do
          modify $ \ctx -> md5Update ctx $ BS.fromChunks $ (:[]) $ IW.unWrap c
          return $ I.Cont (I.IterateeG chunk) Nothing
    
    fileLine :: FilePath -> MD5Digest -> String
    fileLine path c = show c ++ " " ++ path
    
    main = mapM_ (\path -> putStrLn . fileLine path =<< evalIteratee path) 
       =<< getRecursiveContents "."
    
    getRecursiveContents :: FilePath -> IO [FilePath]
    getRecursiveContents topdir = do
      names <- getDirectoryContents topdir
    
      let properNames = filter (`notElem` [".", ".."]) names
    
      paths <- concatForM properNames $ \name -> do
        let path = topdir </> name
    
        isDirectory <- doesDirectoryExist path
        if isDirectory
          then getRecursiveContents path
          else do
            isFile <- doesFileExist path
            if isFile
              then return [path]
              else return []
    
      return (sort paths)
    
    concatForM :: (Monad m) => [a1] -> (a1 -> m [a]) -> m [a]
    concatForM xs f = liftM concat (forM xs f)
    

    Note that you'll need the iteratee package and TomMD's pureMD5. (And my apologies if I've done something horrifying here—I'm a beginner with this stuff.)
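
    For comparison, the core idea behind the iteratee version (read a file in bounded chunks, update a strict accumulator, and close the handle as soon as the file is done) can be sketched with only base and bytestring. This is a minimal sketch, not the code from this answer: foldFileChunks and countBytes are hypothetical names, the 64 KiB chunk size is an arbitrary assumption, and a byte count stands in for the MD5 update step so that the example does not depend on any particular pureMD5 version.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString as B
import System.IO (IOMode (ReadMode), withBinaryFile)

-- | Hypothetical helper (not from the answer above): fold a strict
-- accumulator over the contents of a file, one fixed-size chunk at a time.
-- 'withBinaryFile' closes the handle when the action finishes, including
-- when an exception is thrown, so no handle outlives the call.
foldFileChunks :: (acc -> B.ByteString -> acc) -> acc -> FilePath -> IO acc
foldFileChunks step z path =
  withBinaryFile path ReadMode (\h -> loop h z)
  where
    -- Assumed chunk size; any positive value works.
    chunkSize = 64 * 1024 :: Int

    -- The bang pattern forces the accumulator (to weak head normal form)
    -- on every iteration, so thunks do not pile up across chunks.
    loop h !acc = do
      chunk <- B.hGet h chunkSize
      if B.null chunk
        then return acc                 -- hGet returns "" at end of file
        else loop h (step acc chunk)

-- | Stand-in for the hashing step: count the bytes in a file.
countBytes :: FilePath -> IO Int
countBytes = foldFileChunks (\n c -> n + B.length c) 0
```

    Only one chunk is resident at a time, so memory use should not grow with file size, and each file is closed before the next one is opened. To compute a digest instead of a byte count, the step function would wrap the hash library's context-update function; its exact type depends on the library version installed.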
