Haskell lazy I/O and closing files

攒了一身酷 2020-12-07 23:35

I've written a small Haskell program to print the MD5 checksums of all files in the current directory (searched recursively). It is basically a Haskell version of md5deep.

7 answers
  •  爱一瞬间的悲伤
    2020-12-08 00:01

    unsafeInterleaveIO?

    Yet another solution that comes to mind is to use unsafeInterleaveIO from System.IO.Unsafe. See the reply of Tomasz Zielonka in this thread in Haskell Cafe.

    It defers an input-output operation (opening a file) until it is actually required. Thus it is possible to avoid opening all files at once, and instead read and process them sequentially (open them lazily).
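
    The deferral can be seen in isolation with a small sketch that uses only base. The name deferralDemo and the IORef flag are made up for illustration; the flag records whether the wrapped action has run yet.

```haskell
import Control.Exception (evaluate)
import Data.IORef (newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Hypothetical demo: reports whether the wrapped action had run
-- before and after its result was forced.
deferralDemo :: IO (Bool, Bool)
deferralDemo = do
  ran <- newIORef False
  -- The action is only wrapped here; it is not executed yet.
  x <- unsafeInterleaveIO (writeIORef ran True >> return (42 :: Int))
  before <- readIORef ran
  -- Forcing the result is what triggers the deferred action.
  _ <- evaluate x
  after <- readIORef ran
  return (before, after)
```

    Running deferralDemo from main or GHCi should give (False,True): the write happens only once x is demanded. In the file case, the deferred action is the one that opens the file.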

    As far as I can tell, mapM getFileLine opens all the files up front but does not start reading from them until putStr . unlines demands the results. So a lot of thunks holding open file handles float around, and that is the problem. (Please correct me if I am wrong.)

    An example

    A modified example with unsafeInterleaveIO has been running against a 100 GB directory for several minutes now, in constant space.

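    import Control.Monad (liftM)
    import qualified Data.ByteString.Lazy as BS  -- assumed: pureMD5's md5 takes a lazy ByteString
    import Data.Digest.Pure.MD5 (md5)
    import System.IO.Unsafe (unsafeInterleaveIO)

    -- getRecursiveContents is the directory walker from the question.
    getList :: FilePath -> IO [String]
    getList p =
      let getFileLine path =
            liftM (\c -> (show . md5 $ c) ++ " " ++ path)
            (unsafeInterleaveIO $ BS.readFile path)  -- defer opening until the line is demanded
      in mapM getFileLine =<< getRecursiveContents p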
    

    (I switched to the pureMD5 implementation of the hash.)

    P.S. I am not sure if this is good style. I believe that solutions with iteratees or strict I/O are better, but this one is quicker to write. I use it in small scripts, but I'd be afraid of relying on it in a bigger program.
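
    For comparison, here is a minimal sketch of the strict I/O route, not a tested replacement. The names fileLine and printAll are made up, and the digest function is passed in as a parameter (with pureMD5 it would need adapting, since this sketch uses strict ByteStrings). It relies on the strict Data.ByteString.readFile reading the whole file and closing the handle before it returns, so at most one file is open at a time.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as B

-- | Hypothetical helper: build the output line for one file.
-- B.readFile is strict, so the handle is closed before it returns.
fileLine :: (B.ByteString -> String) -> FilePath -> IO String
fileLine digest path = do
  contents <- B.readFile path
  let line = digest contents ++ " " ++ path
  -- Force the whole line now so the file contents can be collected
  -- before the next file is read.
  _ <- evaluate (length line)
  return line

-- | Process the files one at a time, printing each line as it is ready.
printAll :: (B.ByteString -> String) -> [FilePath] -> IO ()
printAll digest = mapM_ (\path -> fileLine digest path >>= putStrLn)
```

    The trade-off is that each file is held in memory in full while it is hashed, so very large files would need chunked reading instead.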
