Question
I wrote a Haskell module that lists all the contents of a directory in breadth-first order. The source code is below.
module DirElements (dirElem) where

import System.Directory (getDirectoryContents, doesDirectoryExist)
import System.FilePath ((</>))

-- One inner list per directory level, starting with the children of dirPath.
dirElem :: FilePath -> IO [[FilePath]]
dirElem dirPath = iterateM (not.null) (concatMapM getDirectoryContents') [dirPath] >>= return.tail

getDirectoryContents' :: FilePath -> IO [FilePath]
getDirectoryContents' dirPath = do
    isDir <- doesDirectoryExist dirPath
    if isDir then dirContent else return []
  where
    dirContent = do
        contents <- getDirectoryContents dirPath
        -- Drops the first two entries, assuming they are "." and "..";
        -- getDirectoryContents does not guarantee that order.
        return.(map (dirPath</>)).tail.tail $ contents

-- Note: because of how >>= is implemented, iterateM cannot be written like
-- iterate (which yields an infinite list) and still have the type
-- (Monad m) => (a -> Bool) -> (a -> m a) -> a -> m [a]
iterateM :: (Monad m) => (a -> Bool) -> (a -> m a) -> a -> m [a]
iterateM fb f x = do
    if fb x
        then do
            tail <- do {fx <- f x; iterateM fb f fx}
            return (x:tail)
        else return []

concatMapM :: Monad m => (a -> m[b]) -> [a] -> m[b]
concatMapM f list = mapM f list >>= return.concat
It works correctly, but when run on a large directory it "hangs" for a little while and then spits out all the results at once.
After some research I found that this is the same problem as with sequence $ map return [1..] :: [[Int]]
See: Why the Haskell sequence function can't be lazy or why recursive monadic functions can't be lazy
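The behaviour described above can be reproduced on a small scale. The following is a minimal sketch, not part of the original question: countExecuted is a hypothetical helper that uses mapM in IO (which is sequence composed with map) and counts how many actions have run by the time the result list is available. Even though only three elements are inspected, every action has already been executed, which is why the infinite version never returns.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Hypothetical helper: run n counted actions with mapM, look only at the
-- first three results, and report how many actions were actually executed.
countExecuted :: Int -> IO ([Int], Int)
countExecuted n = do
  counter <- newIORef (0 :: Int)
  -- mapM in IO runs every action before the list is handed back.
  xs <- mapM (\i -> modifyIORef' counter (+ 1) >> return i) [1 .. n]
  executed <- readIORef counter
  return (take 3 xs, executed)
```

For example, countExecuted 1000 yields ([1,2,3], 1000): all 1000 actions ran although only three results were used.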
Answer 1:
I modified the older answer that Davorak linked to to use the new pipes library.
It uses StateP to keep a queue of untraversed directories so that it can do a breadth-first traversal. It uses MaybeP for exiting from the loop, as a convenience.
import Control.Monad
import Control.Proxy
import Control.Proxy.Trans.Maybe
import Control.Proxy.Trans.State as S
import Data.Sequence hiding (filter)
import System.FilePath.Posix
import System.Directory

getUsefulContents :: FilePath -> IO [FilePath]
getUsefulContents path
  = fmap (filter (`notElem` [".", ".."])) $ getDirectoryContents path

traverseTree
    :: (Proxy p)
    => FilePath
    -> () -> Producer (MaybeP (StateP (Seq FilePath) p)) FilePath IO r
traverseTree path () = do
    liftP $ S.modify (|> path)
    forever $ do
        x <- liftP $ S.gets viewl
        case x of
            EmptyL    -> mzero
            file :< s -> do
                liftP $ S.put s
                respond file
                p <- lift $ doesDirectoryExist file
                when p $ do
                    names <- lift $ getUsefulContents file
                    let namesfull = map (file </>) names
                    liftP $ forM_ namesfull $ \name ->
                        S.modify (|> name)
This defines a breadth-first lazy producer of files. If you hook it up to a printing stage, it will print out the files as it traverses the tree:
main = runProxy $ evalStateK empty $ runMaybeK $
    traverseTree "/tmp" >-> putStrLnD
Laziness means that if you only demand 3 files, it will only traverse the tree as much as necessary to generate three files, then it will stop:
main = runProxy $ evalStateK empty $ runMaybeK $
    traverseTree "/tmp" >-> takeB_ 3 >-> putStrLnD
If you want to learn more about the pipes library, then I recommend you read the tutorial.
Answer 2:
This comes up every once in a while, and the answer ends up being: use an iteratee-like library. The one most often suggested recently has been the Proxy library.
- Streaming recursive descent of a directory in Haskell
- An older, out-of-date pipes solution and a non-iteratee-like solution: breadth-first traversal of directory tree is not lazy
I have seen Conduit solutions before and a few elegant monadic solutions, but I am not finding them now.
Answer 3:
First of all, that's not related to strictness. Like many monads, IO is actually nonstrict in its monadic operations. This is related to lazy vs. eager I/O.
The problem is that you first do the directory traversal and then you process the result. You can improve that by using coroutines to interleave them. One simple way is to make the directory traversal take a callback as argument:
getDirectoryContents' :: (MonadIO m) => (FilePath -> m a) -> FilePath -> m ()
getDirectoryContents' k fp = {- ... -}
This is the simplest and least flexible solution. A more flexible solution is to actually implement coroutines. You can either roll your own coroutine monad by using free, monad-coroutine or operational, or you can use one of the many streaming abstractions like conduit, enumerator or pipes, with the last one being my personal recommendation for simple cases like this one.
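The callback approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's implementation: breadthFirstM, dirChildren and walkBreadthFirst are hypothetical names, listDirectory assumes a reasonably recent directory package, the root path itself is also passed to the callback, and symbolic links that form cycles are not handled.

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.Sequence (ViewL (..), viewl, (><))
import qualified Data.Sequence as Seq
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))

-- | Hypothetical generic helper: visit nodes in breadth-first order,
-- calling the callback k on each node before expanding it.
breadthFirstM :: Monad m => (a -> m [a]) -> (a -> m b) -> a -> m ()
breadthFirstM children k root = go (Seq.singleton root)
  where
    go queue = case viewl queue of
      EmptyL    -> return ()
      x :< rest -> do
        _  <- k x                      -- process the node right away
        xs <- children x               -- then discover its children
        go (rest >< Seq.fromList xs)   -- FIFO queue gives breadth-first order

-- | Entries directly inside a path; non-directories have no children.
-- Assumption: symbolic links that form cycles are not handled.
dirChildren :: FilePath -> IO [FilePath]
dirChildren path = do
  isDir <- doesDirectoryExist path
  if isDir
    then map (path </>) <$> listDirectory path
    else return []

-- | Callback-style traversal with the same shape as the signature above.
walkBreadthFirst :: MonadIO m => (FilePath -> m a) -> FilePath -> m ()
walkBreadthFirst k = breadthFirstM (liftIO . dirChildren) k
```

With this sketch, walkBreadthFirst putStrLn "/tmp" would print each path as soon as it is reached instead of first collecting the whole tree. The price is the inflexibility mentioned above: the consumer cannot stop early or pull results on demand.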
Answer 4:
Everyone is telling you to use iteratees or pipes or the like, which are the current popular approach. But there's another, classic way to do this! Just use unsafeInterleaveIO from System.IO.Unsafe. All this function of type IO a -> IO a does is modify an IO action so that it only actually performs the IO when the value thunk is demanded, which is exactly what you were asking for. You can use this to write an iterateM with your desired semantics trivially.
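One way such an iterateM might look is sketched below. It is an assumption about what the answer has in mind rather than code taken from it: lazyIterateM is a hypothetical name, and the function is specialised to IO because unsafeInterleaveIO only works there.

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Hypothetical lazy variant of the question's iterateM, specialised to IO.
-- The step for the next element only runs when the tail of the list is demanded.
lazyIterateM :: (a -> Bool) -> (a -> IO a) -> a -> IO [a]
lazyIterateM continue step x
  | continue x = do
      -- Defer the rest of the iteration until the tail is forced.
      rest <- unsafeInterleaveIO (step x >>= lazyIterateM continue step)
      return (x : rest)
  | otherwise = return []
```

Unlike the original, this version terminates even when the predicate never fails, as long as only a finite prefix is consumed, for example take 3 <$> lazyIterateM (const True) (return . (+ 1)) 0. Substituting it for iterateM in the question's dirElem should make each directory level get read only when it is demanded, although every level is still read in full by concatMapM.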
Examples like this are where unsafeInterleaveIO shines.
You have, however, I'm sure, noted the "unsafe" in the name: there are other examples, where you want direct control over file handles and resource usage or the like, where unsafeInterleaveIO will indeed be bad news, and potentially even introduce violations of referential transparency.
(See this answer for more discussion: When is unsafeInterleaveIO unsafe?)
But again, in a case like this, I think unsafeInterleaveIO is the obvious, correct, and straightforward result.
Source: https://stackoverflow.com/questions/14474545/listing-all-the-contents-of-a-directory-by-breadth-first-order-results-in-low-ef