How to implement a file system search in Haskell?

Submitted by 。_饼干妹妹 on 2020-01-04 23:45:47

Question


I'm not exactly new to Haskell, but I haven't used it much in the real world.

What I want to do is find all git repositories under a given set of folders. Basically, I'm trying to do what the following command does, only faster, by using Haskell's concurrency features:

find . -type d -exec test -e '{}/.git' ';' -print -prune

This is what I've got so far:

import Control.Concurrent.Async
import System.Directory (doesDirectoryExist)
import System.FilePath ((</>))
import System.IO (FilePath)


isGitRepo :: FilePath -> IO Bool
isGitRepo p = doesDirectoryExist $ p </> ".git"


main :: IO ()
main = putStrLn "hello"

I've found this lib (async), which has this function:

mapConcurrently :: Traversable t => (a -> IO b) -> t a -> IO (t b)

That got me thinking that what I need is to produce a lazy Tree data structure that reflects the folder structure, then filter it concurrently with isGitRepo, and then fold it into a list and print it.

Of course I know how to write data FTree = Node String [FTree] or something like that, but I still have questions. How do I produce it concurrently? How do I produce absolute paths while traversing the tree? And so on.


Answer 1:


Which got me thinking that what I need is to produce lazy Tree data structure that would reflect folders structure.

I'm not sure you need a tree structure for this. You could build one as an intermediate step, but you can manage just as well without it. The key thing is that you need O(1) appending to combine the results from the subdirectories. A difference list (such as the one provided by the dlist package) gives you that.
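To illustrate why a difference list appends in constant time, here is a minimal hand-rolled sketch. It is not the dlist package's code, and the names DiffList, emptyDL, singletonDL, appendDL, concatDL and toListDL are made up for this example. The idea is to represent a list as a function that prepends its elements to whatever comes after.

```haskell
-- A hand-rolled difference list, for illustration only.
-- All names here are hypothetical; the dlist package has its own API.
newtype DiffList a = DiffList ([a] -> [a])

emptyDL :: DiffList a
emptyDL = DiffList id

singletonDL :: a -> DiffList a
singletonDL x = DiffList (x :)

-- Appending is just function composition, so it costs O(1)
-- regardless of how many elements either side holds.
appendDL :: DiffList a -> DiffList a -> DiffList a
appendDL (DiffList f) (DiffList g) = DiffList (f . g)

concatDL :: [DiffList a] -> DiffList a
concatDL = foldr appendDL emptyDL

-- The ordinary list is only materialised once, at the very end.
toListDL :: DiffList a -> [a]
toListDL (DiffList f) = f []
```

With plain lists, combining the results of deeply nested subdirectories with (++) can re-traverse the same prefix at every level; with this representation each level only composes functions.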

How to produce it concurrently?

You already got that: using mapConcurrently!

How to produce absolute path while traversing the tree?

listDirectory gives you the names of the entries in a directory, i.e. the possible next segments of the path. You get the next paths by appending each of these segments to the existing path. Note that the results are absolute paths only if the existing path was absolute to begin with.
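As a small sketch of that step: the helper names childPaths and absoluteChildPaths below are made up for illustration, while listDirectory and makeAbsolute come from System.Directory and (</>) from System.FilePath. If absolute paths are wanted, one option is to resolve the starting directory once, up front.

```haskell
import System.Directory (listDirectory, makeAbsolute)
import System.FilePath ((</>))

-- | Children of a directory, each prefixed with the parent path.
-- (Hypothetical helper, not part of any library.)
childPaths :: FilePath -> IO [FilePath]
childPaths dir = do
  names <- listDirectory dir   -- bare entry names, without "." and ".."
  pure (map (dir </>) names)

-- | Same as 'childPaths', but the parent is made absolute first,
-- so every returned path is absolute as well.
absoluteChildPaths :: FilePath -> IO [FilePath]
absoluteChildPaths dir = makeAbsolute dir >>= childPaths
```

Calling makeAbsolute on the root once is enough: every path built from it with (</>) during the traversal stays absolute.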


Here is a working function:

import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>), combine)
import System.IO (FilePath)
import Control.Concurrent.Async (mapConcurrently)
import qualified Data.DList as DL

-- | tries to find all git repos in the subtree rooted at the path
findGitRepos :: FilePath -> IO (DL.DList FilePath)
findGitRepos p = do
  isNotDir <- not <$> doesDirectoryExist p
  if isNotDir
    then pure DL.empty             -- the path 'p' isn't a directory
    else do
      isGitDir <- doesDirectoryExist (p </> ".git")
      if isGitDir
        then pure (DL.singleton p) -- the folder is a git repo
        else do                    -- recurse to subfolders
          subdirs <- listDirectory p
          repos <- mapConcurrently findGitRepos (combine p `map` subdirs)
          pure (DL.concat repos)
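The question asks about starting from several folders and printing the result, so here is one possible way to drive such a function. This is only a sketch: searchRoots and printRepos are hypothetical names, and the finder is passed in as a parameter so the snippet does not depend on async or dlist. The sorting and de-duplication are an assumption about what the output should look like, in case the roots overlap.

```haskell
import Data.List (nub, sort)

-- | Run a finder over several roots and merge the hits.
-- 'finder' stands in for something like
-- @fmap DL.toList . findGitRepos@ (hypothetical wiring).
searchRoots :: (FilePath -> IO [FilePath]) -> [FilePath] -> IO [FilePath]
searchRoots finder roots = do
  hits <- mapM finder roots            -- one result list per root
  pure (nub (sort (concat hits)))      -- sorted, without duplicates

-- | Print one repository path per line.
printRepos :: (FilePath -> IO [FilePath]) -> [FilePath] -> IO ()
printRepos finder roots = searchRoots finder roots >>= mapM_ putStrLn
```

With the function above in scope, a main could then look like getArgs >>= printRepos (fmap DL.toList . findGitRepos), with getArgs imported from System.Environment. The roots are visited one after another here; mapM could be swapped for mapConcurrently if they should be searched in parallel as well.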


Source: https://stackoverflow.com/questions/41404647/how-to-implement-search-in-file-system-in-haskell
