Filter a list of paths to only include files

徘徊边缘 提交于 2019-12-21 03:47:17

问题


If I have a list of FilePaths, how can I filter them to return only the ones that are regular files (namely, not symlinks or directories)?

For example, using getDirectoryContents

main = do
    contents <- getDirectoryContents "/foo/bar"
    let onlyFiles = filterFunction contents in
        print onlyFiles

where "filterFunction" is a function that returns only the FilePaths that represent files.

The answer may just work on Linux, but cross platform support is preferred.

[EDIT] Just using doesDirectoryExist doesn't work as expected. This script prints a list of everything in the directory, not just files:

module Main where

import System.Directory
import Control.Monad (filterM, liftM)

getFiles :: FilePath -> IO [FilePath]
getFiles root = do
    contents <- getDirectoryContents root
    filesHere <- filterM (liftM not . doesDirectoryExist) contents
    subdirs <- filterM doesDirectoryExist contents
    return filesHere

main = do
    files <- getFiles "/"
    print $ files

Additionally, the variable subdirs will only contain "." and "..".


回答1:


To find standard library functions, Hoogle is a great resource; it's a Haskell search engine that lets you search by type. Using it requires figuring out how to think about types the Haskell Way™, though, which your proposed type signatures doesn't quite work with. So:

  1. You're looking for [Filepath] -> [Filepath]. Remember, the Haskell spelling is FilePath. So…

  2. You're looking for [FilePath] -> [FilePath]. This is unnecessary; if you want to filter things, you should use filter. So…

  3. You're looking for a function of type FilePath -> Bool that you can pass to filter. But this can't quite be right: this function needs to query the file system, which is an effect, and Haskell tracks effects in the type system using IO. So…

  4. You're looking for a function of type FilePath -> IO Bool.

And if we search for that on Hoogle, the first result is doesFileExist :: FilePath -> IO Bool from System.Directory. From the docs:

The operation doesFileExist returns True if the argument file exists and is not a directory, and False otherwise.

So System.Directory.doesFileExist is exactly what you want. (Well… only with a little extra work! See below.)

Now, how do you use it? You can't use filter here, because you have an effectful function. You could use Hoogle again – if filter has the type (a -> Bool) -> [a] -> [a], then annotating the results of the functions with a monad m gives you the new type Monad m => (a -> m Bool) -> [a] -> m [Bool] – but there's an easier "cheap trick". In general, if func is a function with an effectful/monadic version, that effectful/monadic version is called funcM, and it often lives in Control.Monad.¹ And indeed, there is a function Control.Monad.filterM :: Monad m => (a -> m Bool) -> [a] -> m [a].

However! Much as we hate to admit it, even in Haskell, types don't provide all the information you need. Importantly, we're going to have a problem here:

  • File paths given as arguments to functions are interpreted relative to the current directory, but…
  • getDirectoryContents returns paths relative to its argument.

Thus, there are two approaches we can take to fix things. The first is to adjust the results of getDirectoryContents so that they can be interpreted correctly. (We also discarding the . and .. results, although if you're just looking for regular files they won't hurt anything.) This will return file names which include the directory whose contents are being examined. The adjust getDirectoryContents function looks like this:

getQualifiedDirectoryContents :: FilePath -> IO [FilePath]
getQualifiedDirectoryContents fp =
    map (fp </>) . filter (`notElem` [".",".."]) <$> getDirectoryContents fp

The filter gets rid of the special directories, and the map prepends the argument directory to all the results. This makes the returned files acceptable arguments to doesFileExist. (If you haven't seen them before, (System.FilePath.</>) appends two file paths; and (Control.Applicative.<$>), also available as (Data.Functor.<$>), is an infix synonym for fmap, which is like liftM but more broadly applicable.)

Putting that all together, your final code becomes:

import Control.Applicative
import Control.Monad
import System.FilePath
import System.Directory

getQualifiedDirectoryContents :: FilePath -> IO [FilePath]
getQualifiedDirectoryContents fp =
    map (fp </>) . filter (`notElem` [".",".."]) <$> getDirectoryContents fp

main :: IO ()
main = do
  contents  <- getQualifiedDirectoryContents "/foo/bar"
  onlyFiles <- filterM doesFileExist contents
  print onlyFiles

Or, if you feel like being fancy/point-free:

import Control.Applicative
import Control.Monad
import System.FilePath
import System.Directory

getQualifiedDirectoryContents :: FilePath -> IO [FilePath]
getQualifiedDirectoryContents fp =
    map (fp </>) . filter (`notElem` [".",".."]) <$> getDirectoryContents fp

main :: IO ()
main =   print
     =<< filterM doesFileExist
     =<< getQualifiedDirectoryContents "/foo/bar"

The second approach is to adjust things so that doesFileExist runs with the appropriate current directory. This will return just the file name relative to the directory whose contents are being examined. To do this, we want to use the withCurrentDirectory :: FilePath -> IO a -> IO a function (but see below), and then pass getDirectoryContents the current directory "." argument. The documentation for withCurrentDirectory says (in part):

Run an IO action with the given working directory and restore the original working directory afterwards, even if the given action fails due to an exception.

Putting all this together gives us the following code

import Control.Monad
import System.Directory

main :: IO ()
main = withCurrentDirectory "/foo/bar" $
         print =<< filterM doesFileExist =<< getDirectoryContents "."

This is what we want, but unfortunately, it's only available in version 1.3.2.0 of the directory package – as of this writing, the most recent one, and not the one I have. Luckily, it's an easy function to implement; such set-a-value-locally functions are usually implemented in terms of Control.Exception.bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c. The bracket function is run as bracket before after action, and it correctly handles exceptions. So we can define withCurrentDirectory ourselves:

withCurrentDirectory :: FilePath -> IO a -> IO a
withCurrentDirectory fp m =
  bracket getCurrentDirectory setCurrentDirectory $ \_ -> do
    setCurrentDirectory fp
    m

And then use this to get the final code:

import Control.Exception
import Control.Monad
import System.Directory

withCurrentDirectory :: FilePath -> IO a -> IO a
withCurrentDirectory fp m =
  bracket getCurrentDirectory setCurrentDirectory $ \_ -> do
    setCurrentDirectory fp
    m

main :: IO ()
main = withCurrentDirectory "/foo/bar" $
         print =<< filterM doesFileExist =<< getDirectoryContents "."

Also, one quick note about lets in dos: in a do block,

do ...foo...
   let x = ...bar...
   ...baz...

is equivalent to

do ...foo...
   let x = ...bar... in
     do ...baz...

So your example code doesn't need the in in the let and can outdent the print call.


¹ Not always: sometimes you want different classes of effects! Use Applicative from Control.Applicative when possible; more things are Applicatives than are Monads (although this means you can do less with them). In that case, the effectful functions may live there, or also in Data.Foldable or Data.Traversable.




回答2:


For Unix systems, the package unix exposes these API's:

  • isRegularFile

  • isSymbolicLink

  • isDirectory

You can use the combination of them to achieve what you want. A sample demo of using them in GHCI:

λ> import System.Posix.Files
λ> status <- getFileStatus "/home/sibi"
λ> isDirectory status
True
λ> isRegularFile status
False



回答3:


You can use a library shelly. It's dedicated to do a shell scripting with Haskell. Here is the solution with shelly:

module Sh where

import Control.Monad
import Data.String 

import Shelly

dir = fromString "/home/me"

printAll = mapM_ print

main = do
    files <- shelly $ filterM test_f =<< ls dir
    printAll files

We use the functions:

ls - for listing the directory content.

ls :: FilePath -> Sh [FilePath]

test_f - for testing if the directory is a file:

test_f :: FilePath -> Sh Bool

shelly - to execute the script:

shelly :: MonadIO m => Sh a -> m a

also we use fromString to create a shelly's FilePath. There is a dedicated type it's not just a string.




回答4:


I happened to need a way to list only regular files in a directory, and this is how I do it. I thought it might be helpful:

import System.Directory

listFilesInDirectory :: FilePath -> IO [FilePath]
listFilesInDirectory dir = do
    rawList <- listDirectory dir
    filterM doesFileExist (map (dir </>) rawList)


来源:https://stackoverflow.com/questions/31419429/filter-a-list-of-paths-to-only-include-files

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!