scala iteratee to recursively process files and subdirectories

Posted by 夙愿已清 on 2019-12-08 07:31:21

Question


I want to apply a function for every file in a directory and subdirectories, as follows:

  import java.io.File
  import java.nio.file.Files

  def applyRecursively(dir: String, fn: File => Any): Unit = {
    def listAndProcess(dir: File): Unit = {
      dir.listFiles match {
        // listFiles returns null if dir is not a directory or cannot be read
        case null => Console.err.println("exception: dir cannot be listed: " + dir.getPath)
        case files => files.toList.sortBy(_.getName).foreach { file =>
          fn(file)
          // only recurse into real directories, so symlink cycles are not followed
          if (!Files.isSymbolicLink(file.toPath) && file.isDirectory) listAndProcess(file)
        }
      }
    }
    listAndProcess(new File(dir))
  }

  def exampleFn(file: File): Unit = println(s"processing $file")

  applyRecursively(".", exampleFn)

This works. The question is how I could refactor this code to use Scala Iteratees, something like this:

val en = Enumerator.generateM(...) // ???
val it: Iteratee[File, Unit] = Iteratee.foreach(exampleFn)
val res = en.run(it)
res.onSuccess { case x => println("DONE") }

Answer 1:


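It does not capture all of your requirements (for example, it follows symbolic links and silently skips directories that cannot be listed), but it should get you started: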

import java.io.File
import play.api.libs.iteratee.{Enumerator, Iteratee}
import scala.concurrent.Await

object ExampleEnumerator {
  import scala.concurrent.ExecutionContext.Implicits.global

  def exampleFn(file: File): Unit = println(s"processing $file")

  def listFiles(dir: File): Enumerator[File] = {
    // listFiles returns null for plain files and unreadable directories
    val files = Option(dir.listFiles).toList.flatten.sortBy(_.getName)

    // emit dir itself, then the recursively built enumerator of each child
    Enumerator(dir) andThen Enumerator(files: _*).flatMap(listFiles)
  }

  def main(args: Array[String]): Unit = {
    import scala.concurrent.duration._

    val dir = "."
    val en: Enumerator[File] = listFiles(new File(dir))
    val it: Iteratee[File, Unit] = Iteratee.foreach(exampleFn)
    val res = en.run(it)
    res.foreach(_ => println("DONE"))

    Await.result(res, 10.seconds)
  }
}
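For readers without Play on the classpath, the same shape can be sketched with a plain Scala `Iterator`: `Iterator.single` plays the role of `Enumerator(dir)`, `++` the role of `andThen`, and `flatMap` recurses lazily into each child. This is an illustrative analog, not the Iteratee API; the object name `LazyWalk` is hypothetical, and the symbolic-link guard is borrowed from the question's code.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical plain-Scala analog of listFiles above (no Play dependency).
object LazyWalk {
  def listFiles(dir: File): Iterator[File] = {
    val children =
      // guard taken from the question: never descend through a symbolic link
      if (Files.isSymbolicLink(dir.toPath)) Nil
      // listFiles returns null for plain files and unreadable directories
      else Option(dir.listFiles).map(_.toList).getOrElse(Nil).sortBy(_.getName)

    // `++` takes its argument by name and Iterator.flatMap is lazy, so a
    // child's own listing is only read when the consumer reaches that child
    Iterator.single(dir) ++ children.iterator.flatMap(listFiles)
  }
}
```

Calling `LazyWalk.listFiles(new File(".")).foreach(exampleFn)` should visit entries in the same pre-order as the Enumerator above: each directory before its contents, siblings sorted by name.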



Answer 2:


You can use Enumerator.unfold for this. The signature is:

def unfold[S, E](s: S)(f: (S) => Option[(S, E)])(implicit ec: ExecutionContext): Enumerator[E]

The idea is that you start with a value of type S and repeatedly apply a function to it that returns an Option[(S, E)]. None means the Enumerator has reached EOF; Some carries the next S to unfold together with the next value the Enumerator[E] will produce. In your case you can start with an Array[File] containing the initial directory, take the first element of the Array, and check whether it is a file or a directory. If it is a plain file, return the tail of the Array tupled with that File. If it is a directory, prepend its listing to the tail of the Array and return that, again tupled with the directory itself, so that the following unfold steps process its contents.

You end up with something like this:

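import java.io.File
import java.nio.file.Files
import play.api.libs.iteratee.Enumerator
import scala.concurrent.ExecutionContext

def list(dir: File)(implicit ec: ExecutionContext): Enumerator[File] = {
  Enumerator.unfold(Array(dir)) { listing =>
    listing.headOption.map { file =>
      if (!Files.isSymbolicLink(file.toPath) && file.isDirectory)
        // prepend this directory's entries (files before subdirectories);
        // listFiles returns null if the directory cannot be read
        (Option(file.listFiles).getOrElse(Array.empty[File])
          .sortBy(f => (f.isDirectory, f.getName)) ++ listing.tail) -> file
      else
        listing.tail -> file
    }
  }
}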

I added an extra sort by isDirectory so that plain files come before subdirectories. When a directory's contents are prepended to the Array being unfolded, its files are therefore consumed before any subdirectory is expanded, which keeps the pending Array from growing as quickly as it would with a name-only sort.
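The effect of the files-first sort can be checked with a small model. The sketch below is an illustration under stated assumptions, not part of the answer: it uses a hypothetical in-memory `Node` tree instead of `java.io.File`, a `List` instead of an `Array`, and reports the longest pending list seen during the unfold with and without the files-first sort.

```scala
object PendingSize {
  // Hypothetical in-memory stand-in for a directory tree (not from the answer)
  sealed trait Node { def name: String; def isDir: Boolean }
  final case class Leaf(name: String) extends Node { def isDir: Boolean = false }
  final case class Dir(name: String, children: List[Node]) extends Node { def isDir: Boolean = true }

  // Replays the unfold step (a directory at the head of the pending list is
  // replaced by its sorted children) and returns the longest pending list seen.
  def maxPending(root: Node, filesFirst: Boolean): Int = {
    def sorted(nodes: List[Node]): List[Node] =
      if (filesFirst) nodes.sortBy(n => (n.isDir, n.name)) // files before directories
      else nodes.sortBy(_.name)                             // plain name order

    @annotation.tailrec
    def loop(pending: List[Node], maxSeen: Int): Int = pending match {
      case Nil => maxSeen
      case Dir(_, children) :: rest =>
        val next = sorted(children) ++ rest
        loop(next, maxSeen max next.size)
      case (_: Leaf) :: rest => loop(rest, maxSeen)
    }

    loop(List(root), 1)
  }
}
```

For a root holding a subdirectory `a` with ten files plus ten plain files named `b1` to `b10`, a name-only sort expands `a` first and the pending list peaks at 20 entries; with files first it never exceeds 11.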

If you want directories removed from the final Enumerator, you can use Enumeratee.filter. Running the result with |>>> gives a Future[Unit] that completes once every file has been processed:

list(dir) &> Enumeratee.filter[File](!_.isDirectory) |>>> Iteratee.foreach(fn)
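On Scala 2.13 or later, the standard library has its own `Iterator.unfold`, so the same worklist idea can be sketched without Play. Note that its function returns `Option[(element, nextState)]`, the reverse of the `(S, E)` order in Play's `unfold`. The sketch assumes a `List` as the worklist (prepending a listing copies only that listing, whereas `Array ++` copies both arrays), adds a null check for unreadable directories, and uses the hypothetical name `UnfoldWalk`.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical stdlib analog of the Enumerator.unfold answer (Scala 2.13+).
object UnfoldWalk {
  def list(dir: File): Iterator[File] =
    Iterator.unfold[File, List[File]](List(dir)) {
      case Nil => None // worklist empty: iteration ends
      case file :: rest =>
        val children =
          if (!Files.isSymbolicLink(file.toPath) && file.isDirectory)
            // listFiles returns null if the directory cannot be read
            Option(file.listFiles).map(_.toList).getOrElse(Nil)
              .sortBy(f => (f.isDirectory, f.getName)) // files before subdirectories
          else Nil
        Some((file, children ++ rest)) // emit `file`, continue with the new worklist
    }
}
```

`filterNot(_.isDirectory)` then stands in for `Enumeratee.filter`, for example `UnfoldWalk.list(new File(".")).filterNot(_.isDirectory).foreach(exampleFn)`.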



Answer 3:


This just complements the great answer of m-w with some logging to help understand it.

$ cd /david/test
$ find .
.
./file1
./file2
./file3d
./file3d/file1
./file3d/file2
./file4

Scala:

import play.api.libs.iteratee._
import java.io.File
import scala.concurrent.Await
import scala.concurrent.duration.Duration

object ExampleEnumerator3 {
  import scala.concurrent.ExecutionContext.Implicits.global

  def exampleFn(file: File) { println(s"processing $file") }

  def list(dir: File): Enumerator[File] = {
    println(s"list $dir")
    val initialInput: List[File] = List(dir)
    Enumerator.unfold(initialInput) { (input: List[File]) =>
      val next: Option[(List[File], File)] = input.headOption.map { file =>
        if(file.isDirectory) {
          (file.listFiles.toList.sortBy(_.getName) ++ input.tail) -> file
        } else {
          input.tail -> file
        }
      }
      next match {
        case Some(dn) => print(s"value to unfold: $input\n  next value to unfold: ${dn._1}\n  next input: ${dn._2}\n")
        case None => print(s"value to unfold: $input\n  finished unfold\n")
      }
      next
    }
  }

  def main(args: Array[String]) {
    val dir = new File("/david/test")
    val res = list(dir).run(Iteratee.foreach(exampleFn))
    Await.result(res, Duration.Inf)
  }
}

log:

list /david/test
value to unfold: List(/david/test)
  next value to unfold: List(/david/test/file1, /david/test/file2, /david/test/file3d, /david/test/file4)
  next input: /david/test
processing /david/test
value to unfold: List(/david/test/file1, /david/test/file2, /david/test/file3d, /david/test/file4)
  next value to unfold: List(/david/test/file2, /david/test/file3d, /david/test/file4)
  next input: /david/test/file1
processing /david/test/file1
value to unfold: List(/david/test/file2, /david/test/file3d, /david/test/file4)
  next value to unfold: List(/david/test/file3d, /david/test/file4)
  next input: /david/test/file2
processing /david/test/file2
value to unfold: List(/david/test/file3d, /david/test/file4)
  next value to unfold: List(/david/test/file3d/file1, /david/test/file3d/file2, /david/test/file4)
  next input: /david/test/file3d
processing /david/test/file3d
value to unfold: List(/david/test/file3d/file1, /david/test/file3d/file2, /david/test/file4)
  next value to unfold: List(/david/test/file3d/file2, /david/test/file4)
  next input: /david/test/file3d/file1
processing /david/test/file3d/file1
value to unfold: List(/david/test/file3d/file2, /david/test/file4)
  next value to unfold: List(/david/test/file4)
  next input: /david/test/file3d/file2
processing /david/test/file3d/file2
value to unfold: List(/david/test/file4)
  next value to unfold: List()
  next input: /david/test/file4
processing /david/test/file4
value to unfold: List()
  finished unfold



Answer 4:


This just complements the great answer of @JonasAnso with some logging to help understand it.

$ cd /david/test
$ find .
.
./file1
./file2
./file3d
./file3d/file1
./file3d/file2
./file4

Scala:

import play.api.libs.iteratee._
import java.io.File
import scala.concurrent.Await
import scala.concurrent.duration.Duration

object ExampleEnumerator2b {
  import scala.concurrent.ExecutionContext.Implicits.global

  def exampleFn(file: File) { println(s"processing $file") }

  def listFiles(dir: File): Enumerator[File] = {
    println(s"listFiles. START: $dir")

    if (dir.isDirectory) {
      val files = dir.listFiles.toList.sortBy(_.getName)
      Enumerator(dir) andThen Enumerator(files :_*).flatMap(listFiles)
    } else {
      Enumerator(dir)
    }
  }

  def main(args: Array[String]) {
    val dir = new File("/david/test")
    val res = listFiles(dir).run(Iteratee.foreach(exampleFn))
    Await.result(res, Duration.Inf)
  }
}

log:

listFiles. START: /david/test
processing /david/test
listFiles. START: /david/test/file1
processing /david/test/file1
listFiles. START: /david/test/file2
processing /david/test/file2
listFiles. START: /david/test/file3d
processing /david/test/file3d
listFiles. START: /david/test/file3d/file1
processing /david/test/file3d/file1
listFiles. START: /david/test/file3d/file2
processing /david/test/file3d/file2
listFiles. START: /david/test/file4
processing /david/test/file4


Source: https://stackoverflow.com/questions/36267374/scala-iteratee-to-recursively-process-files-and-subdirectories
