How can I avoid mutable variables in Scala when using ZipInputStreams and ZipOutputStreams?

Question

I'm trying to read a zip file, check that it has some required files, and then write all valid files out to another zip file. The basic introduction to java.util.zip has a lot of Java-isms and I'd love to make my code more Scala-native. Specifically, I'd like to avoid the use of vars. Here's what I have:

val fos = new FileOutputStream("new.zip");
val zipOut = new ZipOutputStream(new BufferedOutputStream(fos));

while (zipIn.available == 1) {
  val entry = zipIn.getNextEntry
  if (entryIsValid(entry)) {
    zipOut.putNextEntry(new ZipEntry("subdir/" + entry.getName()))
    // read data into the data Array
    var data = new Array[Byte](1024)
    var count = zipIn.read(data, 0, 1024)
    while (count != -1) {
      zipOut.write(data, 0, count)
      count = zipIn.read(data, 0, 1024)
    }
  }
}
zipIn.close
zipOut.close

I should add that I'm using Scala 2.7.7.

Answer 1:

I don't think there's anything particularly wrong with using Java classes that were designed to work imperatively in the way they were designed. Idiomatic Scala includes being able to use idiomatic Java as it was intended, even if the styles do clash a bit.

However, if you want--perhaps as an exercise, or perhaps because it does slightly clarify the logic--to do this in a more functional var-free way, you can do so. In 2.8, it's particularly nice, so even though you're using 2.7.7, I'll give a 2.8 answer.

First, we need to set up the problem, which you didn't entirely, but let's suppose we have something like this:

import java.io._
import java.util.zip._
import scala.collection.immutable.Stream

val fos = new FileOutputStream("new.zip")
val zipOut = new ZipOutputStream(new BufferedOutputStream(fos))
val zipIn = new ZipInputStream(new FileInputStream("old.zip"))
def entryIsValid(ze: ZipEntry) = !ze.isDirectory

Now, given this we want to copy the zip file. The trick we can use is the continually method in collection.immutable.Stream. What it does is perform a lazily-evaluated loop for you. You can then take and filter the results to terminate and process what you want. It's a handy pattern to use when you have something that you want to be an iterator, but it isn't. (If the item updates itself you can use .iterate in Iterable or Iterator--that's usually even better.) Here's the application to this case, used twice: once to get the entries, and once to read/write chunks of data:

val buffer = new Array[Byte](1024)
Stream.continually(zipIn.getNextEntry).
  takeWhile(_ != null).filter(entryIsValid).
  foreach(entry => {
    zipOut.putNextEntry(new ZipEntry("subdir/"+entry.getName))
    Stream.continually(zipIn.read(buffer)).takeWhile(_ != -1).
      foreach(count => zipOut.write(buffer,0,count))
  })
zipIn.close
zipOut.close

Pay close attention to the . at the end of some lines! I would normally write this on one long line, but it's nicer to have it wrap so you can see it all here.

Just in case it isn't clear, let's unpack one of the uses of continually.

Stream.continually(zipIn.read(buffer))

This asks to keep calling zipIn.read(buffer) for as many times as necessary, storing the integer that results.

.takeWhile(_ != -1)

This specifies how many times are necessary, returning a stream of indefinite length but which will quit when it hits a -1.

.foreach(count => zipOut.write(buffer,0,count))

This processes the stream, taking each item in turn (the count), and using it to write the buffer. This works in a slightly sneaky way, since you rely upon the fact that zipIn has just been called to get the next element of the stream--if you tried to do this again, not on a single pass through the stream, it would fail because buffer would be overwritten. But here it's okay.
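To see the single-pass constraint concretely, here is a small sketch that runs the same continually/takeWhile/foreach chain over in-memory streams. It swaps in Iterator.continually for Stream.continually (my assumption, not part of the answer above: Iterator does not memoize, which matches the single-pass use), and the names ChunkCopy, copy and roundTrip are invented for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}

object ChunkCopy {
  // Copies everything from in to out, reusing one buffer for every chunk.
  // bufferSize must be positive, otherwise read would keep returning 0.
  def copy(in: InputStream, out: OutputStream, bufferSize: Int = 1024): Unit = {
    val buffer = new Array[Byte](bufferSize)
    Iterator.continually(in.read(buffer)) // each element is the byte count of one read
      .takeWhile(_ != -1)                 // -1 signals end of stream
      .foreach(count => out.write(buffer, 0, count)) // use buffer before the next read overwrites it
  }

  // Convenience wrapper for trying the copy on in-memory data.
  def roundTrip(bytes: Array[Byte], bufferSize: Int): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    copy(new ByteArrayInputStream(bytes), out, bufferSize)
    out.toByteArray
  }
}
```

Collecting the counts into a list first and writing afterwards would not work, because by then buffer only holds the last chunk.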

So, there it is: a slightly more compact, possibly easier to understand, possibly less easy to understand method that is more functional (though there are still side-effects galore). In 2.7.7, in contrast, I would actually do it the Java way because Stream.continually isn't available, and the overhead of building a custom Iterator isn't worth it for this one case. (It would be worth it if I was going to do more zip file processing and could reuse the code, however.)

Edit: The looking-for-available-to-go-zero method is kind of flaky for detecting the end of the zip file. I think the "correct" way is to wait until you get a null back from getNextEntry. With that in mind, I've edited the previous code (there was a takeWhile(_ => zipIn.available==1) that is now a takeWhile(_ != null)) and provided a 2.7.7 iterator based version below (note how small the main loop is, once you get through the work of defining the iterators, which do admittedly use vars):

val buffer = new Array[Byte](1024)
class ZipIter(zis: ZipInputStream) extends Iterator[ZipEntry] {
  private var entry:ZipEntry = zis.getNextEntry
  private var cached = true
  private def cache { if (entry != null && !cached) {
    cached = true; entry = zis.getNextEntry
  }}
  def hasNext = { cache; entry != null }
  def next = {
    if (!cached) cache
    cached = false
    entry
  }
}
class DataIter(is: InputStream, ab: Array[Byte]) extends Iterator[(Int,Array[Byte])] {
  private var count = 0
  private var waiting = false
  def hasNext = { 
    if (!waiting && count != -1) { count = is.read(ab); waiting=true }
    count != -1
  }
  def next = { waiting=false; (count,ab) }
}
(new ZipIter(zipIn)).filter(entryIsValid).foreach(entry => {
  zipOut.putNextEntry(new ZipEntry("subdir/"+entry.getName))
  (new DataIter(zipIn,buffer)).foreach(cb => zipOut.write(cb._2,0,cb._1))
})
zipIn.close
zipOut.close
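For checking the approach without touching the file system, the following is a rough, self-contained sketch that builds a small archive in memory, copies the valid entries under "subdir/", and lists the result. It assumes Scala 2.8 or later and uses Iterator.continually; ZipCopyDemo, makeZip, copyZip and entryNames are hypothetical names, and entryIsValid is the same placeholder rule as in the setup above.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

object ZipCopyDemo {
  // Placeholder validity rule: skip directory entries.
  def entryIsValid(ze: ZipEntry): Boolean = !ze.isDirectory

  // Builds a zip archive in memory from (name, content) pairs.
  def makeZip(files: Seq[(String, String)]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val zipOut = new ZipOutputStream(bytes)
    files.foreach { case (name, content) =>
      zipOut.putNextEntry(new ZipEntry(name))
      zipOut.write(content.getBytes("UTF-8"))
      zipOut.closeEntry()
    }
    zipOut.close()
    bytes.toByteArray
  }

  // Copies every valid entry of source into a new archive under "subdir/".
  def copyZip(source: Array[Byte]): Array[Byte] = {
    val zipIn = new ZipInputStream(new ByteArrayInputStream(source))
    val bytes = new ByteArrayOutputStream()
    val zipOut = new ZipOutputStream(bytes)
    val buffer = new Array[Byte](1024)
    try {
      Iterator.continually(zipIn.getNextEntry)
        .takeWhile(_ != null) // null marks the end of the archive
        .filter(entryIsValid)
        .foreach { entry =>
          zipOut.putNextEntry(new ZipEntry("subdir/" + entry.getName))
          Iterator.continually(zipIn.read(buffer))
            .takeWhile(_ != -1) // -1 marks the end of the current entry
            .foreach(count => zipOut.write(buffer, 0, count))
          zipOut.closeEntry()
        }
    } finally {
      zipIn.close()
      zipOut.close()
    }
    bytes.toByteArray
  }

  // Lists the entry names of an in-memory archive, in order.
  def entryNames(zip: Array[Byte]): List[String] = {
    val zipIn = new ZipInputStream(new ByteArrayInputStream(zip))
    try Iterator.continually(zipIn.getNextEntry).takeWhile(_ != null).map(_.getName).toList
    finally zipIn.close()
  }
}
```

The explicit closeEntry call is optional here, since putNextEntry closes any open entry, but it makes the entry boundaries easier to follow.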

Answer 2:

Using Scala 2.8 and tail-recursive calls:

def copyZip(in: ZipInputStream, out: ZipOutputStream, bufferSize: Int = 1024) {
  val data = new Array[Byte](bufferSize)

  def copyEntry() {
    in getNextEntry match {
      case null =>
      case entry => {
        if (entryIsValid(entry)) {
          out.putNextEntry(new ZipEntry("subdir/" + entry.getName()))

          def copyData() {
            in read data match {
              case -1 =>
              case count => {
                out.write(data, 0, count)
                copyData()
              }
            }
          }
          copyData()
        }
        copyEntry()
      }
    }
  }
  copyEntry()
}

Answer 3:

I'd try something like this (yes, pretty much the same idea sblundy had):

Iterator.continually {
  val data = new Array[Byte](100)
  zipIn.read(data) match {
    case -1 => Array.empty[Byte]
    case 0  => new Array[Byte](101) // just to filter it out
    case n  => java.util.Arrays.copyOf(data, n)
  }
} filter (_.size != 101) takeWhile (_.nonEmpty)

It could be simplified like below, but I'm not very fond of it. I'd prefer for read not to be able to return 0...

Iterator.continually {
  val data = new Array[Byte](100)
  zipIn.read(data) match {
    case -1 => new Array[Byte](101)
    case n  => java.util.Arrays.copyOf(data, n)
  }
} takeWhile (_.size != 101)
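One way to sidestep the sentinel array, offered as an alternative sketch rather than as what this answer proposes: iterate over the counts that read returns, stop at -1, drop zero-length reads, and only then copy the bytes out of the shared buffer. Chunks.of is a hypothetical helper name.

```scala
import java.io.InputStream

object Chunks {
  // Lazily yields the stream's contents as independent byte arrays.
  // bufferSize must be positive.
  def of(in: InputStream, bufferSize: Int = 100): Iterator[Array[Byte]] = {
    val data = new Array[Byte](bufferSize)
    Iterator.continually(in.read(data))
      .takeWhile(_ != -1)                         // stop at end of stream
      .filter(_ > 0)                              // ignore reads that returned nothing
      .map(n => java.util.Arrays.copyOf(data, n)) // copy, because data is reused by the next read
  }
}
```

Because each chunk is copied before the next read happens, the resulting arrays can be kept around, at the cost of one allocation per chunk.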

Answer 4:

Based on http://harrah.github.io/browse/samples/compiler/scala/tools/nsc/io/ZipArchive.scala.html:

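import java.io.{InputStream, OutputStream}
import java.util.zip.{ZipEntry, ZipInputStream}
import scala.annotation.tailrec
import org.apache.commons.io.IOUtils // assumption: IOUtils here is the Apache Commons IO class

// private[io] assumes this class is declared inside a package named io
private[io] class ZipEntryTraversableClass(in: InputStream) extends Traversable[ZipEntry] {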
  val zis = new ZipInputStream(in)

  def foreach[U](f: ZipEntry => U) {
    @tailrec
    def loop(x: ZipEntry): Unit = if (x != null) {
      f(x)
      zis.closeEntry()
      loop(zis.getNextEntry())
    }
    loop(zis.getNextEntry())
  }

  def writeCurrentEntryTo(os: OutputStream) {
    IOUtils.copy(zis, os)
  }
}

Answer 5:

Without tail recursion, I'd avoid recursion: you would run the risk of a stack overflow. You could wrap zipIn.read(data) in a scala.BufferedIterator[Byte] and go from there.
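On the stack-overflow concern: Scala 2.8 and later ship the scala.annotation.tailrec annotation, which makes compilation fail if the annotated method cannot be compiled into a loop. A minimal sketch of a recursive copy guarded that way follows; TailCopy, copy and loop are names made up for this example.

```scala
import java.io.{InputStream, OutputStream}
import scala.annotation.tailrec

object TailCopy {
  // Copies in to out recursively; bufferSize must be positive.
  def copy(in: InputStream, out: OutputStream, bufferSize: Int = 1024): Unit = {
    val data = new Array[Byte](bufferSize)

    // The compiler rejects this method if the recursive call is not in tail position.
    @tailrec
    def loop(): Unit = in.read(data) match {
      case -1 => () // end of stream, stop recursing
      case count =>
        out.write(data, 0, count)
        loop()
    }
    loop()
  }
}
```

With the annotation in place, the recursion depth no longer depends on how many chunks the stream contains.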

Source: https://stackoverflow.com/questions/2849303/how-can-i-avoid-mutable-variables-in-scala-when-using-zipinputstreams-and-zipout

Tags

scala

immutability

zipinputstream

zipoutputstream