How to read data from nested zip files in Java without using temporary files?

感情迁移 submitted on 2019-12-08 09:12:38

Question


I am trying to extract files out of a nested zip archive and process them in memory.

What this question is not about:

  1. How to read a zip file in Java: NO, the question is how to read a zip file within a zip file within a zip and so on and so forth (as in nested zip files).

  2. Write temporary results on disk: NO, I'm asking about doing it all in memory. I found many answers using the not-so-efficient technique of writing results temporarily to disk, but that's not what I want to do.

Example:

Zipfile -> Zipfile1 -> Zipfile2 -> Zipfile3

Goal: extract the data found in each of the nested zip files, all in memory and using Java.

ZipFile is the answer, you say? No, it is not. It only works for the first level, that is, for:

Zipfile -> Zipfile1

But once you get to Zipfile2, and perform a:

ZipInputStream z = new ZipInputStream(zipFile.getInputStream( zipEntry) ) ;

you will get a NullPointerException.

My code:

public class ZipHandler {

    String findings = new String();
    ZipFile zipFile = null;

    public void init(String fileName) throws AppException{

        try {
        //read file into stream
        zipFile = new ZipFile(fileName);  
        Enumeration<?> enu = zipFile.entries();  
        exctractInfoFromZip(enu);

        zipFile.close();
        } catch (FileNotFoundException e) {
        e.printStackTrace();

        } catch (IOException e) {
            e.printStackTrace();
    }
}

//The idea was recursively extract entries using ZipFile
public void exctractInfoFromZip(Enumeration<?> enu) throws IOException, AppException{   

    try {
        while (enu.hasMoreElements()) { 
            ZipEntry zipEntry = (ZipEntry) enu.nextElement();

            String name = zipEntry.getName();
            long size = zipEntry.getSize();
            long compressedSize = zipEntry.getCompressedSize();

            System.out.printf("name: %-20s | size: %6d | compressed size: %6d\n", 
                    name, size, compressedSize);

            // directory ?
            if (zipEntry.isDirectory()) {
                System.out.println("dir found:" + name);
                findings+=", " + name; 
                continue;
            } 

            if (name.toUpperCase().endsWith(".ZIP") ||  name.toUpperCase().endsWith(".GZ")) {
                String fileType = name.substring(
                        name.lastIndexOf(".")+1, name.length());

                System.out.println("File type:" + fileType);
                System.out.println("zipEntry: " + zipEntry);

                if (fileType.equalsIgnoreCase("ZIP")) {
//ZipFile here returns a NULL pointer when you try to get the first nested zip
                    ZipInputStream z = new ZipInputStream(zipFile.getInputStream(zipEntry) ) ;
                    System.out.println("Opening ZIP as stream: " + name);

                    findings+=", " + name;

                    exctractInfoFromZip(zipInputStreamToEnum(z));
                } else if (fileType.equalsIgnoreCase("GZ")) {
//ZipFile here returns a NULL pointer when you try to get the first nested zip      
                    GZIPInputStream z = new GZIPInputStream(zipFile.getInputStream(zipEntry) ) ;
                    System.out.println("Opening ZIP as stream: " + name);

                    findings+=", " + name;

                    exctractInfoFromZip(gZipInputStreamToEnum(z));
                } else
                    throw new AppException("extension not recognized!");
            } else {
                System.out.println(name);
                findings+=", " + name;
            }
        }
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    System.out.println("Findings " + findings);
} 

public Enumeration<?> zipInputStreamToEnum(ZipInputStream zStream) throws IOException{

    List<ZipEntry> list = new ArrayList<ZipEntry>();    

    while (zStream.available() != 0) {
        list.add(zStream.getNextEntry());
    }

    return Collections.enumeration(list);
} 

Answer 1:


I have not tried it, but using ZipInputStream you can read any InputStream that contains a ZIP file as data. Iterate through the entries, and when you find the correct entry, use the ZipInputStream to create another nested ZipInputStream.

The following code demonstrates this. Imagine we have a readme.txt inside 0.zip which is again zipped in 1.zip which is zipped in 2.zip. Now we read some text from readme.txt:

try (FileInputStream fin = new FileInputStream("D:/2.zip")) {
    ZipInputStream firstZip = new ZipInputStream(fin);
    ZipInputStream zippedZip = new ZipInputStream(findEntry(firstZip, "1.zip"));
    ZipInputStream zippedZippedZip = new ZipInputStream(findEntry(zippedZip, "0.zip"));

    ZipInputStream zippedZippedZippedReadme = findEntry(zippedZippedZip, "readme.txt");
    InputStreamReader reader = new InputStreamReader(zippedZippedZippedReadme);
    char[] cbuf = new char[1024];
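    // Keep the number of chars actually read so the unused tail of the buffer is not printed
    int count = reader.read(cbuf);
    System.out.println(new String(cbuf, 0, Math.max(count, 0)));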
    .....

public static ZipInputStream findEntry(ZipInputStream in, String name) throws IOException {
    ZipEntry entry = null;
    while ((entry = in.getNextEntry()) != null) {
        if (entry.getName().equals(name)) {
            return in;
        }
    }
    return null;
}

Note that the code is really ugly: it does not close anything, nor does it check for errors. It is just a minimal version that demonstrates how it works.
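
As a rough sketch of what a tidier version could look like, the following closes the stream chain and fails loudly when an entry is missing. The class name NestedEntryReader and the method readNestedText are made up for illustration, the test data is built in memory instead of read from D:/2.zip, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of the original answer.
class NestedEntryReader {

    // Positions 'in' at the named entry, or throws instead of returning null.
    static ZipInputStream findEntry(ZipInputStream in, String name) throws IOException {
        ZipEntry entry;
        while ((entry = in.getNextEntry()) != null) {
            if (entry.getName().equals(name)) {
                return in;
            }
        }
        throw new FileNotFoundException("No entry named " + name);
    }

    // Follows 'path' through the nested archives and returns the last entry as UTF-8 text.
    static String readNestedText(InputStream source, String... path) throws IOException {
        ZipInputStream current = new ZipInputStream(source);
        try {
            for (int i = 0; i < path.length; i++) {
                findEntry(current, path[i]);
                if (i < path.length - 1) {
                    // The matched entry is itself a zip: read it through a nested stream.
                    current = new ZipInputStream(current);
                }
            }
            // Reads until the end of the current entry (Java 9+).
            return new String(current.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            // Closing the innermost stream closes every stream it wraps, including 'source'.
            current.close();
        }
    }

    // Builds a one-entry zip in memory; only used to create demo data.
    static byte[] zip(String name, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry(name));
            out.write(content);
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip0 = zip("readme.txt", "hello".getBytes(StandardCharsets.UTF_8));
        byte[] zip1 = zip("0.zip", zip0);
        byte[] zip2 = zip("1.zip", zip1);
        System.out.println(
                readNestedText(new ByteArrayInputStream(zip2), "1.zip", "0.zip", "readme.txt"));
    }
}
```

To read from disk instead, pass a FileInputStream for the outer archive as the first argument.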

Theoretically there is no limit to how many ZipInputStreams you can cascade into one another. The data is never written to a temporary file. Decompression is only performed on demand, as you read from each InputStream.
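
When the nesting depth is not known in advance, the same cascading can be done recursively. The sketch below is one possible way to do it, not the answerer's code: NestedZipWalker, EntryVisitor and the "!/" path separator are invented for the example, nested archives are detected purely by a ".zip" extension, and readAllBytes in the demo assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of the original answer.
class NestedZipWalker {

    // Callback for every non-zip file; implementations must not close 'data'.
    interface EntryVisitor {
        void visit(String path, InputStream data) throws IOException;
    }

    // Walks 'in' as a zip stream and recurses into every entry whose name ends with ".zip".
    // The caller stays responsible for closing 'in'.
    static void walk(InputStream in, String prefix, EntryVisitor visitor) throws IOException {
        // Shield the parent so that closing this level does not close the parent stream.
        InputStream shielded = new FilterInputStream(in) {
            @Override
            public void close() {
                // Intentionally left open.
            }
        };
        try (ZipInputStream zip = new ZipInputStream(shielded)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                String path = prefix + entry.getName();
                if (path.toLowerCase(Locale.ROOT).endsWith(".zip")) {
                    // The entry's bytes are decompressed on demand by the nested stream.
                    walk(zip, path + "!/", visitor);
                } else {
                    visitor.visit(path, zip);
                }
            }
        }
    }

    // Builds a one-entry zip in memory; only used to create demo data.
    static byte[] zip(String name, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry(name));
            out.write(content);
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] nested = zip("1.zip", zip("0.zip", zip("readme.txt", "hello".getBytes("UTF-8"))));
        try (InputStream source = new ByteArrayInputStream(nested)) {
            walk(source, "", (path, data) ->
                    System.out.println(path + " (" + data.readAllBytes().length + " bytes)"));
        }
    }
}
```

Each level holds only its own small read buffer, so memory use grows with nesting depth, not with archive size, unless the visitor itself buffers the data.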




Answer 2:


This is the way I found to unzip files in memory.

The code is not clean at all, but I understand the rule is to post something that works, so here it is in the hope that it helps.

I use a recursive method to navigate the complex ZIP file, extract its folders and inner zip files, and save the results in memory to work with them later.

The main things I found and want to share:

  1. ZipFile is useless if you have nested zip files.

  2. You have to use the basic ZipInputStream and OutputStream classes.

  3. I only use recursion to unzip the nested zips.

package course.hernan;

import java.io.BufferedInputStream;

import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

import org.apache.commons.io.IOUtils;

public class FileReader {

private static final int  BUFFER_SIZE = 2048;

    public static void main(String[] args) {
        try {
            File f = new File("DIR/inputs.zip");
            FileInputStream fis = new FileInputStream(f);
            BufferedInputStream bis = new BufferedInputStream(fis);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            BufferedOutputStream bos = new BufferedOutputStream(baos);
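            byte[] buffer = new byte[BUFFER_SIZE];
            int count;
            // Write only the bytes actually read; the last chunk is usually shorter than the buffer
            while ((count = bis.read(buffer, 0, BUFFER_SIZE)) != -1) {
               bos.write(buffer, 0, count);
            }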

            bos.flush();
            bos.close();
            bis.close();

            //This STACK has the output byte array information 
            Deque<Map<Integer, Object[]>> outputDataStack = ZipHandler1.unzip(baos);


        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}    
package course.hernan;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.commons.lang3.StringUtils;

public class ZipHandler1 {

  private static final int BUFFER_SIZE = 2048;

  private static final String ZIP_EXTENSION = ".zip";
  public static final Integer FOLDER = 1;
  public static final Integer ZIP = 2;
  public static final Integer FILE = 3;


  public static Deque<Map<Integer, Object[]>> unzip(ByteArrayOutputStream zippedOutputFile) {

    try {

      ZipInputStream inputStream = new ZipInputStream(
          new BufferedInputStream(new ByteArrayInputStream(
              zippedOutputFile.toByteArray())));

      ZipEntry entry;

      Deque<Map<Integer, Object[]>> result = new ArrayDeque<Map<Integer, Object[]>>();

      while ((entry = inputStream.getNextEntry()) != null) {

        LinkedHashMap<Integer, Object[]> map = new LinkedHashMap<Integer, Object[]>();
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        System.out.println("\tExtracting entry: " + entry);
        int count;
        byte[] data = new byte[BUFFER_SIZE];

        if (!entry.isDirectory()) {
          BufferedOutputStream out = new BufferedOutputStream(
              outputStream, BUFFER_SIZE);

          while ((count = inputStream.read(data, 0, BUFFER_SIZE)) != -1) {
            out.write(data, 0, count);
          }

          out.flush();
          out.close();

          //  recursively unzip files
          if (entry.getName().toUpperCase().endsWith(ZIP_EXTENSION.toUpperCase())) {
            map.put(ZIP, new Object[] {entry.getName(), unzip(outputStream)});
            result.add(map);
            //result.addAll();
          } else { 
            map.put(FILE, new Object[] {entry.getName(), outputStream});
            result.add(map);
          }
        } else {
          map.put(FOLDER, new Object[] {entry.getName(), unzip(outputStream)});
          result.add(map);
        }
      }

      inputStream.close();

      return result;

    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}

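The same buffer-and-recurse idea can also produce a flat result that is easier to consume than nested Object[] arrays. The following is only a sketch of that alternative, not the code above: FlatUnzipper and unzipAll are hypothetical names, directories are skipped, inner archives are detected by a ".zip" extension, and every file is held fully in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical alternative to ZipHandler1 that returns path -> content.
class FlatUnzipper {

    // Returns every regular file found in the archive and in all archives nested inside it.
    static Map<String, byte[]> unzipAll(byte[] zipBytes) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        collect(zipBytes, "", files);
        return files;
    }

    private static void collect(byte[] zipBytes, String prefix, Map<String, byte[]> files)
            throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Buffer the whole entry in memory.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int count;
                while ((count = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, count);
                }
                String path = prefix + entry.getName();
                if (path.toLowerCase(Locale.ROOT).endsWith(".zip")) {
                    // Inner archive: recurse on its bytes, using its path as a prefix.
                    collect(out.toByteArray(), path + "/", files);
                } else {
                    files.put(path, out.toByteArray());
                }
            }
        }
    }

    // Builds a one-entry zip in memory; only used to create demo data.
    static byte[] zip(String name, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry(name));
            out.write(content);
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] inner = zip("readme.txt", "hello".getBytes(StandardCharsets.UTF_8));
        byte[] outer = zip("inner.zip", inner);
        for (Map.Entry<String, byte[]> file : unzipAll(outer).entrySet()) {
            System.out.println(file.getKey() + " -> " + file.getValue().length + " bytes");
        }
    }
}
```

Buffering each entry keeps the code simple but costs memory proportional to the uncompressed size, so an untrusted or very large archive would need a size limit.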

Source: https://stackoverflow.com/questions/47208932/how-to-read-data-from-nested-zip-files-in-java-without-using-temporary-files
