Is there any way to upload an extracted zip file using “java.util.zip” to AWS S3 using multipart upload (Java high-level API)?

Posted by 笑着哭i on 2021-01-27 13:13:51

Question


I need to upload a large file to AWS S3 using multipart upload from a stream, instead of staging it in the Lambda /tmp directory. The file is uploaded, but not completely.

In my case the size of each file inside the zip cannot be predicted; a single file may be up to 1 GiB. So I use a ZipInputStream to read the archive from S3, and I want to upload the extracted entries back to S3. Since I am working in Lambda, I cannot save the files to /tmp because of their size, so I tried to read and upload directly to S3, without saving locally, using S3 multipart upload. But I ran into an issue: the files are not written completely. I suspect each file is overwritten every time. Please review my code and help.

public void zipAndUpload() {
    byte[] buffer = new byte[1024];
    try {
        File folder = new File(outputFolder);
        if (!folder.exists()) {
            folder.mkdir();
        }

        AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();
        S3Object object = s3Client.getObject("mybucket.s3.com", "MyFilePath/MyZip.zip");

        TransferManager tm = TransferManagerBuilder.standard()
                .withS3Client(s3Client)
                .build();

        ZipInputStream zis = new ZipInputStream(object.getObjectContent());
        ZipEntry ze = zis.getNextEntry();

        while (ze != null) {
            String fileName = ze.getName();
            System.out.println("ZE " + ze + " : " + fileName);

            File newFile = new File(outputFolder + File.separator + fileName);
            if (ze.isDirectory()) {
                System.out.println("DIRECTORY" + newFile.mkdirs());
            } else {
                filePaths.add(newFile); // filePaths is a list field declared elsewhere in the class
                int len;
                while ((len = zis.read(buffer)) > 0) {
                    ObjectMetadata meta = new ObjectMetadata();
                    meta.setContentLength(len);
                    InputStream targetStream = new ByteArrayInputStream(buffer);

                    PutObjectRequest request = new PutObjectRequest("mybucket.s3.com", fileName, targetStream, meta);
                    request.setGeneralProgressListener(new ProgressListener() {
                        public void progressChanged(ProgressEvent progressEvent) {
                            System.out.println("Transferred bytes: " + progressEvent.getBytesTransferred());
                        }
                    });
                    Upload upload = tm.upload(request);
                }
            }
            ze = zis.getNextEntry();
        }

        zis.closeEntry();
        zis.close();
        System.out.println("Done");
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}

Answer 1:


The problem is your inner while loop. You read 1024 bytes at a time from the ZipInputStream and upload just those bytes to S3. Instead of streaming the entry into S3, you overwrite the target key again and again, so the finished object only ever contains the last chunk.
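In effect, every pass of the inner loop issues an independent PutObject that replaces the previous object under the same key. A rough sketch of what the loop boils down to, reusing the names from the question:

// effectively what the inner loop does: each iteration replaces the
// whole object, so only the last chunk read from the entry survives
while ((len = zis.read(buffer)) > 0) {
    ObjectMetadata meta = new ObjectMetadata();
    meta.setContentLength(len);
    s3Client.putObject("mybucket.s3.com", fileName,
            new ByteArrayInputStream(buffer, 0, len), meta); // full overwrite
}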

The solution is a bit more involved, because you don't have one stream per file but one stream for the whole zip container. That means you can't simply do the following, since the stream would be closed by the AWS SDK after the first upload is done:

// Not possible
PutObjectRequest request = new PutObjectRequest(targetBucket, name,
        zipInputStream, meta);

Instead, you have to write the ZipInputStream into a PipedOutputStream, one pipe per ZipEntry. Below is a working example.

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;

import java.io.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class Pipes {
    public static void main(String[] args) throws IOException, InterruptedException {

        Regions clientRegion = Regions.DEFAULT_REGION;
        String sourceBucket = "<sourceBucket>";
        String key = "<sourceArchive.zip>";
        String targetBucket = "<targetBucket>";

        PipedOutputStream out = null;
        PipedInputStream in = null;
        S3Object s3Object = null;
        ZipInputStream zipInputStream = null;

        try {
            AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                    .withRegion(clientRegion)
                    .withCredentials(new ProfileCredentialsProvider())
                    .build();

            TransferManager transferManager = TransferManagerBuilder.standard()
                    .withS3Client(s3Client)
                    .build();

            System.out.println("Downloading an object");
            s3Object = s3Client.getObject(new GetObjectRequest(sourceBucket, key));
            zipInputStream = new ZipInputStream(s3Object.getObjectContent());

            ZipEntry zipEntry;
            while (null != (zipEntry = zipInputStream.getNextEntry())) {

                long size = zipEntry.getSize();
                String name = zipEntry.getName();
                if (zipEntry.isDirectory()) {
                    System.out.println("Skipping directory " + name);
                    continue;
                }

                System.out.printf("Processing ZipEntry %s : %d bytes\n", name, size);

                // pipe this entry's bytes into an InputStream that the
                // TransferManager can consume
                out = new PipedOutputStream();
                in = new PipedInputStream(out);

                ObjectMetadata metadata = new ObjectMetadata();
                metadata.setContentLength(size);

                PutObjectRequest request = new PutObjectRequest(targetBucket, name, in, metadata);

                // upload() is asynchronous: it starts reading from `in` on a
                // background thread and returns immediately
                Upload upload = transferManager.upload(request);

                // copy this entry's bytes into the pipe; the upload thread
                // consumes them from `in` concurrently
                long actualSize = copy(zipInputStream, out, 1024);
                if (actualSize != size) {
                    throw new RuntimeException("Filesize of ZipEntry " + name + " is wrong");
                }

                out.flush();
                out.close();

                // wait for the background upload to drain the pipe before
                // starting the next entry (and before the JVM exits)
                upload.waitForCompletion();
            }
        } finally {
            if (out != null) {
                out.close();
            }
            if (in != null) {
                in.close();
            }
            if (s3Object != null) {
                s3Object.close();
            }
            if (zipInputStream != null) {
                zipInputStream.close();
            }
            // TransferManager keeps non-daemon worker threads alive;
            // transferManager.shutdownNow() would be the cleaner way to stop them
            System.exit(0);
        }
    }

    private static long copy(final InputStream input, final OutputStream output, final int buffersize) throws IOException {
        if (buffersize < 1) {
            throw new IllegalArgumentException("buffersize must be bigger than 0");
        }
        final byte[] buffer = new byte[buffersize];
        int n;
        long count = 0;
        while (-1 != (n = input.read(buffer))) {
            output.write(buffer, 0, n);
            count += n;
        }
        return count;
    }
}
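One detail worth knowing about this pattern: PipedInputStream uses a 1024-byte internal buffer by default, so the copy loop above blocks frequently while the upload thread drains the pipe. java.io provides a constructor that takes an explicit pipe size; the 64 KiB below is just an illustrative choice, not a tuned value:

// a larger pipe buffer reduces blocking hand-offs between the copying
// thread and the upload thread
out = new PipedOutputStream();
in = new PipedInputStream(out, 64 * 1024);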



Answer 2:


I am trying to understand why this code snippet is not working:

            @Cleanup PipedOutputStream out = new PipedOutputStream();
            @Cleanup PipedInputStream in = new PipedInputStream(out);

            long actualSize = copy(zipInputStream, out);

            ObjectMetadata metadata = new ObjectMetadata();
            metadata.setContentLength(actualSize);

            executorService.submit(() -> {
                Upload upload = transferManager.upload(destBucketName, name, in, metadata);
                try {
                    upload.waitForCompletion();
                } catch (InterruptedException e) {
                 // error
                }
            });
            out.flush();
            out.close(); 
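A plausible explanation (my reading; not confirmed in the original thread): copy() runs before the upload task is submitted, and a PipedInputStream's internal buffer is only 1024 bytes by default. With no reader attached yet, the write side blocks as soon as that buffer fills, so copy() never returns for entries larger than about 1 KiB. A minimal sketch of the reordering that should avoid this, assuming the entry size is available from zipEntry.getSize() (this mirrors the approach in Answer 1):

// start the consumer before filling the pipe
PipedOutputStream out = new PipedOutputStream();
PipedInputStream in = new PipedInputStream(out);

ObjectMetadata metadata = new ObjectMetadata();
// getSize() may return -1 if the size is not recorded in the archive
metadata.setContentLength(zipEntry.getSize());

Upload upload = transferManager.upload(destBucketName, name, in, metadata);

copy(zipInputStream, out); // the upload thread drains the pipe as we write
out.close();
upload.waitForCompletion();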


Source: https://stackoverflow.com/questions/58497808/is-there-any-way-to-upload-extracted-zip-file-using-java-util-zip-to-aws-s3-us
