Calculate S3 object (folder) size in Java

余生颓废 submitted on 2020-01-01 15:09:16

Question


I'm storing all types of files on Amazon S3. In my bucket, the files are stored in different folders. I know there is no real concept of a folder in Amazon S3; objects are identified only by their keys. If I store a file with a key like 'mydocs/personal/profile-pic.jpg', two parent folders appear (a personal folder inside a mydocs folder).

I want to calculate the size of any folder, such as 'mydocs', in Java. I calculated the total size of the bucket using the code below:

public long calculateBucketSize(String bucketName) {
    long totalSize = 0;
    int totalItems = 0;
    ObjectListing objects = listObjects(bucketName);
    while (true) {
        // Count every object on the current page of results.
        for (S3ObjectSummary objectSummary : objects.getObjectSummaries()) {
            totalSize += objectSummary.getSize();
            totalItems++;
        }
        // Stop only after the last page has been counted.
        if (!objects.isTruncated()) {
            break;
        }
        objects = listNextBatchOfObjects(objects);
    }
    System.out.println("Amazon S3 bucket: " + bucketName + " containing "
            + totalItems + " objects with a total size of " + totalSize
            + " bytes.");

    return totalSize;
}

This method returns the total size of the bucket. I want to calculate the size of a single folder. Any help will be appreciated.


Answer 1:


There is an easy way to do this with the org.apache.hadoop library:

  import org.apache.hadoop.fs.Path
  import org.apache.spark.sql.SparkSession

  // Sums the length of every file under the given path.
  def calculateSize(path: String)(implicit spark: SparkSession): Long = {
    val fsPath = new Path(path)
    val fs = fsPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
    fs.getContentSummary(fsPath).getLength
  }

This function can calculate the size on S3, HDFS, and the local file system.




Answer 2:


For Scala developers, here is a recursive function that performs a full scan and map of the contents of an Amazon S3 bucket using the official AWS SDK for Java:

import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing}
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}

def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

  // Accumulates the mapped results page by page until the listing is no longer truncated.
  def scan(acc: List[T], listing: ObjectListing): List[T] = {
    val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
    val mapped = (for (summary <- summaries) yield f(summary)).toList

    // Return the results of earlier pages together with the final page.
    if (!listing.isTruncated) acc ::: mapped
    else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
  }

  scan(List(), s3.listObjects(bucket, prefix))
}

To invoke the curried map() function above, pass an already constructed and properly initialized AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name, and the prefix in the first parameter list. In the second parameter list, pass the function f() that you want to apply to each object summary.

For example,

val tuple = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner, s.getSize))

will return the full list of (key, owner, size) tuples in that bucket/prefix,

or

map(s3, "bucket", "prefix")(s => s.getSize).sum

will return the total size of the contents of that bucket/prefix.

You can combine map() with many other functions, as you normally would with monads in functional programming.




Answer 3:


I think you want the size of the folders at each level. For example, say you have a root folder R-Folder with two subfolders, S1.1-Folder and S1.2-Folder, and S1.1-Folder in turn has three subfolders: S1.1.1-Folder, S1.1.2-Folder, and S1.1.3-Folder. Now you want the size of each folder:

R-Folder (32MB)
|__S1.1-Folder (22MB)
|  |__S1.1.1-Folder (7MB)
|  |__S1.1.2-Folder (5MB)
|  |__S1.1.3-Folder (10MB)
|
|__S1.2-Folder (10MB)

Am I correct?

You have to keep a list of folder details, each with an isCompleted status, and scan each folder recursively. When an inner folder has been scanned completely, add its size to its parent; that parent then updates its own parent, and so on up to the root.
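The roll-up to parent folders can also be sketched as a single pass over a flat listing of keys, rather than the recursive scan with completion flags described above. The following is only a minimal illustration under assumptions: the class name FolderSizeRollup, the method rollUp, and the in-memory map of key to size are hypothetical stand-ins for whatever the real S3 listing returns, and the sizes are arbitrary units chosen to match the tree above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: not part of the AWS SDK.
class FolderSizeRollup {

    // Adds each object's size to every ancestor "folder" prefix of its key.
    // For the key "a/b/c.txt" the ancestors are "a/" and "a/b/".
    static Map<String, Long> rollUp(Map<String, Long> objectSizes) {
        Map<String, Long> folderSizes = new TreeMap<>();
        for (Map.Entry<String, Long> entry : objectSizes.entrySet()) {
            String key = entry.getKey();
            int slash = key.indexOf('/');
            while (slash >= 0) {
                String folder = key.substring(0, slash + 1);
                folderSizes.merge(folder, entry.getValue(), Long::sum);
                slash = key.indexOf('/', slash + 1);
            }
        }
        return folderSizes;
    }

    public static void main(String[] args) {
        // Assumed sample data standing in for the key/size pairs of a listing.
        Map<String, Long> objects = new LinkedHashMap<>();
        objects.put("R/S1.1/S1.1.1/a.bin", 7L);
        objects.put("R/S1.1/S1.1.2/b.bin", 5L);
        objects.put("R/S1.1/S1.1.3/c.bin", 10L);
        objects.put("R/S1.2/d.bin", 10L);

        // Prints one line per folder prefix with its accumulated size.
        rollUp(objects).forEach((folder, size) -> System.out.println(folder + " " + size));
    }
}
```

With this sample input, "R/" accumulates 32 and "R/S1.1/" accumulates 22, matching the tree in the answer. Because every object contributes to all of its ancestors in one pass, no per-folder completion status is needed in this variant.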




Answer 4:


I was stuck on the same problem. The simple solution is to use:

 ObjectListing objects = listObjects(bucketName,prefix);


Where prefix is your folder name.

For more information, see these links:

http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/ObjectListing.html

http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Client.html
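
One detail worth checking with the prefix approach: the prefix is matched as a plain string against the start of each key, so a prefix of "mydocs" would also match keys under "mydocs-old/". The sketch below illustrates that with an in-memory map instead of a real listing; the class name PrefixSize, the methods asFolderPrefix and sizeOf, and the sample keys and sizes are all assumptions made for illustration, not SDK calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of the AWS SDK.
class PrefixSize {

    // Appends "/" to a non-empty folder name so that only keys inside
    // that folder match. An empty name means "the whole bucket".
    static String asFolderPrefix(String folder) {
        if (folder.isEmpty() || folder.endsWith("/")) {
            return folder;
        }
        return folder + "/";
    }

    // Sums the sizes of all keys that start with the folder prefix,
    // which is the same filter a prefix-based listing applies.
    static long sizeOf(Map<String, Long> objectSizes, String folder) {
        String prefix = asFolderPrefix(folder);
        long total = 0;
        for (Map.Entry<String, Long> entry : objectSizes.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed sample data standing in for the key/size pairs of a listing.
        Map<String, Long> objects = new LinkedHashMap<>();
        objects.put("mydocs/personal/profile-pic.jpg", 2048L);
        objects.put("mydocs/cv.pdf", 1024L);
        objects.put("mydocs-old/cv.pdf", 512L);

        System.out.println(sizeOf(objects, "mydocs"));
    }
}
```

In real code the same normalized prefix would be passed to listObjects(bucketName, prefix), and the sizes would come from the returned object summaries page by page.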



Source: https://stackoverflow.com/questions/15950032/calculate-s3-objectfolder-size-in-java
