Question
I'm storing all types of files on Amazon S3. In my S3 bucket, the files are stored in different folders, although I know there is no real concept of a folder in Amazon S3: objects are identified only by their keys. If I store a file with a key like 'mydocs/personal/profile-pic.jpg', it appears under two parent folders (a personal folder inside a mydocs folder).
I want to calculate the size of any folder, such as 'mydocs', in Java. I calculate the total size of the bucket with the code below:
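import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

// Assumes this method lives in a class that extends AmazonS3Client,
// so listObjects() and listNextBatchOfObjects() are inherited.
public long calculateBucketSize(String bucketName) {
    long totalSize = 0;
    int totalItems = 0;
    ObjectListing objects = listObjects(bucketName);
    while (true) {
        // Add up the sizes on the current page (at most 1,000 objects).
        for (S3ObjectSummary objectSummary : objects.getObjectSummaries()) {
            totalSize += objectSummary.getSize();
            totalItems++;
        }
        // Check before fetching, so the last page is counted too.
        if (!objects.isTruncated()) {
            break;
        }
        objects = listNextBatchOfObjects(objects);
    }
    System.out.println("Amazon S3 bucket: " + bucketName + " containing "
            + totalItems + " objects with a total size of " + totalSize
            + " bytes.");
    return totalSize;
}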
This method returns the total size of the bucket. How can I calculate the size of a single folder? Any help will be appreciated.
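A listing call returns at most 1,000 objects per page, so a size-summing loop has to add up every page, including the last one (the page whose isTruncated() returns false), before it stops. Below is a stdlib-only sketch of that loop shape; the Page and PagedSize classes are hypothetical stand-ins for ObjectListing and the SDK calls, and pages.get(i + 1) plays the role of listNextBatchOfObjects(...).

```java
import java.util.List;

// Hypothetical stand-in for ObjectListing: the object sizes on one page
// plus a flag saying whether more pages follow.
final class Page {
    final List<Long> sizes;
    final boolean truncated;

    Page(List<Long> sizes, boolean truncated) {
        this.sizes = sizes;
        this.truncated = truncated;
    }
}

final class PagedSize {
    // Sums every page and stops only after processing a page whose
    // truncated flag is false. Assumes the list ends with such a page.
    static long totalSize(List<Page> pages) {
        long total = 0;
        int i = 0;
        Page page = pages.get(i);
        while (true) {
            for (long size : page.sizes) {
                total += size;
            }
            if (!page.truncated) {
                return total;
            }
            // Stand-in for listNextBatchOfObjects(previousListing).
            page = pages.get(++i);
        }
    }
}
```

The key design choice is testing isTruncated() after summing the current page rather than after fetching the next one; the latter silently drops the final page.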
Answer 1:
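There is an easy way to do this with the org.apache.hadoop library:
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

// Returns the total size in bytes of all files under the given path.
def calculateSize(path: String)(implicit spark: SparkSession): Long = {
  val fsPath = new Path(path)
  val fs = fsPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
  fs.getContentSummary(fsPath).getLength
}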
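This function can calculate the size of a path on S3, HDFS, or the local file system. For S3, the hadoop-aws (S3A) connector must be on the classpath and the path must use the s3a:// scheme.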
Answer 2:
For Scala developers, here is a recursive function that performs a full scan of an Amazon S3 bucket (or of a prefix within it) and maps over its contents, using the official AWS SDK for Java:
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{ObjectListing, S3ObjectSummary}
import scala.annotation.tailrec
import scala.collection.JavaConverters._

// Applies f to every object summary under bucket/prefix, following pagination.
def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: S3ObjectSummary => T): List[T] = {
  @tailrec
  def scan(acc: List[T], listing: ObjectListing): List[T] = {
    val mapped = listing.getObjectSummaries.asScala.map(f).toList
    // On the last page, return the results accumulated so far plus this page.
    if (!listing.isTruncated) acc ::: mapped
    else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
  }
  scan(Nil, s3.listObjects(bucket, prefix))
}
To invoke the curried map() function above, pass an already constructed and properly initialized AmazonS3Client object (see the official AWS SDK for Java API Reference), the bucket name, and the prefix in the first parameter list. In the second parameter list, pass the function f that you want to apply to each object summary.
For example,
val tuples = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner, s.getSize))
returns the full list of (key, owner, size) tuples under that bucket/prefix,
or
map(s3, "bucket", "prefix")(s => s.getSize).sum
returns the total size of the contents under that bucket/prefix.
You can combine map() with other collection operations, as you normally would in functional programming.
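In plain Java (8 or later), the same style of composition is available through streams once the object summaries have been collected into a list. A sketch, where ObjectInfo and SummaryQueries are hypothetical names and ObjectInfo stands in for S3ObjectSummary, keeping only the key and size:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for S3ObjectSummary with only key and size.
final class ObjectInfo {
    final String key;
    final long size;

    ObjectInfo(String key, long size) {
        this.key = key;
        this.size = size;
    }
}

final class SummaryQueries {
    // Total size of the objects whose key ends with the suffix, e.g. ".jpg".
    static long sizeWithSuffix(List<ObjectInfo> objects, String suffix) {
        return objects.stream()
                .filter(o -> o.key.endsWith(suffix))
                .mapToLong(o -> o.size)
                .sum();
    }

    // Key of the largest object, or an empty Optional for an empty list.
    static Optional<String> largestKey(List<ObjectInfo> objects) {
        return objects.stream()
                .max(Comparator.comparingLong(o -> o.size))
                .map(o -> o.key);
    }
}
```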
Answer 3:
I think you want to get the size of the folder at each level. For example, suppose you have a root folder R-Folder with two subfolders, S1.1-Folder and S1.2-Folder, and S1.1-Folder in turn has three subfolders, S1.1.1-Folder, S1.1.2-Folder and S1.1.3-Folder. You then want the size of every folder:
R-Folder (32MB)
|__S1.1-Folder (22MB)
| |__S1.1.1-Folder (7MB)
| |__S1.1.2-Folder (5MB)
| |__S1.1.3-Folder (10MB)
|
|__S1.2-Folder (10MB)
Am I correct?
If so, keep a list of folder details, each with a status saying whether it isCompleted, and scan each folder recursively. When a subfolder has been completed, add its size to its parent; that parent in turn updates its own parent, and this continues all the way up to the root.
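As an alternative to tracking an isCompleted flag per folder (this is a different technique from the recursive scan described above), the same per-level totals can be computed in a single pass over a flat listing, because every S3 key already contains its full folder path. A sketch, assuming the keys and sizes have already been collected into a map; FolderSizes is a hypothetical name:

```java
import java.util.Map;
import java.util.TreeMap;

final class FolderSizes {
    // Adds each object's size to every folder prefix in its key, so one pass
    // over a flat listing yields the size of every folder at every level.
    // Input: object key -> size in bytes (hypothetically built from the
    // S3ObjectSummary entries of a complete, paginated bucket listing).
    static Map<String, Long> aggregate(Map<String, Long> objectSizes) {
        Map<String, Long> folderSizes = new TreeMap<>();
        for (Map.Entry<String, Long> e : objectSizes.entrySet()) {
            String key = e.getKey();
            int slash = key.indexOf('/');
            while (slash >= 0) {
                // The prefix up to and including this '/' names one folder.
                folderSizes.merge(key.substring(0, slash + 1), e.getValue(), Long::sum);
                slash = key.indexOf('/', slash + 1);
            }
        }
        return folderSizes;
    }
}
```

With keys such as R-Folder/S1.1-Folder/S1.1.1-Folder/f1 (7) and R-Folder/S1.2-Folder/f4 (10), the result maps R-Folder/ to the sum over all its descendants, matching the tree above.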
Answer 4:
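I was stuck on the same problem. The simple solution is to pass a prefix when listing:
ObjectListing objects = listObjects(bucketName, prefix);
where prefix is your folder name followed by a slash (for example "mydocs/"), so that only the objects under that folder are listed. Then run the same summing loop over this listing and its following pages.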
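S3 prefix matching works on raw characters, so the prefix "mydocs" would also match a key such as "mydocs-old/b.txt". The sketch below shows the trailing-slash normalization and the same starts-with filter that a prefixed listing applies on the server side, using an in-memory map of keys to sizes; PrefixSize, folderPrefix and folderSize are hypothetical names.

```java
import java.util.Map;

final class PrefixSize {
    // Appends "/" if missing, so "mydocs" matches only keys inside that folder.
    static String folderPrefix(String folder) {
        return folder.endsWith("/") ? folder : folder + "/";
    }

    // Sums the sizes of keys that start with the folder prefix: the same
    // filter a listing with that prefix applies, over an in-memory map.
    static long folderSize(Map<String, Long> objectSizes, String folder) {
        String prefix = folderPrefix(folder);
        long total = 0;
        for (Map.Entry<String, Long> e : objectSizes.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                total += e.getValue();
            }
        }
        return total;
    }
}
```

In real code, folderPrefix(folder) would be what you pass as the prefix argument of listObjects(bucketName, prefix).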
For more information, see these links:
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/ObjectListing.html
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Client.html
Source: https://stackoverflow.com/questions/15950032/calculate-s3-objectfolder-size-in-java