Question
I have uploaded a 14 MB file to S3 in 5 MB chunks and used spark-md5 to calculate the hash of each chunk. The individual hash of each chunk (generated by spark-md5) matches the ETag of the corresponding part uploaded to S3.
But the ETag that S3 generates for the complete multipart upload does not match the hash I calculate locally with spark-md5. Below are the steps for the local hash:
- Generate the hash of each chunk with spark-md5
- Join the chunk hashes into one string
- Convert that string to hex
- Calculate the hash of the result
Below is the code; please check whether there is any mistake. Approach 1:
var mergeChunk = self.chunkArray.join('');
console.log("mergeChunk: " + mergeChunk);
var hexString = toHexString(mergeChunk);
console.log("toHexString: " + hexString);
var cspark1 = SparkMD5.hash(hexString);
console.log("SparkMD5 final hash: " + cspark1);
Approach 2:
var mergeChunk = self.chunkArray.join('');
console.log("mergeChunk: " + mergeChunk);
var cspark2 = SparkMD5.hash(mergeChunk);
console.log("SparkMD5 final hash: " + cspark2);
Please provide the correct logic for calculating the ETag.
Answer 1:
ETags are meant to be opaque; AWS doesn't make any guarantees about what the ETag of a multipart upload is.
I think it is just derived from the concatenation of the blocks (in the order listed in the final POST that completes the upload), but you cannot rely on that.
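That said, the value S3 has been widely observed to return for a multipart upload is the MD5 of the concatenated raw (binary) MD5 digests of the parts, followed by a dash and the part count. This is undocumented behaviour, so treat the sketch below as an assumption rather than a guarantee; it reuses your self.chunkArray of per-part hex hashes and spark-md5's SparkMD5.ArrayBuffer.hash, and the helper names (hexToBytes, expectedMultipartETag) are made up for illustration.

// Sketch, assuming the undocumented rule:
// ETag = MD5(rawDigest(part1) + ... + rawDigest(partN)) + "-" + N
function hexToBytes(hex) {
    // Turn a hex MD5 string into its raw 16-byte digest.
    var bytes = new Uint8Array(hex.length / 2);
    for (var i = 0; i < bytes.length; i++) {
        bytes[i] = parseInt(hex.substr(i * 2, 2), 16);
    }
    return bytes;
}

function expectedMultipartETag(chunkHexHashes) {
    // Concatenate the raw 16-byte digests of all parts, in upload order.
    var combined = new Uint8Array(chunkHexHashes.length * 16);
    chunkHexHashes.forEach(function (hex, index) {
        combined.set(hexToBytes(hex), index * 16);
    });
    // MD5 the concatenated digests and append "-<part count>".
    return SparkMD5.ArrayBuffer.hash(combined.buffer) + '-' + chunkHexHashes.length;
}

console.log("expected multipart ETag: " + expectedMultipartETag(self.chunkArray));

For a 14 MB file uploaded in 5 MB parts this would produce something like "<md5hex>-3". The "-3" suffix is the part count, which is one reason a multipart ETag can never match a plain MD5 of the whole file, and why hashing the joined hex strings (your Approach 1 and 2) does not match either: S3 hashes the raw digest bytes, not their hex representation.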
Source: https://stackoverflow.com/questions/59633483/calculate-s3-etag-locally-using-spark-md5