Calculate S3 ETag locally using spark-md5

Submitted by 半世苍凉 on 2020-01-16 09:36:09

Question


I have uploaded a 14 MB file to S3 in 5 MB chunks and used spark-md5 to calculate the hash of each chunk. The individual hash of each chunk (generated by spark-md5) matches the ETag of the corresponding chunk uploaded to S3.

But the ETag returned by S3 for the full multipart upload does not match the hash I calculate locally with spark-md5. Below are the steps for the local hash:

  1. Generate the hash of each chunk with spark-md5
  2. Join the chunk hashes into one string
  3. Convert that string to hex
  4. Calculate the hash of the result

Below is the code; please check whether there is any mistake. Approach 1:

        var mergeChunk = self.chunkArray.join('');
        console.log("mergeChunk: " + mergeChunk);

        var hexString = toHexString(mergeChunk);
        console.log("toHexString: " + hexString);

        var cspark1 = SparkMD5.hash(hexString);
        console.log("SparkMD5 final hash: " + cspark1);

Approach 2:

        var mergeChunk = self.chunkArray.join('');
        console.log("mergeChunk: " + mergeChunk);
        var cspark2 = SparkMD5.hash(mergeChunk);
        console.log("SparkMD5 final hash: " + cspark2);

Please provide the correct logic for calculating the ETag.


Answer 1:


ETags are meant to be opaque; AWS doesn't make any guarantees as to what the ETag of a multipart upload is.

I think it is derived from the concatenation of the block hashes (in the order listed in the final POST), but you cannot rely on that.
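For what it's worth, the commonly observed (but undocumented) behaviour is that S3 builds the multipart ETag by taking the raw 16-byte MD5 digest of each part, concatenating those digests, MD5-hashing the concatenation, and appending a dash plus the part count. Below is a minimal sketch with spark-md5 under that assumption; `multipartETag`, `hexToBytes` and `partHexHashes` are illustrative names, and `partHexHashes` stands for the array of per-part hex hashes already computed in the question (for a 14 MB file uploaded in 5 MB parts this would give an ETag ending in "-3"):

    // Sketch only: assumes the (undocumented) behaviour that the multipart
    // ETag is MD5(concatenation of raw part digests) + "-" + part count.
    function hexToBytes(hex) {
        var bytes = new Uint8Array(hex.length / 2);
        for (var i = 0; i < bytes.length; i++) {
            bytes[i] = parseInt(hex.substring(i * 2, i * 2 + 2), 16);
        }
        return bytes;
    }

    function multipartETag(partHexHashes) {
        var spark = new SparkMD5.ArrayBuffer();
        partHexHashes.forEach(function (hex) {
            // append the 16 raw digest bytes, not the 32-character hex string
            spark.append(hexToBytes(hex).buffer);
        });
        return spark.end() + '-' + partHexHashes.length;
    }

    // e.g. multipartETag(self.chunkArray) with the per-chunk hashes from the question

This only lines up with what S3 reports if every local chunk covers exactly the same byte range as the corresponding uploaded part, and, as noted above, AWS does not guarantee this scheme.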



Source: https://stackoverflow.com/questions/59633483/calculate-s3-etag-locally-using-spark-md5
