How to get the md5sum of a file on Amazon's S3


If I have existing files on Amazon's S3, what's the easiest way to get their md5sum without having to download the files?

Thanks

11 Answers
  •  感动是毒
    2020-12-01 01:15

    This is a very old question, but I had a hard time finding the information below, and this is one of the first places I found, so I wanted to write it up in case anyone needs it.

    The ETag is the MD5 of the object, but only for objects uploaded in a single request. For multipart uploads, the ETag is instead computed from the concatenation of the MD5s of each uploaded part. Either way, you don't need to compute anything on the server: just read the ETag.
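    To read the ETag without downloading the object, a HEAD request is enough. With the AWS CLI that would be `aws s3api head-object --bucket BUCKET --key KEY --query ETag --output text`, where BUCKET and KEY are placeholders. The value comes back wrapped in double quotes. The sketch below uses a hypothetical helper, `describe_etag` (my own name, not part of any AWS tool), to strip the quotes and report whether the ETag looks like a plain MD5 or a multipart one:

```shell
# describe_etag is a hypothetical helper, not part of any AWS tool.
# Input: an ETag as returned by S3, optionally still wrapped in double quotes.
# Output: "md5 <hash>" or "multipart <hash> <parts>"; non-zero exit otherwise.
describe_etag() {
    etag=$(printf '%s' "$1" | tr -d '"')   # drop the surrounding quotes
    hash=${etag%%-*}                        # text before the first hyphen
    suffix=${etag#"$hash"}                  # "" or "-<parts>"
    case $hash in
        ''|*[!0-9a-f]*) echo "unknown $etag"; return 1 ;;
    esac
    if [ "${#hash}" -ne 32 ]; then
        echo "unknown $etag"; return 1
    fi
    case $suffix in
        '') echo "md5 $hash" ;;
        -|-*[!0-9]*) echo "unknown $etag"; return 1 ;;
        -*) echo "multipart $hash ${suffix#-}" ;;
    esac
}
```

    If it prints "md5", the hash is the md5sum asked for in the question; if it prints "multipart", the hash has to be reproduced part by part as described next.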

    As @EmersonFarrugia said in this answer:

    Say you uploaded a 14MB file and your part size is 5MB. Calculate 3 MD5 checksums corresponding to each part, i.e. the checksum of the first 5MB, the second 5MB, and the last 4MB. Then take the checksum of their concatenation. Since MD5 checksums are hex representations of binary data, just make sure you take the MD5 of the decoded binary concatenation, not of the ASCII or UTF-8 encoded concatenation. When that's done, add a hyphen and the number of parts to get the ETag.

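    So the only other things you need are the ETag and the upload part size. The ETag carries a -NumberOfParts suffix, so dividing the object size by that number gives a lower bound on the part size; rounding it up to a whole number of MB usually recovers the real value. 5 MB is the minimum part size (only the last part may be smaller), and the default depends on the client that did the upload (the AWS CLI, for example, uses 8 MB). Part sizes are normally whole numbers of MB, so you will not see something like 7.25 MB per part, and it should be easy to work out the part size.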
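    The part size cannot always be recovered uniquely from the size and the part count, so it can help to list every candidate. This is a minimal sketch, assuming part sizes are whole MiB values of at least 5; `candidate_part_sizes` is a hypothetical helper name of my own:

```shell
# candidate_part_sizes is a hypothetical helper for illustration.
# Usage: candidate_part_sizes <file size in bytes> <number of parts>
# Prints every whole-MiB part size (5 MiB and up) that would split the
# file into exactly that many parts.
candidate_part_sizes() {
    size=$1
    parts=$2
    mib=1048576
    part_mb=5                      # S3's minimum part size
    while :; do
        part_bytes=$((part_mb * mib))
        n=$(( (size + part_bytes - 1) / part_bytes ))   # ceiling division
        if [ "$n" -lt "$parts" ]; then
            break                  # larger parts only reduce the count further
        fi
        if [ "$n" -eq "$parts" ]; then
            echo "$part_mb"
        fi
        if [ "$part_bytes" -ge "$size" ]; then
            break                  # one part already covers the whole file
        fi
        part_mb=$((part_mb + 1))
    done
}
```

    For the 14MB example above with a "-3" suffix, `candidate_part_sizes 14680064 3` prints 5 and 6, so both would have to be tried against the real ETag. For a 100 MiB file with a "-13" suffix, the only candidate is 8.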

    Here is a script that does this on OS X, with a Linux version in the comments: https://gist.github.com/emersonf/7413337

    I'll leave both scripts here in case the page above is no longer accessible in the future:

    Linux version:

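    #!/bin/bash
    # Compute the multipart-style S3 ETag of a local file (Linux).
    set -euo pipefail
    if [ $# -ne 2 ]; then
        echo "Usage: $0 file partSizeInMb" >&2
        exit 1
    fi
    file=$1
    if [ ! -f "$file" ]; then
        echo "Error: $file not found." >&2
        exit 1
    fi
    partSizeInMb=$2
    # Use the exact byte size; du reports disk usage, which can differ.
    fileSizeInBytes=$(stat -c %s "$file")
    partSizeInBytes=$((partSizeInMb * 1024 * 1024))
    # Round up so a trailing partial part is counted.
    parts=$(( (fileSizeInBytes + partSizeInBytes - 1) / partSizeInBytes ))
    checksumFile=$(mktemp -t s3md5.XXXXXXXXXXXXX)
    # Remove the temporary file even if a later command fails.
    trap 'rm -f "$checksumFile"' EXIT
    for (( part=0; part<parts; part++ ))
    do
        skip=$((partSizeInMb * part))
        # Append the hex MD5 of this part (first field of the md5sum output).
        dd bs=1M count="$partSizeInMb" skip="$skip" if="$file" 2> /dev/null | md5sum | cut -d ' ' -f 1 >> "$checksumFile"
    done
    # MD5 of the concatenated binary digests, plus "-<number of parts>".
    etag="$(xxd -r -p "$checksumFile" | md5sum | cut -d ' ' -f 1)-$parts"
    echo -e "${file}\t${etag}"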
    

    OSX version:

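    #!/bin/bash
    # Compute the multipart-style S3 ETag of a local file (OS X).
    
    if [ $# -ne 2 ]; then
        echo "Usage: $0 file partSizeInMb" >&2
        exit 1
    fi
    
    file=$1
    
    if [ ! -f "$file" ]; then
        echo "Error: $file not found." >&2
        exit 1
    fi
    
    partSizeInMb=$2
    # Use the exact byte size; du reports disk usage, which can differ.
    fileSizeInBytes=$(stat -f %z "$file")
    partSizeInBytes=$((partSizeInMb * 1024 * 1024))
    # Round up so a trailing partial part is counted.
    parts=$(( (fileSizeInBytes + partSizeInBytes - 1) / partSizeInBytes ))
    
    checksumFile=$(mktemp -t s3md5)
    
    for (( part=0; part<parts; part++ ))
    do
        skip=$((partSizeInMb * part))
        # Append the hex MD5 of this part.
        dd bs=1m count="$partSizeInMb" skip="$skip" if="$file" 2>/dev/null | md5 >> "$checksumFile"
    done
    
    # MD5 of the concatenated binary digests, plus "-<number of parts>".
    echo "$(xxd -r -p "$checksumFile" | md5)-$parts"
    rm "$checksumFile"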
    
