Question
I have over half a million files to hash across multiple folders. MD5/CRC hashing is taking too long; some files are 1 GB to 11 GB in size. I'm thinking of hashing just part of each file using head.
The command below works for finding and hashing everything:
find . -type f -exec sha1sum {} \;
I'm just not sure how to take this a step further and hash only, say, the first 256 kB of each file, e.g.
find . -type f -exec head -c 256kB | sha1sum
I'm not sure whether head is okay to use in this instance or whether dd would be better. The above command doesn't work, so I'm looking for ideas on how to do this.
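One shape I imagine this could take (untested; it assumes GNU head, which accepts size suffixes like 256kB) is wrapping the pipeline in sh -c so that -exec can run it:

find . -type f -exec sh -c 'head -c 256kB "$1" | sha1sum' _ {} \;

but that would still print "-" instead of the file name, which brings me to the output format.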
I would like the output to match what native md5sum produces, i.e. in the format below (going to a text file):
<Hash> <file name>
I'm not sure whether this is possible in a single line or whether a for/do loop is needed. Performance is key; I'm using bash on RHEL 6.
Answer 1:
It is unclear where your limitation is. Do you have a slow disk or a slow CPU?
If your disk is not the limitation, you are probably limited by using a single core. GNU Parallel can help with that:
find . -type f | parallel -X sha256sum
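Collecting that into a text file in the requested <Hash> <file name> format is then just a redirect; for example (hashes.txt is an arbitrary name, not from the answer):

find . -type f | parallel -X sha256sum > hashes.txt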
If the limitation is disk I/O, then your idea of head makes perfect sense (the example below reads the last 1 MB with tail, but the principle is the same):
sha() {
    # Hash only the last 1 MB of the file; perl replaces sha256sum's "-" with the file name
    tail -c 1M "$1" | sha256sum | perl -pe 'BEGIN{$a=shift} s/-/$a/' "$1"
}
export -f sha
# --tag prefixes each output line with the file name that parallel passed to sha
find . -type f -print0 | parallel -0 -j10 --tag sha
The optimal value for -j depends on your disk system, so try adjusting it until you find the best setting (which can be as low as -j1).
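If you would rather hash the first 256 kB, as in the question, and collect the results in a file, here is a minimal sketch along the same lines. Swapping tail for head and sha256sum for sha1sum, dropping --tag so each line is plain <Hash> <file name>, and writing to an arbitrary file hashes.txt are assumptions for illustration, not part of the original answer:

sha() {
    # Hash only the first 256 kB; perl replaces sha1sum's "-" with the file name
    head -c 256kB "$1" | sha1sum | perl -pe 'BEGIN{$a=shift} s/-/$a/' "$1"
}
export -f sha
find . -type f -print0 | parallel -0 -j10 sha > hashes.txt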
Source: https://stackoverflow.com/questions/28817057/md5-sha1-hashing-large-files