Question
I have a directory with a structure like so:
.
├── Test.txt
├── Test1
│   ├── Test1.txt
│   ├── Test1_copy.txt
│   └── Test1a
│       ├── Test1a.txt
│       └── Test1a_copy.txt
└── Test2
    ├── Test2.txt
    ├── Test2_copy.txt
    └── Test2a
        ├── Test2a.txt
        └── Test2a_copy.txt
I would like to create a bash script that computes an MD5 checksum of every file in this directory tree. I want to be able to type the script name on the command line, followed by the path to the directory I want to hash, and have it work. I'm sure there are many ways to accomplish this. Currently I have:
#!/bin/bash
for file in "$1" ; do
md5 >> "${1}__checksums.md5"
done
This just hangs and does not work. Perhaps I should use find?
One caveat - the directories I want to hash will have files with different extensions and may not always have this exact same tree structure. I want something that will work in these different situations, as well.
Answer 1:
Using md5deep
md5deep -r path/to/dir > sums.md5
Using find and md5sum
find relative/path/to/dir -type f -exec md5sum {} + > sums.md5
Be aware that when you check your MD5 sums with md5sum -c sums.md5, you need to run it from the same directory from which you generated the sums.md5 file. This is because find outputs paths that are relative to your current location, and those paths are what ends up in sums.md5.
If this is a problem, you can make relative/path/to/dir absolute (e.g. by putting $PWD/ in front of your path). This way you can check sums.md5 from any location. The disadvantage is that sums.md5 now contains absolute paths, which makes it bigger.
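To make the point about relative paths concrete, here is a minimal sketch. The function name make_and_check and its two parameters are hypothetical, and the --quiet flag assumes GNU coreutils md5sum. It writes the sums with paths relative to a starting directory and then verifies them from that same directory.

```shell
#!/bin/bash
# Hypothetical demo: generate sums with relative paths, then verify them
# from the same directory the sums were generated from.
make_and_check() {
    local start_dir="$1"   # directory to run find from
    local target="$2"      # subdirectory to hash, relative to start_dir
    (
        # Subshell so the caller's working directory is left untouched
        cd "$start_dir" || exit 1
        # Paths written to sums.md5 are relative to start_dir
        find "$target" -type f -exec md5sum {} + > sums.md5
        # Run the check from start_dir so those relative paths resolve
        md5sum -c --quiet sums.md5
    )
}
```

Calling make_and_check /some/parent data would write /some/parent/sums.md5 and return a non-zero status if any file fails verification. The target should be a subdirectory rather than ".", otherwise sums.md5 would be hashed while it is still being written.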
Fully featured function using find and md5sum
You can put this function in your .bashrc file (located in your $HOME directory):
function md5sums {
  if [ "$#" -lt 1 ]; then
    echo -e "At least one parameter is expected\n" \
            "Usage: md5sums [OPTIONS] dir"
  else
    local OUTPUT="checksums.md5"
    local CHECK=false
    local MD5SUM_OPTIONS=""

    # Parse options; the last argument is always the directory
    while [[ $# -gt 1 ]]; do  # -gt compares numbers (> would compare strings)
      local key="$1"
      case $key in
        -c|--check)
          CHECK=true
          ;;
        -o|--output)
          OUTPUT=$2
          shift
          ;;
        *)
          MD5SUM_OPTIONS="$MD5SUM_OPTIONS $1"
          ;;
      esac
      shift
    done
    local DIR=$1

    if [ -d "$DIR" ]; then  # if $DIR directory exists
      cd "$DIR"  # change to $DIR directory (quoted, so spaces in the path are safe)
      if [ "$CHECK" = true ]; then  # if -c or --check option specified
        md5sum --check $MD5SUM_OPTIONS "$OUTPUT"  # check MD5 sums in $OUTPUT file
      else
        # Calculate MD5 sums for files in the current directory and its
        # subdirectories, excluding the $OUTPUT file, and save them in $OUTPUT
        find . -type f ! -name "$OUTPUT" -exec md5sum $MD5SUM_OPTIONS {} + > "$OUTPUT"
      fi
      cd - > /dev/null  # change to previous directory
    else
      cd "$DIR"  # if $DIR doesn't exist, change to it to generate localized error message
    fi
  fi
}
After you run source ~/.bashrc, you can use md5sums like a normal command:
md5sums path/to/dir
will generate a checksums.md5 file in the path/to/dir directory, containing MD5 sums of all files in this directory and its subdirectories. Use:
md5sums -c path/to/dir
to check sums from the path/to/dir/checksums.md5 file.
Note that path/to/dir can be relative or absolute; md5sums will work fine either way. The resulting checksums.md5 file always contains paths relative to path/to/dir.
You can use a different file name than the default checksums.md5 by supplying the -o or --output option. All options other than -c, --check, -o and --output are passed to md5sum.
The first half of the md5sums function definition is responsible for parsing options. See this answer for more information about it. The second half contains explanatory comments.
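The option-parsing half can be tried in isolation. The sketch below uses a hypothetical function, parse_demo, that follows the same while/case/shift pattern as md5sums but only prints what it understood instead of hashing anything.

```shell
#!/bin/bash
# Hypothetical helper: parse "[-c] [-o FILE] DIR" with the same loop shape
# as md5sums, then report the result instead of running md5sum.
parse_demo() {
    local output="checksums.md5" check=false
    # Loop while more than one argument remains; the last one is the directory
    while [ "$#" -gt 1 ]; do
        case "$1" in
            -c|--check)  check=true ;;
            -o|--output) output="$2"; shift ;;   # extra shift consumes the value
            *)           echo "passing through: $1" ;;
        esac
        shift   # consume the option itself
    done
    echo "check=$check output=$output dir=$1"
}
```

For example, parse_demo -c -o out.md5 some/dir prints check=true output=out.md5 dir=some/dir. Because the loop stops when one argument is left, the directory must always come last.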
Answer 2:
How about:
find /path/you/need -type f -exec md5sum {} \; > checksums.md5
Update#1: Improved the command based on @twalberg's recommendation to handle white spaces in file names.
Update#2: Improved based on @jil's suggestion, to remove the unnecessary xargs call and use the -exec option of find instead.
Update#3: @Blake a naive implementation of your script would look something like this:
#!/bin/bash
# Usage: checksumchecker.sh <path>
find "$1" -type f -exec md5sum {} \; > "$1"__checksums.md5
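One detail worth checking in a script like this is where the output file lands: if the path is given with a trailing slash, "$1"__checksums.md5 ends up inside the directory being hashed. The sketch below is a slightly more defensive variant; the function name checksum_tree is hypothetical, and it assumes the argument is not the root directory.

```shell
#!/bin/bash
# Hypothetical variant of the script above, wrapped in a function.
checksum_tree() {
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "Usage: checksum_tree <directory>" >&2
        return 1
    fi
    # Strip one trailing slash so the output file is created next to the
    # directory, not inside it (where find would pick it up as well)
    local dir="${1%/}"
    # {} is passed as a single argument, so spaces in file names are safe
    find "$dir" -type f -exec md5sum {} \; > "${dir}__checksums.md5"
}
```

With this, checksum_tree "my dir/" and checksum_tree "my dir" both write "my dir__checksums.md5" beside the directory.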
Answer 3:
#!/bin/bash
shopt -s globstar
md5sum "$1"/** > "${1}__checksums.md5"
Explanation: shopt -s globstar (manual) enables the ** recursive glob wildcard. This means that "$1"/** expands to a list of everything found recursively under the directory given as parameter $1. The script then simply calls md5sum with this list as its arguments, and > "${1}__checksums.md5" redirects the output to the file. Note that ** matches directories as well as files, so md5sum prints an error on stderr for each directory while still hashing the regular files.
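Since ** matches directories too, one way to keep the output clean is to filter the matches before hashing. This is a minimal sketch with a hypothetical function name, sum_with_globstar; it assumes bash 4 or later, where globstar is available.

```shell
#!/bin/bash
# Hypothetical helper: hash only the regular files matched by the ** glob.
sum_with_globstar() {
    local dir="${1%/}" path
    # Subshell so the shopt changes do not leak into the caller's shell
    (
        shopt -s globstar nullglob
        for path in "$dir"/**; do
            # ** also matches directories; hash regular files only
            if [ -f "$path" ]; then
                md5sum "$path"
            fi
        done
    )
}
```

This starts one md5sum process per file, so it is slower than a single md5sum call on large trees. Like any glob, ** skips hidden files unless dotglob is also set, which differs from the find-based answers.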
Answer 4:
Updated Answer
If you like the answer below, or any of the others, you can make a function that does the command for you. So, to test it, type the following into Terminal to declare a function:
function sumthem(){ find "$1" -type f -print0 | parallel -0 -X md5 > checksums.md5; }
Then you can just use:
sumthem /Users/somebody/somewhere
If that works how you like, you can add that line to the end of your "bash profile" and the function will be declared and available whenever you are logged in. Your "bash profile" is probably in $HOME/.profile
Original Answer
Why not get all your CPU cores working in parallel for you?
find . -type f -print0 | parallel -0 -X md5sum
This finds all the files (-type f) in the current directory (.) and prints them with a null byte at the end. These are then passed into GNU Parallel, which is told that the filenames end with a null byte (-0), that it should do as many files as possible at a time (-X) to save creating a new process for each file, and that it should md5sum the files.
This approach will pay the largest bonus, in terms of speed, with big images like Photoshop files.
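If GNU Parallel is not installed, a similar effect can be roughly approximated with xargs -P. This is a different tool, not what the answer uses, and it gives no guarantee about output order. The function name sum_parallel, the default of 4 jobs, and the batch size of 16 are all assumptions made for this sketch.

```shell
#!/bin/bash
# Hypothetical helper: hash files under a directory with several md5sum
# processes running at once. Uses xargs -P rather than GNU Parallel.
sum_parallel() {
    local dir="$1"
    local jobs="${2:-4}"   # concurrent md5sum processes (assumed default)
    # -print0 / -0 keep file names with spaces or newlines intact;
    # -n 16 hands each md5sum process up to 16 files
    find "$dir" -type f -print0 | xargs -0 -n 16 -P "$jobs" md5sum
}
```

For example, sum_parallel . 8 | sort -k 2 > checksums.md5 would run up to eight processes and sort the result by file name so that repeated runs produce comparable files.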
Answer 5:
md5deep -r "$your_directory" | awk '{print $1}' | sort | md5sum | awk '{print $1}'
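This pipeline appears to reduce the whole tree to a single MD5 value: it keeps only the per-file digests, sorts them, and hashes the sorted list. A rough equivalent that uses find and md5sum rather than md5deep might look like the following; the function name tree_digest is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: one digest for a whole tree, built from per-file digests.
tree_digest() {
    find "$1" -type f -exec md5sum {} + |
        awk '{print $1}' |   # keep only the digest column, drop file names
        LC_ALL=C sort |      # fixed order, independent of traversal and locale
        md5sum |
        awk '{print $1}'
}
```

Comparing tree_digest dirA with tree_digest dirB gives a quick test of whether two trees hold the same file contents. Because file names are dropped, renaming or moving a file does not change the result.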
Source: https://stackoverflow.com/questions/36920307/md5-all-files-in-a-directory-tree