I have got 2 files. Let us call them md5s1.txt and md5s2.txt. Both contain the output of a
find -type f -print0 | xargs -0 md5sum | sort > md5s.txt
Compare only the md5 column using diff on <(cut -c -32 md5sums.sort.XXX), and tell diff to print just the line numbers of added or removed lines, using --old/new-line-format='%dn'$'\n'. Pipe this into ed md5sums.sort.XXX so it will print only those lines from the md5sums.sort.XXX file.
diff \
--new-line-format='%dn'$'\n' \
--old-line-format='' \
--unchanged-line-format='' \
<(cut -c -32 md5sums.sort.old) \
<(cut -c -32 md5sums.sort.new) \
| ed md5sums.sort.new \
> files-added
diff \
--new-line-format='' \
--old-line-format='%dn'$'\n' \
--unchanged-line-format='' \
<(cut -c -32 md5sums.sort.old) \
<(cut -c -32 md5sums.sort.new) \
| ed md5sums.sort.old \
> files-removed
The problem with ed is that it will load the entire file into memory, which can be a problem if you have a lot of checksums. Instead of piping the output of diff into ed, pipe it into the following command, which will use much less memory.
diff … | (
lnum=0;
while read lprint; do
while [ $lnum -lt $lprint ]; do read line <&3; ((lnum++)); done;
echo $line;
done
) 3