Question
I need to compare two directory structures with around one billion files each (directory depth up to 20 levels). I found the usual diff -r /location/one /location/two slow.
Is there a multithreaded implementation of diff? Or is it doable by combining the shell and diff? If so, how?
Answer 1:
Your disk is gonna be the bottleneck.
Unless you are working on tmpfs, you will probably only lose speed. That said:
find . -mindepth 1 -maxdepth 1 -type d -print0 |
xargs -0 -P4 -I DIRNAME diff -EwburqN "DIRNAME/" "/tmp/othertree/DIRNAME/"
should do a pretty decent job of comparing trees (in this case . to /tmp/othertree).
It has a flaw right now, in that it won't detect top-level directories in othertree that don't exist in . (the current directory). I leave that as an exercise for the reader, though you could easily repeat the comparison in reverse.
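One way to cover that flaw without running the whole comparison twice is to compare only the top-level directory listings of both trees. The following is a minimal sketch, not part of the original answer: the function name toplevel_only is hypothetical, and it assumes directory names contain no newlines and that find supports -mindepth and -maxdepth.

```shell
# Hypothetical helper: report top-level directories that exist in only one tree.
# Assumption: directory names do not contain newline characters.
toplevel_only() {
    tree_a=$1
    tree_b=$2
    list_a=$(mktemp)
    list_b=$(mktemp)

    # List the immediate subdirectories of each tree, sorted for comm.
    (cd "$tree_a" && find . -mindepth 1 -maxdepth 1 -type d | LC_ALL=C sort) > "$list_a"
    (cd "$tree_b" && find . -mindepth 1 -maxdepth 1 -type d | LC_ALL=C sort) > "$list_b"

    # comm -23 keeps lines unique to the first list, comm -13 those unique to the second.
    LC_ALL=C comm -23 "$list_a" "$list_b" | while IFS= read -r d; do
        printf 'only in %s: %s\n' "$tree_a" "${d#./}"
    done
    LC_ALL=C comm -13 "$list_a" "$list_b" | while IFS= read -r d; do
        printf 'only in %s: %s\n' "$tree_b" "${d#./}"
    done

    rm -f "$list_a" "$list_b"
}
```

For the trees in this answer it would be called as toplevel_only . /tmp/othertree. Directories present in both trees produce no output here, since the parallel diff already handles them.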
The argument -P4 to xargs specifies that you want at most 4 concurrent processes.
Also have a look at the xjobs utility, which does a better job at separating the output. I think with GNU xargs (as shown) you cannot drop the -q option because it will intermix the diffs (?).
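If xjobs is not available, one possible workaround for the interleaving concern is to have every parallel diff write to its own file, so full diffs (without -q) never share a stream. This is only a sketch under stated assumptions: the function name parallel_diff, the log directory layout, and the choice of -urN are illustrative, not from the original answer.

```shell
# Hypothetical helper: diff each top-level directory of src against dst,
# running up to `jobs` diff processes at once, one output file per directory.
parallel_diff() {
    src=$1
    dst=$2
    jobs=$3
    logdir=$4
    mkdir -p "$logdir"

    (cd "$src" && find . -mindepth 1 -maxdepth 1 -type d -print0) |
        xargs -0 -P "$jobs" -I DIRNAME sh -c '
            name=${1#./}
            # Each job writes to its own file, so outputs cannot interleave.
            diff -urN "$2/$name" "$3/$name" > "$4/$name.diff" 2>&1
            # diff exits 1 when trees differ; treat only exit codes above 1 as failure.
            [ $? -le 1 ]
        ' sh DIRNAME "$src" "$dst" "$logdir"
}
```

A call such as parallel_diff . /tmp/othertree 4 /tmp/difflogs leaves an empty .diff file for every identical subtree and a unified diff for every subtree that differs. The disk-bottleneck caveat from the start of the answer still applies.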
Source: https://stackoverflow.com/questions/7159921/diff-folders-recursively-vs-multithreading