diff folders recursively vs. multithreading

北战南征 submitted on 2019-12-07 15:34:50

Question


I need to compare two directory structures with around one billion files each (directory depth up to 20 levels).

I found the usual diff -r /location/one /location/two to be slow.

Is there a multithreaded implementation of diff? Or is it doable by combining the shell and diff? If so, how?


Answer 1:


Your disk is gonna be the bottleneck.

Unless you are working on tmpfs, you will probably only lose speed. That said:

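# -mindepth 1 keeps find from emitting "." itself, which would hand one job the whole tree
find . -mindepth 1 -maxdepth 1 -type d -print0 |
    xargs -0 -P4 -I DIRNAME diff -EwburqN "DIRNAME/" "/tmp/othertree/DIRNAME/"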

should do a pretty decent job of comparing trees (in this case . to /tmp/othertree).

It has a flaw right now: it won't detect top-level directories in /tmp/othertree that don't exist in the current directory (.). I leave that as an exercise for the reader, though you could easily repeat the comparison in reverse.
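One way to close that gap without running the whole comparison twice is to compare only the lists of top-level directory names. The following is a minimal sketch, not part of the original answer: the function names list_top_dirs and only_in_one_tree are hypothetical, and it assumes directory names contain no newlines and that find supports -mindepth/-maxdepth (GNU and BSD find do).

```shell
# Hypothetical helper: print the names of the immediate subdirectories of "$1", sorted.
list_top_dirs() {
    (cd "$1" && find . -mindepth 1 -maxdepth 1 -type d | sed 's|^\./||' | LC_ALL=C sort)
}

# Hypothetical helper: report top-level directories that exist in only one of two trees.
# Assumption: directory names contain no newline characters.
only_in_one_tree() {
    left=$1
    right=$2
    tmp_l=$(mktemp) || return 1
    tmp_r=$(mktemp) || return 1
    list_top_dirs "$left" > "$tmp_l"
    list_top_dirs "$right" > "$tmp_r"
    # comm -23 keeps lines unique to the first file, comm -13 those unique to the second.
    LC_ALL=C comm -23 "$tmp_l" "$tmp_r" |
        while IFS= read -r name; do printf 'Only in %s: %s\n' "$left" "$name"; done
    LC_ALL=C comm -13 "$tmp_l" "$tmp_r" |
        while IFS= read -r name; do printf 'Only in %s: %s\n' "$right" "$name"; done
    rm -f "$tmp_l" "$tmp_r"
}
```

Calling only_in_one_tree . /tmp/othertree before the parallel run would list the directories the per-directory diffs never see. Plain files sitting directly in the top level are still not covered by either step.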

The argument -P4 to xargs specifies that you want at most 4 concurrent processes.

Also have a look at the xjobs utility, which does a better job of separating the output. I think with GNU xargs (as shown) you cannot drop the -q option because it would intermix the diffs (?).
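If xjobs is not available, the interleaving can probably be avoided by giving each diff job its own output file. The following is a rough sketch under assumptions, not something from the original answer: parallel_diff, OTHER and OUTDIR are hypothetical names, and it relies on the -print0, -0 and -P extensions found in GNU and BSD find/xargs.

```shell
# Hypothetical helper: compare each top-level directory of the current directory
# against the same-named directory under "$1", running up to "$2" diffs at once.
# Each job writes to its own file under "$3", so full diffs cannot interleave.
parallel_diff() {
    other=$1
    jobs=$2
    outdir=$3
    mkdir -p "$outdir" || return 1
    find . -mindepth 1 -maxdepth 1 -type d -print0 |
        OTHER=$other OUTDIR=$outdir xargs -0 -P "$jobs" -n 1 sh -c '
            # GNU xargs runs the command once even on empty input; do nothing then.
            [ $# -ge 1 ] || exit 0
            name=${1#./}
            diff -urN "$1/" "$OTHER/$name/" > "$OUTDIR/$name.diff" 2>&1
            # diff exits 1 when the trees differ; only 2 or more signals real trouble.
            [ $? -le 1 ]
        ' sh
}
```

Run from the root of the first tree, for example parallel_diff /tmp/othertree 4 /tmp/diff-out, this leaves one .diff file per top-level directory. An empty file means no differences were found. The disk is still likely to be the limiting factor, so a higher job count may not help.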



Source: https://stackoverflow.com/questions/7159921/diff-folders-recursively-vs-multithreading
