How to speed up find for listing git repositories?

妖精的绣舞 submitted on 2019-12-24 17:19:14

Question


I want to find all git repositories lying in some directory, but not its subdirectories, say ~/repo. Two simple approaches are

find ~/repo -depth 2 -type d -name '.git' | while read repo …

or

for repo in ~/repo/*/.git …

The version using find is orders of magnitude slower than the one with the globbing pattern. I am very surprised by this, because there is no real reason why one method would need more system calls than the other to gather its information. I tried a smarter version of the find invocation

find ~/repo -depth 3 -prune -o -depth 2 -type d -name '.git' -print | while read repo …

without any noticeable improvement. Unfortunately I was not able to trace system calls to figure out how find is working here.

What explains the huge speed difference between these two methods? (The shell is /bin/sh which I believe to be some obsolete version of bash.)


Answer 1:


Update: the test -depth with arguments (-depth 2) is not specified in the documentation of GNU find. It is probably an OSX extension. Don't use it!

Use -mindepth 2 -maxdepth 2 instead, as suggested by @hek2mgl in their answer.


OSX specific

It seems the OSX version of find unnecessarily descends into directories deeper than 2 levels when -depth 2 is used (but this is the correct behaviour, see below).

You can tell it to not do that by adding -prune immediately after -depth 2 (it seems it doesn't have any effect if you put it somewhere else):

find ~/repo -depth 2 -prune -type d -name .git

Some benchmarks:

$ time (find . -depth 4 -prune -type d -name .git | wc -l)
      20

real 0m0.064s
user 0m0.009s
sys  0m0.046s

Moving -prune to the end makes it suddenly take far longer to run:

$ time (find . -depth 4 -type d -name .git -prune | wc -l)
      20

real 0m12.726s
user 0m0.325s
sys  0m9.298s

Remarks

On second thought (and after a closer reading of man find), -depth 2 does not require find to stop descending into directories deeper than two levels. It can be part of a more complex condition that requires -depth 2 or something else (e.g. find . -depth 2 -or -name .git).

To force it to stop descending more than two levels you must use either -maxdepth 2 or -depth 2 -prune.

  • -maxdepth tells it to not go deeper than two levels;
  • -depth 2 -prune tells it to stop descending into subdirectories if the directory under examination is two levels deep.

They behave equivalently, so choosing one or the other is a matter of preference. I would choose -maxdepth 2 because it is clearer.
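The effect of bounding the depth can be checked on a throwaway tree. The following is a minimal sketch, assuming a find that supports -mindepth and -maxdepth (as the commands in this thread do); the directory names a, b and c/nested are hypothetical.

```shell
#!/bin/sh
# Build a scratch tree: two repositories at depth 2, one nested deeper.
root=$(mktemp -d)
mkdir -p "$root/a/.git" "$root/b/.git" "$root/c/nested/.git"

# Unbounded search: reports every .git directory, however deep it is.
all=$(find "$root" -type d -name .git | wc -l | tr -d ' ')

# Bounded search: only entries exactly two levels below $root are tested,
# and find does not descend past that level.
shallow=$(find "$root" -mindepth 2 -maxdepth 2 -type d -name .git | wc -l | tr -d ' ')

echo "all=$all shallow=$shallow"
rm -rf "$root"
```

On this tree the unbounded search counts three .git directories and the bounded one counts two.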

Conclusion

Because -depth 2 is not portable, the final command should be like:

find ~/repo -mindepth 2 -maxdepth 2 -type d -name '.git' -print
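To see that this command and the glob loop from the question select the same directories, both can be run against a small scratch tree. This is only a sketch under assumptions: the tree layout is made up, and the -d test is an addition to keep the loop from processing an unexpanded pattern.

```shell
#!/bin/sh
# Scratch tree standing in for ~/repo (hypothetical layout).
root=$(mktemp -d)
mkdir -p "$root/a/.git" "$root/b/.git" "$root/c/nested/.git"

# Glob version: the shell expands */.git itself; -d guards against the
# pattern being left unexpanded when nothing matches.
from_glob=$(for repo in "$root"/*/.git; do
    [ -d "$repo" ] && printf '%s\n' "$repo"
done | sort)

# find version, restricted to exactly two levels below $root.
from_find=$(find "$root" -mindepth 2 -maxdepth 2 -type d -name .git -print | sort)

[ "$from_glob" = "$from_find" ] && echo "same result"
rm -rf "$root"
```

One difference to keep in mind: the glob * does not match names starting with a dot, so a repository inside a hidden directory would be listed by find but skipped by the glob.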

Thanks to @hek2mgl for pointing out the compatibility issue.




Answer 2:


You can use:

find ~/repo -maxdepth 2 -mindepth 2 -type d -name '.git'

This reproduces the logic of the glob pattern more exactly. Also note that -depth 2 (with an argument) isn't portable and will not work with GNU find.

Btw, instead of piping into a while loop, I would use the -exec option of find.
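A minimal sketch of the -exec variant, assuming the goal is to get the repository directory for each match; the scratch tree and the choice of dirname as the command are illustrative, not part of the original answer.

```shell
#!/bin/sh
# Scratch tree standing in for ~/repo (hypothetical layout).
root=$(mktemp -d)
mkdir -p "$root/a/.git" "$root/b/.git" "$root/c/nested/.git"

# -exec runs the command once per match; {} is replaced by the path found.
# dirname strips the trailing /.git, leaving the repository directory.
repos=$(find "$root" -mindepth 2 -maxdepth 2 -type d -name .git \
    -exec dirname {} \; | sort)

printf '%s\n' "$repos"
rm -rf "$root"
```

Because find passes each path to the command as a single argument, this avoids the word-splitting and backslash pitfalls of a plain `while read` loop.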



Source: https://stackoverflow.com/questions/31045501/how-to-speed-up-find-for-listing-git-repositories
