Question
I want to find all git repositories lying directly in some directory, say ~/repo, but not in its deeper subdirectories. Two simple approaches are
find ~/repo -depth 2 -type d -name '.git' | while read repo …
or
for repo in ~/repo/*/.git …
The version using find is orders of magnitude slower than the one with the globbing pattern. I am very surprised by this, because there is no obvious reason why one method would need more system calls than the other to gather its information. I tried a smarter version of the find invocation
find ~/repo -depth 3 -prune -o -depth 2 -type d -name '.git' -print | while read repo …
without any noticeable improvement. Unfortunately I was not able to trace system calls to figure out how find is working here.
What explains the huge speed difference between these two methods? (The shell is /bin/sh, which I believe to be some obsolete version of bash.)
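For reference, the globbing approach from the question can be sketched as a complete loop. This is only an illustration under assumptions: the scratch tree created with mktemp stands in for ~/repo, and the directory names a, b, c and plain are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch tree standing in for ~/repo.
root=$(mktemp -d)
mkdir -p "$root/a/.git" "$root/b/.git" "$root/c/nested/.git" "$root/plain"

count=0
for repo in "$root"/*/.git; do
    # If nothing matches, the pattern stays unexpanded; skip that case.
    [ -d "$repo" ] || continue
    # Strip the trailing /.git to get the working tree path.
    printf '%s\n' "${repo%/.git}"
    count=$((count + 1))
done
```

Note that * does not match names starting with a dot, so a repository kept in a hidden directory would be skipped by this pattern.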
Answer 1:
Update: the test -depth with an argument (-depth 2) is not specified in the documentation of GNU find. It is probably an OSX extension. Don't use it! Use -mindepth 2 -maxdepth 2 instead, as suggested by @hek2mgl in their answer.
OSX specific
It seems the OSX version of find unnecessarily descends into directories deeper than 2 levels when -depth 2 is used (but this is the correct behaviour, see below).
You can tell it not to do that by adding -prune immediately after -depth 2 (it seems to have no effect if you put it somewhere else):
find ~/repo -depth 2 -prune -type d -name .git
Some benchmarks:
$ time (find . -depth 4 -prune -type d -name .git | wc -l)
20
real 0m0.064s
user 0m0.009s
sys 0m0.046s
Moving -prune to the end makes it suddenly take a lot of time to run:
$ time (find . -depth 4 -type d -name .git -prune | wc -l)
20
real 0m12.726s
user 0m0.325s
sys 0m9.298s
Remarks
On second thought (and after a closer reading of man find), -depth 2 does not require find to stop descending into directories deeper than two levels. It can be part of a more complex condition that requires -depth 2 or something else (e.g. find . -depth 2 -or -name .git).
To force it to stop descending more than two levels you must use either -maxdepth 2 or -depth 2 -prune.
-maxdepth 2 tells it not to go deeper than two levels; -depth 2 -prune tells it to stop descending into subdirectories if the directory under examination is two levels deep.
They have equivalent behaviour; choosing one or the other is a matter of preference. I would choose -maxdepth 2 because it is clearer.
Conclusion
Because -depth 2 is not portable, the final command should be:
find ~/repo -mindepth 2 -maxdepth 2 -type d -name '.git' -print
Thanks to @hek2mgl for pointing out the compatibility issue.
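To see what the final command selects, here is a small sketch run against a scratch tree. The tree layout and the names a, b and c are assumptions made up for the illustration; only the find invocation itself comes from the answer.

```shell
#!/bin/sh
# Hypothetical scratch tree: two repositories at depth 2, one nested deeper.
root=$(mktemp -d)
mkdir -p "$root/a/.git" "$root/b/.git" "$root/c/nested/.git"

# -mindepth 2 -maxdepth 2 restricts matches to exactly two levels below $root,
# so find never descends into c/nested.
found=$(find "$root" -mindepth 2 -maxdepth 2 -type d -name '.git' -print | sort)
printf '%s\n' "$found"
```

Only a/.git and b/.git are reported; c/nested/.git sits at depth 3 and is never visited.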
Answer 2:
You can use:
find ~/repo -maxdepth 2 -mindepth 2 -type d -name '.git'
This would reproduce the logic of the globbing more exactly. Also note that the option -depth 2 isn't portable and will not work on GNU systems.
By the way, instead of piping into a while loop, I would use the -exec option of find.
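A minimal sketch of the -exec variant, assuming the per-repository action is simply printing the working tree path (the question elides the real loop body). The scratch tree and the variable name gitdir are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch tree standing in for ~/repo.
root=$(mktemp -d)
mkdir -p "$root/a/.git" "$root/b/.git" "$root/c/nested/.git"

# -exec ... {} + hands the matches to a single sh invocation as arguments,
# which avoids the whitespace and backslash mangling of a plain `while read`.
listing=$(find "$root" -mindepth 2 -maxdepth 2 -type d -name '.git' \
    -exec sh -c 'for gitdir in "$@"; do printf "%s\n" "${gitdir%/.git}"; done' sh {} + | sort)
printf '%s\n' "$listing"
```

The trailing sh before {} fills $0 of the inner shell, so the matched paths start at $1.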
Source: https://stackoverflow.com/questions/31045501/how-to-speed-up-find-for-listing-git-repositories