> The following bash script is slow when scanning for `.git` directories, because it looks at every directory. If I have a collection of large repositories it takes a long time f…
I've taken the time to copy-paste the script from your question and compare it to the script in your own answer. Here are some interesting results:
Please note that:

- the `git pull` commands are disabled by prefixing them with an `echo`;
- the `.ignore` file testing was removed from the bash solution;
- `> /dev/null` redirections were added here and there;
- the `pwd` calls were removed from both;
- `-prune`, which was obviously lacking, was added to the `find` example;
- the `bash` solution was changed to NOT follow symlinks, to avoid cycles and to behave like the `find` solution;
- `shopt` is used so that `*` also expands to dotted directory names, matching the `find` solution's functionality.

Thus, we are comparing the `find`-based solution:
```shell
#!/bin/bash
find . -name .git -type d -prune | while read -r d; do
    cd "$d/.." || continue
    echo "$PWD >" git pull
    cd "$OLDPWD"
done
```
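As an aside: if any repository path can contain spaces (or even newlines), a NUL-separated variant of the same pipeline is safer. This is only a sketch of the idea, assuming GNU `find`'s `-print0` and bash's `read -d ''`; the toy tree it builds is invented for the demo:

```shell
#!/bin/bash
# NUL-separated variant of the pipeline above: repository paths with
# spaces or newlines survive the pipe. Toy tree created just for the demo.
tmp=$(mktemp -d)
mkdir -p "$tmp/my repo/.git" "$tmp/plain/.git"

result=$(find "$tmp" -name .git -type d -prune -print0 |
    while IFS= read -r -d '' d; do
        # Run in a subshell so there is no need to cd back afterwards.
        ( cd "$d/.." && echo "$PWD >" git pull )
    done)
echo "$result"
rm -rf "$tmp"
```

The subshell around the `cd` also sidesteps the `$OLDPWD` bookkeeping entirely.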
With the bash shell builtin solution:
```shell
#!/bin/bash
shopt -s dotglob
update() {
    for d in "$@"; do
        test -d "$d" -a \! -L "$d" || continue
        cd "$d"
        if [ -d ".git" ]; then
            echo "$PWD >" git pull
        else
            update *
        fi
        cd ..
    done
}
update *
```
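Since the function above does all its crawling with the `cd` builtin, its directory-changing bookkeeping is easy to observe: shadow `cd` with a counting wrapper and run the crawler on a throw-away tree. All paths here are invented for the demo:

```shell
#!/bin/bash
shopt -s dotglob
# Count every cd the crawler performs by shadowing the builtin.
cd_count=0
cd() { cd_count=$((cd_count + 1)); builtin cd "$@"; }

tmp=$(mktemp -d)
mkdir -p "$tmp/x/repo/.git"    # one repository, two levels down

update() {
    for d in "$@"; do
        test -d "$d" -a \! -L "$d" || continue
        cd "$d"
        if [ -d ".git" ]; then
            echo "$PWD >" git pull
        else
            update *
        fi
        cd ..
    done
}

builtin cd "$tmp"
update *
echo "cd calls: $cd_count"    # one descent and one ascent per directory
builtin cd /
rm -rf "$tmp"
```

With one directory level (`x`) and one repository (`repo`), the crawler performs exactly four `cd` calls: down and back up for each directory it visits.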
Note: builtins (the `function` and the `for` loop) are immune to the OS's `ARG_MAX` limit for launching processes, so the `*` expansion won't break even on very large directories.
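The reason is that a glob expanded for a builtin never goes through `execve()`, which is where the kernel's argument-length limit is enforced; only external commands can fail with "Argument list too long". A small-scale sketch (500 files is far below the limit; the point is only that the loop never execs anything, so no number of files can break it — file names invented for the demo):

```shell
#!/bin/bash
# The glob below is expanded entirely inside bash: no process is spawned,
# so the kernel's ARG_MAX limit never comes into play.
tmp=$(mktemp -d)
i=0
while [ "$i" -lt 500 ]; do
    : > "$tmp/file$i"
    i=$((i + 1))
done

count=0
for f in "$tmp"/*; do      # builtin for-loop: immune to ARG_MAX
    count=$((count + 1))
done
echo "counted $count files"
rm -rf "$tmp"
```

You can check the limit itself with `getconf ARG_MAX`.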
Technical differences between the solutions:

The `find`-based solution uses C functions to crawl the repositories; it:
- does all the searching from a single `find` command;
- has to `chdir` through several depths of sub-directories for each match, and go back;
- has to `chdir` twice per repository: once in the `find` command and once in the bash part.

The bash-based solution uses builtins (so a near-C implementation, but interpreted) to crawl the repositories; note that it:
- `chdir`s one level at a time;
- `chdir`s only once to both check for `.git` and perform the command.

Actual speed results between the solutions:
I have a working development collection of git repositories on which I launched the scripts:
I have to admit that I wasn't prepared to see such a win from the bash builtins. It became more apparent and normal after analysing what's going on. To add insult to injury, if you change the shell from `/bin/bash` to `/bin/sh` (you must comment out the `shopt` line, and be prepared that it won't parse dotted directories), you'll fall to ~0.008s. Beat that!
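For reference, the `/bin/sh` variant would look roughly like this: the `shopt` line is gone, so `*` no longer matches dotted directory names (that is the limitation mentioned above). This is a sketch, not the exact benchmarked script; the demo tree it builds is invented, with one repository hidden under a dot-directory to show what this version misses:

```shell
#!/bin/sh
# POSIX-sh version of the crawler: no shopt, so '*' skips dot-directories.
update() {
    for d in "$@"; do
        [ -d "$d" ] && [ ! -L "$d" ] || continue
        cd "$d" || continue
        if [ -d ".git" ]; then
            echo "$PWD >" git pull
        else
            update *
        fi
        cd ..
    done
}

# Throw-away demo tree: one visible repo, one under a dot-directory
# that this sh version will NOT find.
tmp=$(mktemp -d)
mkdir -p "$tmp/seen/.git" "$tmp/.hidden/missed/.git"
found=$(cd "$tmp" && update *)
echo "$found"
rm -rf "$tmp"
```

Only the `seen` repository is reported; everything under `.hidden` is silently skipped.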
Note that you can be more clever with the find solution by using:
```shell
find . -type d \( -exec /usr/bin/test -d "{}/.git" -a "{}" != "." \; -print -prune \
    -o -name .git -prune \)
```
which effectively avoids crawling all sub-directories of a found git repository, at the price of spawning a process for each directory crawled. The final `find` solution I came up with ran at around ~0.030s, more than twice as fast as the previous `find` version, but still about 2 times slower than the bash solution.
Note that `/usr/bin/test` (rather than plain `test`) is important to avoid the `$PATH` search, which costs time; and I needed `-o -name .git -prune` and `-a "{}" != "."` because my main repository was itself a git sub-repository.
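A quick way to convince yourself of the prune behaviour: on a throw-away tree where one repository nests another, only the outer one should be printed, and the inner one is never visited. GNU `find` is assumed (it substitutes `{}` inside arguments such as `"{}/.git"`); the paths are invented for the demo:

```shell
#!/bin/bash
# The outer repo contains an inner repo; the -prune after -print stops
# find from ever descending into it.
tmp=$(mktemp -d)
mkdir -p "$tmp/outer/.git" "$tmp/outer/vendor/inner/.git"

found=$(cd "$tmp" && find . -type d \
    \( -exec /usr/bin/test -d "{}/.git" -a "{}" != "." \; -print -prune \
       -o -name .git -prune \))
echo "$found"    # only ./outer
rm -rf "$tmp"
```

`./outer/vendor/inner` never appears, because the traversal is cut off at `./outer` as soon as its `.git` is detected.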
As a conclusion, I won't be using the bash builtin solution, because it has too many corner cases for me (and my first test hit one of its limitations). But it was important for me to explain why it could be (much) faster in some cases; the `find`-based solution seems much more robust and consistent to me.