How to quickly find all git repos under a directory

暖寄归人 2020-12-12 13:14

The following bash script is slow when scanning for .git directories because it looks at every directory. If I have a collection of large repositories it takes a long time f

8 answers
  • 2020-12-12 13:27

    Check out the answer that uses the locate command, under the question "Is there any way to list up git repositories in terminal?"

    The advantages of using locate instead of a custom script are:

    1. The search is indexed, so it scales
    2. It does not require the use (and maintenance) of a custom bash script

    The disadvantages of using locate are:

    1. The database that locate uses is only updated periodically (weekly by default on OS X), so freshly created git repositories won't show up until the next update

    Going the locate route, here's how to list all git repositories under a directory, for OS X:

    Enable locate indexing (will be different on Linux):

    sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.locate.plist
    

    Run this command after indexing completes (might need some tweaking for Linux):

    repoBasePath=$HOME
    locate '.git' | grep -E '/\.git$' | grep -E "^$repoBasePath/" | xargs -I {} dirname "{}"
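
    If the base path contains characters that are special in a regular expression, the grep filters can misfire. As a rough sketch (repos_from_paths is a hypothetical helper name, not part of locate), the same filtering can be done with plain shell pattern matching on whatever list of paths locate prints:

    ```shell
    # Hypothetical helper (not part of locate): read candidate paths on stdin
    # and print the parent of each path that ends in "/.git" and lies under
    # the base directory given as $1.
    repos_from_paths() {
        base=${1%/}                       # drop one trailing slash, if any
        while IFS= read -r path; do
            case $path in
                "$base"/.git|"$base"/*/.git)
                    printf '%s\n' "${path%/.git}" ;;
            esac
        done
    }
    ```

    It would be used as: locate '.git' | repos_from_paths "$HOME"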
    
  • 2020-12-12 13:34

    For Windows, you can put the following into a batch file called gitlist.bat and put it on your PATH.

    @echo off
    if {%1}=={} goto :usage
    for /r %1 /d %%I in (.) do echo %%I | find ".git\."
    goto :eof
    :usage
    echo usage: gitlist ^<path^>
    
  • 2020-12-12 13:41

    The answers above all rely on finding a ".git" directory. However, not all git repos have one (e.g. bare repos). The following command will loop through all directories and ask git whether it considers each one to be a repository. If so, it prunes the subdirectories off the tree and continues.

    find . -type d -exec sh -c 'cd "$1" && git rev-parse --git-dir > /dev/null 2>&1' sh {} \; -prune -print
    

    It's a lot slower than other solutions because it's executing a command in each directory, but it doesn't rely on a particular repository structure. Could be useful for finding bare git repositories for example.
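
    As a rough sketch of how this could be wrapped up (find_git_repos and label_repos are hypothetical names I'm making up here), the same find call can be combined with git rev-parse --is-bare-repository to show which matches are bare. It assumes git 1.8.5 or later for the -C option, and that the search root is not itself inside a repository (otherwise the root matches and nothing below it is listed):

    ```shell
    # Hypothetical helper: list every directory under $1 (default ".") that
    # git itself accepts as a repository, without descending into a match.
    find_git_repos() {
        find "${1:-.}" -type d -exec sh -c '
            cd "$1" 2>/dev/null && git rev-parse --git-dir >/dev/null 2>&1
        ' sh {} \; -prune -print
    }

    # Hypothetical helper: label each path read from stdin as bare or worktree.
    label_repos() {
        while IFS= read -r dir; do
            if [ "$(git -C "$dir" rev-parse --is-bare-repository 2>/dev/null)" = true ]; then
                printf 'bare\t%s\n' "$dir"
            else
                printf 'worktree\t%s\n' "$dir"
            fi
        done
    }
    ```

    Usage would look like: find_git_repos ~/src | label_repos (where ~/src is just an example path).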

  • 2020-12-12 13:45

    I've taken the time to copy-paste the script in your question and compare it to the script in your own answer. Here are some interesting results:

    Please note that:

    • I've disabled the git pull calls by prefixing them with an echo
    • I've also removed the color handling
    • I've also removed the .ignore file test in the bash solution
    • and removed the unnecessary > /dev/null here and there
    • removed the pwd calls in both
    • added -prune, which is obviously lacking in the find example
    • used "while" instead of "for", which was also counterproductive in the find example
    • considerably untangled the second example to get to the point
    • added a test to the bash solution to NOT follow symlinks, to avoid cycles and to behave like the find solution
    • added shopt so that * also expands to dotted directory names, to match the find solution's functionality

    Thus, we are comparing the find-based solution:

    #!/bin/bash
    
    find . -name .git -type d -prune | while IFS= read -r d; do
       cd "$d/.." || continue
       echo "$PWD >" git pull
       cd "$OLDPWD"
    done
    

    With the bash shell builtin solution:

    #!/bin/bash
    
    shopt -s dotglob
    
    update() {
        for d in "$@"; do
            test -d "$d" -a \! -L "$d" || continue
            cd "$d"
            if [ -d ".git" ]; then
                echo "$PWD >" git pull
            else
                update *
            fi
            cd ..
        done
    }
    
    update *
    

    Note: builtins (the function and the for loop) are immune to the ARG_MAX OS limit on launching processes. So the * won't break even on very large directories.
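
    To illustrate the point about the limit, here is a minimal sketch (count_args is a hypothetical name): the limit reported by getconf only applies when a new program is exec'ed, so a shell function can be handed a very long argument list.

    ```shell
    # Limit on the combined size of arguments and environment passed to an
    # external program via exec(); the value differs between systems.
    getconf ARG_MAX

    # Hypothetical helper: a shell function is not exec'ed, so it can receive
    # a very large argument list.
    count_args() {
        echo "$#"
    }

    count_args $(seq 1 200000)    # prints 200000
    ```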

    Technical differences between solutions:

    The find-based solution uses C functions to crawl the repository. It:

    • has to load a new process for the find command.
    • will avoid ".git" content but will crawl the workdir of git repositories and lose some time in those (and possibly find more matching elements).
    • will have to chdir through several levels of subdirectories for each match and go back.
    • will have to chdir once in the find command and once in the bash part.

    The bash-based solution uses builtins (so a near-C implementation, but interpreted) to crawl the repository. Note that it:

    • will use only one process.
    • will avoid git workdir subdirectories.
    • will only perform chdir one level at a time.
    • will only perform chdir once for looking and performing the command.

    Actual speed results between solutions:

    I have a working development collection of git repositories on which I ran the scripts:

    • find solution: ~0.080s (bash chdir takes ~0.010s)
    • bash solution: ~0.017s

    I have to admit that I wasn't prepared to see such a win from bash builtins. It seemed more obvious and normal after analyzing what's going on. To add insult to injury, if you change the shell from /bin/bash to /bin/sh (you must comment out the shopt line, and be prepared for it not to parse dotted directories), you'll drop to ~0.008s. Beat that!
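
    If you want to try a comparison like this on your own tree, here is a rough sketch with two hypothetical stand-ins for the crawlers (repos_find and repos_walk are names I'm making up). Both only print repository paths, and the walk avoids cd altogether, so they are simplified variants rather than the exact scripts timed above:

    ```shell
    # find-based stand-in: also descends into work trees, so repositories
    # nested inside another repository's work tree are reported too.
    repos_find() {
        find "$1" -type d -name .git -prune | sed 's|/\.git$||'
    }

    # Pure-shell stand-in: stops at the first .git met on each branch.
    # Skips symlinks and, since dotglob is not set, hidden directories.
    repos_walk() {
        for d in "$1"/*; do
            [ -d "$d" ] && [ ! -L "$d" ] || continue
            if [ -d "$d/.git" ]; then
                printf '%s\n' "$d"
            else
                repos_walk "$d"
            fi
        done
    }
    ```

    In bash, each one can then be measured with the time keyword, for example: time repos_walk . > /dev/null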

    Note that you can be more clever with the find solution by using:

    find . -type d \( -exec /usr/bin/test -d "{}/.git" -a "{}" != "." \; -print -prune \
           -o -name .git -prune \)
    

    which will effectively avoid crawling inside a found git repository, at the price of spawning a process for each directory crawled. The final find solution I came up with ran in ~0.030s, which is more than twice as fast as the previous find version but still about 2 times slower than the bash solution.

    Note that using /usr/bin/test is important to avoid a search in $PATH, which costs time, and I needed -o -name .git -prune and -a "{}" != "." because my main repository was itself a git subrepository.

    In conclusion, I won't be using the bash builtin solution because it has too many corner cases for me (and my first test hit one of its limitations). It was important for me to explain why it could be (much) faster in some cases, but the find solution seems much more robust and consistent to me.

  • 2020-12-12 13:45

    I list all git repositories anywhere in the current directory using:

    find . -type d -execdir test -d {}/.git \; -prune -print
    

    This is fast since it stops recursing once it finds a git repository. (Although it does not handle bare repositories.) Of course, you can change the . to whatever directory you want. If you need, you can change the -print to -print0 for null-separated values.
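
    As a sketch of what the -print0 variant is good for (each_repo is a hypothetical helper name), the null-separated list can be fed to xargs -0 to run a command once per repository, even when paths contain spaces. Note that this sketch uses -exec rather than -execdir, so {} is the full path from the search root:

    ```shell
    # Hypothetical helper: run a command once per repository found under $1.
    # Every "{}" in the command is replaced by the repository path.
    each_repo() {
        root=$1
        shift
        find "$root" -type d -exec test -d {}/.git \; -prune -print0 |
            xargs -0 -I{} "$@"
    }
    ```

    For example: each_repo . git -C {} status --short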

    To also ignore directories containing a .ignore file:

    find . -type d \( -execdir test -e {}/.ignore \; -prune \) -o \( -execdir test -d {}/.git \; -prune -print \)
    

    I've added this alias to my ~/.gitconfig file:

    [alias]
      repos =  !"find . -type d -execdir test -d {}/.git \\; -prune -print"
    

    Then I just need to execute:

    git repos
    

    To get a complete listing of all the git repositories anywhere in my current directory.

  • 2020-12-12 13:47

    Here is an optimized solution:

    #!/bin/bash
    # Update all git directories below current directory or specified directory
    # Skips directories that contain a file called .ignore
    
    HIGHLIGHT="\e[01;34m"
    NORMAL='\e[00m'
    
    function update {
      local d="$1"
      if [ -d "$d" ]; then
        if [ -e "$d/.ignore" ]; then 
          echo -e "\n${HIGHLIGHT}Ignoring $d${NORMAL}"
        else
          cd "$d" > /dev/null
          if [ -d ".git" ]; then
            echo -e "\n${HIGHLIGHT}Updating `pwd`$NORMAL"
            git pull
          else
            scan *
          fi
          cd .. > /dev/null
        fi
      fi
      #echo "Exiting update: pwd=`pwd`"
    }
    
    function scan {
      #echo "`pwd`"
      for x in "$@"; do
        update "$x"
      done
    }
    
    if [ "$1" != "" ]; then cd "$1" > /dev/null; fi
    echo -e "${HIGHLIGHT}Scanning ${PWD}${NORMAL}"
    scan *
    