How do I clone a large Git repository on an unreliable connection?

Question

I want to clone LibreOffice. The official website says:

All our source code is hosted in git:

Clone: $ git clone git://anongit.freedesktop.org/libreoffice/core # (browse)

Clone (http): $ git clone http://anongit.freedesktop.org/git/libreoffice/core.git # slower

Tarballs: http://download.documentfoundation.org/libreoffice/src/

please find the latest versions (usually near the bottom)

When I run this command in Git Bash, it starts fetching. But the repository is so big that, hours in, I lose connectivity for a few seconds, Git discards the partial download, and I end up with nothing.

Is there any way I can download the repository smoothly even if interruptions occur?

P.S. I am a new user of Git and I use a 1 MB DSL internet connection. The repository must be over 1 GB.

Answer 1:

The repository is accessible via the http protocol (aka dumb protocol) here: http://anongit.freedesktop.org/git/libreoffice/core.git.

You can download everything here with wget or another download manager, and you'll have a clone of the repository. After that, you rename the directory from core.git to .git, and use the following command to tell git about the remote url:

$ git remote add remote http://anongit.freedesktop.org/git/libreoffice/core.git
$ git reset --hard HEAD
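
The download step might be sketched as follows. This assumes GNU wget is installed and that the server exposes directory listings for the repository, since recursive download depends on them. The function name mirror_dumb_http and the destination directory are hypothetical; the flags are standard wget options.

```shell
#!/bin/bash
# Hypothetical helper: mirror a repository served over plain ("dumb") HTTP.
# Assumes GNU wget and a server that lists directory contents.
mirror_dumb_http() {
    local url="$1"     # e.g. http://anongit.freedesktop.org/git/libreoffice/core.git/
    local dest="$2"    # local directory to download into
    # --continue    resume partially downloaded files instead of restarting
    # --tries=0     keep retrying each file on transient network errors
    # --waitretry   wait up to 10 seconds between retries of a file
    # --no-parent   never climb above the repository path
    wget --recursive --no-parent --continue --tries=0 --waitretry=10 \
         --no-host-directories --directory-prefix="$dest" "$url"
}

# Example (re-run the same command after an interruption to resume):
#   mirror_dumb_http http://anongit.freedesktop.org/git/libreoffice/core.git/ download
```

With these options the files should land under the destination directory in a path that mirrors the URL (for example download/git/libreoffice/core.git), which can then be renamed as described above.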

Answer 2:

Run 'git clone --depth 100' followed by the repository URL. It should grab only the last 100 commits.

Answer 3:

You can do the following:

git clone --depth 1 git@github.com:User/Project.git .
git fetch --unshallow

The first clone will still be atomic, so if your connection is not reliable enough to fetch the current HEAD then you will have trouble.

The subsequent fetch should be incremental and retryable if the connection drops halfway through.
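
If a single --unshallow fetch is still too large to survive the connection, one variation is to deepen the history in smaller steps. The sketch below is not part of the original answer; it assumes a Git version recent enough to support "git fetch --deepen" and "git rev-parse --is-shallow-repository", and the function name and default step size are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: deepen a shallow clone in small steps until complete.
# Run it inside the repository created by "git clone --depth 1".
deepen_until_complete() {
    local step="${1:-500}"    # extra commits of history per fetch (assumed value)
    local delay="${2:-5}"     # seconds to wait after a failed fetch
    # rev-parse prints "true" while the repository is still shallow
    while [ "$(git rev-parse --is-shallow-repository)" = "true" ]; do
        if ! git fetch --deepen="$step"; then
            echo "Fetch failed; retrying in ${delay}s..." >&2
            sleep "$delay"
        fi
    done
}

# Example: deepen_until_complete 1000 10
```

Each successful fetch is kept, so a dropped connection should cost at most one step rather than the whole history.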

Answer 4:

I used my web hosting server, which has shell access, to clone the repository first, and then used rsync to copy it to my local machine. When resumed, rsync copies only the remaining files.
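
A rough sketch of the rsync step, assuming rsync is installed on both machines and the clone already exists on the server. The function name and the source path in the example are placeholders, not details from the original answer.

```shell
#!/bin/bash
# Hypothetical helper: copy a clone from a remote host, resuming after drops.
rsync_until_done() {
    local src="$1"            # e.g. user@host:path/to/repo/ (placeholder)
    local dest="$2"           # local target directory
    local delay="${3:-10}"    # seconds between attempts
    # --archive copies the tree recursively and preserves file attributes;
    # --partial keeps half-transferred files so a later attempt can reuse them.
    until rsync --archive --compress --partial "$src" "$dest"; do
        echo "rsync interrupted; retrying in ${delay}s..." >&2
        sleep "$delay"
    done
}

# Example: rsync_until_done user@host:path/to/repo/ ./repo
```

The loop retries on any failure, so a permanent error such as a wrong path or failed login has to be stopped by hand.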

Answer 5:

The best method that I know of is to combine a shallow clone (--depth 1) with sparse checkout, that is, checking out only the subfolders or files that you need. (Shallow cloning also implies --single-branch, which is also useful.) See udondan's answer for an example.

Additionally, I use a bash loop to keep retrying until finished successfully. Like this:

#!/bin/bash

git init <repo_dir>
cd <repo_dir>
git remote add origin <repo_url>

# Optional step: sparse checkout
git config core.sparsecheckout true                     # <-- enable sparse checkout
echo "subdirectory/*" >> .git/info/sparse-checkout      # <-- specify files you need

# Keep pulling until successful
until git pull --depth=1 origin master; do              # <-- shallow clone
    echo "Pulling git repository failed; retrying..."
done

In this way I can eventually pull large repos even over a slow VPN in China.

Importantly, by pulling this way you will still be able to push.
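
The retry loop above runs forever if the failure is permanent (for example, a mistyped URL). A possible variation, sketched here with a hypothetical helper named retry, caps the number of attempts and doubles the wait between them. The limits in the example are arbitrary assumptions.

```shell
#!/bin/bash
# Hypothetical helper: run a command until it succeeds, up to a limit.
retry() {
    local max_attempts="$1"   # give up after this many failed attempts
    local delay="$2"          # initial wait in seconds, doubled after each failure
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "Attempt ${attempt} failed; retrying in ${delay}s..." >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# Example: retry 20 5 git pull --depth=1 origin master
```

Because the command is passed as arguments, the same helper can wrap git fetch or any other step that may fail on a flaky link.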

Source: https://stackoverflow.com/questions/9268378/how-do-i-clone-a-large-git-repository-on-an-unreliable-connection

Tags

git

bash

clone

git-clone