How to clone all repos at once from GitHub?

Question

I have a company GitHub account and I want to back up all of the repositories within, accounting for anything new that might get created for purposes of automation. I was hoping something like this:

git clone git@github.com:company/*.git

or similar would work, but it doesn't seem to like the wildcard there.

Is there a way in Git to clone and then pull everything assuming one has the appropriate permissions?

Answer 1:

I don't think it's possible to do it that way. Your best bet is to find and loop through a list of an Organization's repositories using the API.

Try this:

Create an API token by going to Account Settings -> Applications
Make a call to: http://${GITHUB_BASE_URL}/api/v3/orgs/${ORG_NAME}/repos?access_token=${ACCESS_TOKEN}
The response will be a JSON array of objects. Each object will include information about one of the repositories under that Organization. I think in your case, you'll be looking specifically for the ssh_url property.
Then git clone each of those ssh_urls.

It's a little bit of extra work, but it's necessary for GitHub to have proper authentication.
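The steps above can be sketched in Python. This is only a minimal sketch under stated assumptions: the function names ssh_urls_from_response and clone_commands are made up for illustration, and the only detail taken from the answer is the ssh_url property of each repository object. Fetching the response body is left out so the parsing step can be tried offline.

```python
import json
from typing import List


def ssh_urls_from_response(body: str) -> List[str]:
    """Return the ssh_url of every repository object in a JSON array."""
    repos = json.loads(body)
    # Skip any object that lacks the property instead of raising KeyError.
    return [repo["ssh_url"] for repo in repos if "ssh_url" in repo]


def clone_commands(urls: List[str]) -> List[List[str]]:
    """Build one `git clone <url>` argument list per URL (nothing is run)."""
    return [["git", "clone", url] for url in urls]
```

Each argument list can then be handed to subprocess.run, which avoids going through a shell.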

Answer 2:

On Windows and all UNIX/Linux systems, using Git Bash or any other terminal, set CNTX and NAME as described below and run:

# CNTX is either users or orgs; NAME is the user or organization name.
# clone_url is the HTTPS clone URL (use ssh_url instead to clone over SSH).
CNTX=users; NAME=yourusername; PAGE=1
curl "https://api.github.com/$CNTX/$NAME/repos?page=$PAGE&per_page=100" |
  grep -e '"clone_url"' |
  cut -d \" -f 4 |
  xargs -L1 git clone

Set CNTX=users and NAME=yourusername, to download all your repositories. Set CNTX=orgs and NAME=yourorgname, to download all repositories of your organization.

The maximum page-size is 100, so you have to call this several times with the right page number to get all your repositories (set PAGE to the desired page number you want to download).

Here is a shell script that does the above: https://gist.github.com/erdincay/4f1d2e092c50e78ae1ffa39d13fa404e
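The "call this several times with the right page number" part can also be written as a loop. Below is a rough Python sketch, assuming a caller-supplied fetch_page function (a hypothetical name) that returns the parsed list of repositories for a given page number; the loop simply stops at the first empty page.

```python
from typing import Callable, Dict, List


def collect_all_pages(
    fetch_page: Callable[[int], List[Dict]], max_pages: int = 1000
) -> List[Dict]:
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty."""
    repos: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # an empty page means there is nothing left to fetch
            break
        repos.extend(batch)
    return repos
```

The max_pages cap is only a safety net against a fetch function that never returns an empty page.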

Answer 3:

Organisation repositories

To clone all repos from your organisation, try the following shell one-liner:

GHORG=company; curl "https://api.github.com/orgs/$GHORG/repos?per_page=100" | grep -o 'git@[^"]*' | xargs -L1 git clone

User repositories

Cloning all using Git repository URLs:

GHUSER=CHANGEME; curl "https://api.github.com/users/$GHUSER/repos?per_page=100" | grep -o 'git@[^"]*' | xargs -L1 git clone

Cloning all using Clone URL:

GHUSER=CHANGEME; curl "https://api.github.com/users/$GHUSER/repos?per_page=100" | grep -w clone_url | grep -o '[^"]\+://.\+.git' | xargs -L1 git clone

Here is a useful shell function that can be added to your shell startup files (it uses curl and jq):

# Usage: gh-clone-user (user)
gh-clone-user() {
  curl -sL "https://api.github.com/users/$1/repos?per_page=100" | jq -r '.[]|.clone_url' | xargs -L1 git clone
}
}

Private repositories

If you need to clone private repos, you can add an authorization token either in a request header, like:

-H 'Authorization: token <token>'

or pass it in the param (?access_token=TOKEN), for example:

curl -s "https://api.github.com/users/$GHUSER/repos?access_token=$GITHUB_API_TOKEN&per_page=100" | grep -w clone_url | grep -o '[^"]\+://.\+.git' | xargs -L1 git clone

Notes:

To fetch only private repositories, add type=private into your query string.
Another way is to use hub after configuring your API key.
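For the header variant, here is a small Python sketch of how such a request could be put together with the standard library. The function name build_repos_request is hypothetical, and I am assuming the organization endpoint here; the request is only built, not sent, so the token never has to appear in the URL.

```python
import urllib.parse
import urllib.request


def build_repos_request(
    org: str, token: str, repo_type: str = "private", per_page: int = 100
) -> urllib.request.Request:
    """Build (but do not send) a request for an organization's repositories."""
    query = urllib.parse.urlencode({"type": repo_type, "per_page": per_page})
    url = f"https://api.github.com/orgs/{urllib.parse.quote(org)}/repos?{query}"
    # The token travels in the Authorization header, not in the query string.
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})
```

Passing the returned object to urllib.request.urlopen would perform the actual call.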

Answer 4:

This gist accomplishes the task in one line on the command line:

curl -s "https://api.github.com/orgs/[your_org]/repos?per_page=100" | ruby -rjson -e 'JSON.load(STDIN.read).each { |repo| %x[git clone #{repo["ssh_url"]} ]}'

Replace [your_org] with your organization's name, and adjust per_page if necessary.

UPDATE:

As ATutorMe mentioned, the maximum page size is 100, according to the GitHub docs.

If you have more than 100 repos, you'll have to add a page parameter to your url and you can run the command for each page.

curl -s "https://api.github.com/orgs/[your_org]/repos?page=2&per_page=100" | ruby -rjson -e 'JSON.load(STDIN.read).each { |repo| %x[git clone #{repo["ssh_url"]} ]}'

Note: The default per_page parameter is 30.
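If you already know roughly how many repositories the organization has, the page arithmetic is easy to script. A minimal Python sketch follows, where page_urls is a hypothetical helper and the URL shape is copied from the command above.

```python
import math
from typing import List


def page_urls(org: str, total_repos: int, per_page: int = 100) -> List[str]:
    """List the URLs needed to cover total_repos at per_page items each."""
    # Always request at least one page, even for an empty organization.
    pages = max(1, math.ceil(total_repos / per_page))
    return [
        f"https://api.github.com/orgs/{org}/repos?page={p}&per_page={per_page}"
        for p in range(1, pages + 1)
    ]
```

For example, 250 repositories at 100 per page need three requests.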

Answer 5:

Go to Account Settings -> Application and create an API key
Then insert the API key, github instance url, and organization name in the script below

#!/bin/bash

# Substitute variables here
ORG_NAME="<ORG NAME>"
ACCESS_TOKEN="<API KEY>"
GITHUB_INSTANCE="<GITHUB INSTANCE>"

URL="https://${GITHUB_INSTANCE}/api/v3/orgs/${ORG_NAME}/repos?access_token=${ACCESS_TOKEN}"

curl -s "${URL}" | ruby -rjson -e 'JSON.load(STDIN.read).each {|repo| %x[git clone #{repo["ssh_url"]} ]}'

Save that in a file, chmod u+x the file, then run it.

Thanks to Arnaud for the Ruby code.

Answer 6:

So, I will add my answer too. :) (I found it's simple)

Fetch list (I've used "magento" company):

curl -si https://api.github.com/users/magento/repos | grep ssh_url | cut -d '"' -f4

Use clone_url instead of ssh_url to clone over HTTPS.

So, let's clone them all! :)

curl -si https://api.github.com/users/magento/repos | \
    grep ssh_url | cut -d '"' -f4 | xargs -I {} git clone {}

If you are going to fetch private repos, just add the GET parameter ?access_token=YOURTOKEN

Answer 7:

I found a comment in the gist @seancdavis provided very helpful, especially because, like the original poster, I wanted to sync all the repos for quick access, but the vast majority of them were private.

curl -u [[USERNAME]] -s "https://api.github.com/orgs/[[ORGANIZATION]]/repos?per_page=100" |
  ruby -rjson -e 'JSON.load(STDIN.read).each { |repo| %x[git clone #{repo["ssh_url"]} ]}'

Replace [[USERNAME]] with your GitHub username and [[ORGANIZATION]] with your GitHub organization. The output (JSON repo metadata) will be passed to a simple Ruby script:

# bring in the Ruby json library
require "json"

# read from STDIN, parse into an array of Ruby hashes and iterate over each repo
JSON.load(STDIN.read).each do |repo|
  # run a system command (re: "%x") of the style "git clone <ssh_url>"
  %x[git clone #{repo["ssh_url"]} ]
end

Answer 8:

I made a script with Python3 and Github APIv3

https://github.com/muhasturk/gitim

Just run

./gitim

Answer 9:

curl -s https://api.github.com/orgs/[GITHUBORG_NAME]/repos | grep clone_url | awk -F '":' '{ print $2 }' | sed 's/\"//g' | sed 's/,//' | while read line; do git clone "$line"; done

Answer 10:

I tried a few of the commands and tools above, but decided they were too much of a hassle, so I wrote another command-line tool to do this, called github-dl.

To use it (assuming you have nodejs installed)

npx github-dl -d /tmp/test wires

This would get a list of all the repos from wires and write the info into the test directory, using the authorisation details (user/pass) you provide on the CLI.

In detail, it

Asks for auth (supports 2FA)
Gets list of repos for user/org through GitHub API
Does pagination for this, so more than 100 repos are supported

It does not actually clone the repos, but instead writes a .txt file that you can pass into xargs to do the cloning, for example:

cd /tmp/test
cat wires-repo-urls.txt | xargs -n2 git clone

# or to pull
cat /tmp/test/wires-repo-urls.txt | xargs -n2 git pull

Maybe this is useful for you; it's just a few lines of JS so should be easy to adjust to your needs
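Since the question asks to clone and later pull, the "clone or pull" decision can be made per repository. Here is a hedged Python sketch of that idea; it is not how github-dl works, and sync_command is a hypothetical name. It only builds the git command, choosing pull when a checkout already exists in the target directory.

```python
import os
import re
from typing import List


def sync_command(clone_url: str, base_dir: str) -> List[str]:
    """Return a `git pull` command if the repo is already cloned, else `git clone`."""
    # Take the last path component of either an HTTPS or an SSH-style URL.
    name = re.split(r"[/:]", clone_url.rstrip("/"))[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    dest = os.path.join(base_dir, name)
    if os.path.isdir(os.path.join(dest, ".git")):
        return ["git", "-C", dest, "pull"]
    return ["git", "clone", clone_url, dest]
```

Running the returned list with subprocess.run for every URL gives a backup that can be re-run to pick up new repositories and new commits.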

Answer 11:

This python one-liner will do what you need. It:

checks github for your available repos

for each, makes a system call to git clone

python3 -c "import json, urllib.request, os; [os.system('git clone ' + r['ssh_url']) for r in json.load(urllib.request.urlopen('https://api.github.com/orgs/<<ORG_NAME>>/repos?per_page=100'))]"

Answer 12:

There is also a very useful npm module to do this. It can not only clone, but pull as well (to update data you already have).

You just create config like this:

[{
   "username": "BoyCook",
   "dir": "/Users/boycook/code/boycook",
   "protocol": "ssh"
}]

and then run gitall clone, for example, or gitall pull.

Answer 13:

In case anyone is looking for a Windows solution, here's a little PowerShell function to do the trick (it could be a one-liner or an alias if I did not need it to work both with and without a proxy).

function Unj-GitCloneAllBy($User, $Proxy = $null) {
    # 2>&1 and ToString() work around git printing to stderr, by @wekempf aka William Kempf
    # https://github.com/dahlbyk/posh-git/issues/109#issuecomment-21638678
    (Invoke-WebRequest -Proxy $Proxy "https://api.github.com/users/$User/repos?page=1&per_page=100").Content |
      ConvertFrom-Json |
      %{ $_.clone_url } |
      %{ & git clone $_ 2>&1 } |
      % { $_.ToString() }
}

Answer 14:

So, in practice, if you want to clone all repos from the organization FOO which match BAR, you could use the one-liner below, which requires jq and common cli utilities

curl 'https://api.github.com/orgs/FOO/repos?access_token=SECRET' |
  jq '.[] |
  .ssh_url' |
  awk '/BAR/ {print "git clone " $0 " & "}' |
  sh
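The same "only repos matching BAR" filter can be sketched in Python if jq and awk are not at hand. This is an illustrative sketch only: matching_ssh_urls is a hypothetical name, and it expects the already-parsed JSON array from the API.

```python
import re
from typing import Dict, Iterable, List


def matching_ssh_urls(repos: Iterable[Dict], pattern: str) -> List[str]:
    """Keep the ssh_url of every repository whose URL matches the regex."""
    regex = re.compile(pattern)
    return [repo["ssh_url"] for repo in repos if regex.search(repo["ssh_url"])]
```

As with the awk version, the pattern is matched against the whole URL, so it can match the organization name as well as the repository name.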

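Answer 15:

Simple solution:

GITHUB_USER=CHANGEME   # placeholder: the GitHub user whose repositories to clone
NUM_REPOS=100          # the API returns at most 100 repositories per page
DW_FOLDER="Github_${NUM_REPOS}_repos"
mkdir -p "${DW_FOLDER}"
cd "${DW_FOLDER}" || exit 1
for REPO in $(curl -s "https://api.github.com/users/${GITHUB_USER}/repos?per_page=${NUM_REPOS}" | awk '/ssh_url/{print $2}' | sed 's/^"//g' | sed 's/",$//g') ; do git clone "${REPO}" ; done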

Answer 16:

You can get a list of the repositories by using curl and then iterate over said list with a bash loop:

GIT_REPOS=$(curl -s "https://${GITHUB_BASE_URL}/api/v3/orgs/${ORG_NAME}/repos?access_token=${ACCESS_TOKEN}" | grep ssh_url | awk -F': ' '{print $2}' | sed -e 's/",//g' | sed -e 's/"//g')
for REPO in $GIT_REPOS; do
  git clone "$REPO"
done

Answer 17:

You can use an open-source tool to clone a bunch of GitHub repositories: https://github.com/artiomn/git_cloner

Example:

git_cloner --type github --owner octocat --login user --password user https://my_bitbucket

Use the JSON API from api.github.com. You can see code examples in the GitHub documentation: https://developer.github.com/v3/

Or here:

https://github.com/artiomn/git_cloner/blob/master/src/git_cloner/github.py

Answer 18:

To clone only private repos, given an access key, and with Python 3 and the requests module installed:

ORG=company; ACCESS_KEY=0000000000000000000000000000000000000000; for i in $(python3 -c "import requests; print(' '.join([x['ssh_url'] for x in list(filter(lambda x: x['private'] ,requests.get('https://api.github.com/orgs/$ORG/repos?per_page=100&access_token=$ACCESS_KEY').json()))]))"); do git clone "$i"; done;

Answer 19:

A Python3 solution that includes exhaustive pagination via Link Header.

Pre-requisites:

Github API "Personal Access Token"
pip3 install links-from-link-header
hub

import json
import requests
from requests.auth import HTTPBasicAuth
import links_from_header

respget = lambda url: requests.get(url, auth=HTTPBasicAuth('githubusername', 'githubtoken'))

myorgname = 'abc'
nexturl = f"https://api.github.com/orgs/{myorgname}/repos?per_page=100"

while nexturl:
    print(nexturl)
    resp = respget(nexturl)

    linkheads = resp.headers.get('Link', None)
    if linkheads:
        linkheads_parsed = links_from_header.extract(linkheads)
        nexturl = linkheads_parsed.get('next', None)
    else:
        nexturl = None

    respcon = json.loads(resp.content)
    with open('repolist', 'a') as fh:
        fh.writelines([f'{respconi["full_name"]}\n' for respconi in respcon])

Then, you can use xargs or parallel and: cat repolist | parallel -I% hub clone %
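If you would rather not install an extra package, the "next" link can be pulled out of the Link header with a small regular expression. The sketch below is my own illustration, not part of links-from-link-header; next_page_url is a hypothetical name, and it assumes the usual `<url>; rel="name"` layout of the header.

```python
import re
from typing import Optional

# Matches one `<url>; rel="name"` entry of a Link header.
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next", or None when there is no next page."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None
```

In the loop above this would replace the links_from_header.extract call. The requests library also exposes the parsed header as resp.links, which may be simpler still.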

Answer 20:

If you have the repository names in a list like this, then this shell script works:

user="https://github.com/user/"

declare -a arr=("repo1" "repo2")

for i in "${arr[@]}"
do
   echo "$user$i"
   git clone "$user$i"
done

Answer 21:

I created a sample batch script. You can download all private/public repositories from github.com. After a repository is downloaded, it is automatically converted to a zip file.

@echo off
setlocal EnableDelayedExpansion
SET "username=olyanren"
SET "password=G....."
set "mypath=%cd%\"
SET "url=https://%username%:%password%@github.com/%username%/"
FOR /F "tokens=* delims=" %%i in (files.txt) do (
SET repo=%%i
rmdir /s /q !repo!
git clone "!url!!repo!.git"
cd !repo!
echo !mypath!
git archive --format=zip -o "!mypath!!repo!.zip" HEAD
cd ..
)

Note: files.txt file should contain only repository names like:

repository1
repository2

Answer 22:

Update from May 19

Use this bash command for an organization (private repos included):

curl -u "{username}" "https://api.github.com/orgs/{org}/repos?page=1&per_page=100" | grep -o 'git@[^"]*' | xargs -L1 git clone

Answer 23:

The prevailing answers here don't take into account that the GitHub API will only return a maximum of 100 repositories per request, regardless of what you specify in per_page. If you are cloning a GitHub org with more than 100 repositories, you will have to follow the paging links in the API response.

I wrote a CLI tool to do just that:

clone-github-org -o myorg

This will clone all repositories in the myorg organization to the current working directory.

Answer 24:

Another shell script with comments that clones all repositories (public and private) from a user:

#!/bin/bash

USERNAME=INSERT_USERNAME_HERE
PASSWORD=INSERT_PASSWORD_HERE

# Generate auth header
AUTH=$(echo -n $USERNAME:$PASSWORD | base64)

# Get repository URLs
curl -iH "Authorization: Basic "$AUTH https://api.github.com/user/repos | grep -w clone_url > repos.txt

# Clean URLs (remove " and ,) and print only the second column
cat repos.txt | tr -d \"\, | awk '{print $2}'  > repos_clean.txt

# Insert username:password after protocol:// to generate clone URLs
cat repos_clean.txt |  sed "s/:\/\/git/:\/\/$USERNAME\:$PASSWORD\@git/g" > repos_clone.txt

while read -r FILE; do
    git clone "$FILE"
done <repos_clone.txt

# Remove the temporary files (repos_clone.txt contains the password)
rm repos.txt repos_clean.txt repos_clone.txt
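The "Generate auth header" step is just base64 over username:password. For reference, a small Python sketch of the same encoding, where basic_auth_header is a hypothetical helper name:

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

This should produce the same string as the echo -n ... | base64 line in the script, prefixed with "Basic ".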

Source: https://stackoverflow.com/questions/19576742/how-to-clone-all-repos-at-once-from-github

Tags

git

github

git-clone