I am trying to count commits for many large GitHub repos using the API, so I would like to avoid getting the entire list of commits (this way as an example:
If you're looking for the total number of commits in the default branch, you might consider a different approach.
Use the Repo Contributors API to fetch a list of all contributors:
https://developer.github.com/v3/repos/#list-contributors
Each item in the list will contain a contributions field which tells you how many commits the user authored in the default branch. Sum those fields across all contributors and you should get the total number of commits in the default branch.
The list of contributors is often much shorter than the list of commits, so it should take fewer requests to compute the total number of commits in the default branch.
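A minimal Python sketch of that summation, using only the standard library. The helper names (next_link, fetch_page, count_commits_via_contributors) are hypothetical, and the fetch parameter exists only so the paging logic can be exercised without network access. It assumes the endpoint paginates through a Link header with rel="next", as GitHub's REST API does for list endpoints.

```python
import json
import re
import urllib.request


def next_link(link_header):
    """Return the rel="next" URL from a Link header, or None if there is none."""
    if not link_header:
        return None
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None


def fetch_page(url):
    """Fetch one page and return (parsed JSON body, Link header or None)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github.v3+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response), response.headers.get("Link")


def count_commits_via_contributors(owner, repo, fetch=fetch_page):
    """Sum the contributions field over every page of contributors."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contributors?per_page=100"
    total = 0
    while url:
        contributors, link_header = fetch(url)
        total += sum(item["contributions"] for item in contributors)
        url = next_link(link_header)
    return total
```

Calling count_commits_via_contributors("facebook", "react") would issue one request per 100 contributors; unauthenticated calls are subject to GitHub's rate limits.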
Here is a JavaScript example using Fetch, based on snowe's approach:
/**
 * @param {string} owner Owner of repo
 * @param {string} repo Name of repo
 * @returns {Promise<number>} Total number of commits on the repo's default branch
 */
export const getTotalCommits = (owner, repo) => {
  const url = `https://api.github.com/repos/${owner}/${repo}/commits?per_page=100`;
  let pages = 0;
  return fetch(url, {
    headers: {
      Accept: "application/vnd.github.v3+json",
    },
  })
    .then((data) => data.headers)
    .then(
      (result) =>
        // The second entry of the Link header is rel="last"; its page number is
        // the page count. A repo with 100 commits or fewer has no Link header,
        // so this throws and ends up in the catch handler below.
        result
          .get("link")
          .split(",")[1]
          .match(/.*page=(?<page_num>\d+)/).groups.page_num
    )
    .then((numberOfPages) => {
      pages = Number(numberOfPages);
      // Fetch the last page to see how many commits it holds
      return fetch(url + `&page=${numberOfPages}`, {
        headers: {
          Accept: "application/vnd.github.v3+json",
        },
      }).then((data) => data.json());
    })
    .then((data) => {
      // Every page before the last one is full (100 commits)
      return data.length + (pages - 1) * 100;
    })
    .catch((err) => {
      console.log(`ERROR: calling: ${url}`);
      console.log("See below for more info:");
      console.log(err);
    });
};

getTotalCommits("facebook", "react").then((commits) => {
  console.log(commits);
});
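The arithmetic behind this trick can be checked in isolation. A small Python sketch, assuming every page before the last one is full; total_from_last_page is a hypothetical helper name, not part of any GitHub client.

```python
def total_from_last_page(last_page_number, last_page_length, per_page=100):
    """Total items when every page before the last holds exactly per_page items."""
    if last_page_number < 1:
        raise ValueError("last_page_number must be at least 1")
    return (last_page_number - 1) * per_page + last_page_length


# 17 pages, with 42 commits on the final page: 16 * 100 + 42
print(total_from_last_page(17, 42))  # prints 1642
```

Only two requests are needed regardless of repository size: one to learn the last page number and one to fetch that last page.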
I just made a little script to do this. It may not work with large repositories since it does not handle GitHub's rate limits. Also it requires the Python requests package.
#!/usr/bin/env python3
import requests

GITHUB_API_BRANCHES = 'https://%(token)s@api.github.com/repos/%(namespace)s/%(repository)s/branches'
GITHUB_API_COMMITS = 'https://%(token)s@api.github.com/repos/%(namespace)s/%(repository)s/commits?sha=%(sha)s&page=%(page)i'


def github_commit_counter(namespace, repository, access_token=''):
    # SHAs of every commit seen; deduplicated at the end because branches share history
    commit_store = list()

    branches = requests.get(GITHUB_API_BRANCHES % {
        'token': access_token,
        'namespace': namespace,
        'repository': repository,
    }).json()

    print('Branch'.ljust(47), 'Commits')
    print('-' * 55)

    for branch in branches:
        page = 1
        branch_commits = 0

        # walk the commit pages of this branch until an empty page comes back
        while True:
            commits = requests.get(GITHUB_API_COMMITS % {
                'token': access_token,
                'namespace': namespace,
                'repository': repository,
                'sha': branch['name'],
                'page': page
            }).json()

            page_commits = len(commits)

            for commit in commits:
                commit_store.append(commit['sha'])

            branch_commits += page_commits

            if page_commits == 0:
                break

            page += 1

        print(branch['name'].ljust(45), str(branch_commits).rjust(9))

    commit_store = set(commit_store)
    print('-' * 55)
    print('Total'.ljust(42), str(len(commit_store)).rjust(12))


# for private repositories, get your own token from
# https://github.com/settings/tokens
# github_commit_counter('github', 'gitignore', access_token='fnkr:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
github_commit_counter('github', 'gitignore')
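Since the script above does not handle rate limits, here is one possible way to add a pause, sketched under the assumption that each response carries GitHub's X-RateLimit-Remaining and X-RateLimit-Reset headers (the latter being a Unix timestamp). seconds_until_reset is a hypothetical helper, not part of requests.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, in seconds.

    Returns 0.0 when requests remain or when the rate-limit headers are absent.
    """
    now = time.time() if now is None else now
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is None or reset is None:
        return 0.0
    if int(remaining) > 0:
        return 0.0
    # X-RateLimit-Reset is the epoch second at which the quota refills
    return max(0.0, float(reset) - now)
```

In the loop above, this could be used by keeping the response object and calling time.sleep(seconds_until_reset(response.headers)) before the next requests.get. The header names are looked up with this exact capitalisation, which works with the case-insensitive headers object that requests returns; a plain dict would need matching keys.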
You can consider using the GraphQL API v4 to fetch commit counts for multiple repositories at the same time using aliases. The following will fetch the commit count for all branches of 3 distinct repositories (up to 100 branches per repo):
{
  gson: repository(owner: "google", name: "gson") {
    ...RepoFragment
  }
  martian: repository(owner: "google", name: "martian") {
    ...RepoFragment
  }
  keyboard: repository(owner: "jasonrudolph", name: "keyboard") {
    ...RepoFragment
  }
}

fragment RepoFragment on Repository {
  name
  refs(first: 100, refPrefix: "refs/heads/") {
    edges {
      node {
        name
        target {
          ... on Commit {
            id
            history(first: 0) {
              totalCount
            }
          }
        }
      }
    }
  }
}
RepoFragment is a fragment which helps to avoid duplicating the query fields for each of those repos.
If you only need the commit count on the default branch, it's more straightforward:
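When the list of repositories is long, the aliased part of the query can be generated rather than typed. A Python sketch under the assumption that the fragment text is appended separately; build_aliased_query is a hypothetical helper, and the r0, r1, ... aliases are an arbitrary choice made because repository names may contain characters that are not valid in GraphQL names.

```python
import json


def build_aliased_query(repos):
    """Build the top-level selection for a list of (owner, name) pairs."""
    lines = ["{"]
    for index, (owner, name) in enumerate(repos):
        # json.dumps produces a double-quoted, escaped string literal
        lines.append(
            f"  r{index}: repository(owner: {json.dumps(owner)}, name: {json.dumps(name)}) {{"
        )
        lines.append("    ...RepoFragment")
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


print(build_aliased_query([("google", "gson"), ("google", "martian")]))
```

The output still needs the RepoFragment definition appended before it is sent to the API.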
{
  gson: repository(owner: "google", name: "gson") {
    ...RepoFragment
  }
  martian: repository(owner: "google", name: "martian") {
    ...RepoFragment
  }
  keyboard: repository(owner: "jasonrudolph", name: "keyboard") {
    ...RepoFragment
  }
}

fragment RepoFragment on Repository {
  name
  defaultBranchRef {
    name
    target {
      ... on Commit {
        id
        history(first: 0) {
          totalCount
        }
      }
    }
  }
}
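To run such a query outside the explorer, it can be POSTed to the GraphQL endpoint with a personal access token. A Python sketch using only the standard library; run_query and default_branch_counts are hypothetical helper names, and the response shape is assumed to mirror the default-branch query above.

```python
import json
import urllib.request


def run_query(query, token):
    """POST a GraphQL query to GitHub and return the decoded JSON response."""
    request = urllib.request.Request(
        "https://api.github.com/graphql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def default_branch_counts(payload):
    """Map each alias to its default-branch commit count.

    A repository without a default branch (for example an empty one)
    maps to None.
    """
    counts = {}
    for alias, repo in (payload.get("data") or {}).items():
        ref = repo.get("defaultBranchRef") if repo else None
        counts[alias] = ref["target"]["history"]["totalCount"] if ref else None
    return counts
```

With a valid token, default_branch_counts(run_query(query, token)) would return a dictionary keyed by the aliases used in the query (gson, martian, keyboard).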
Using the GraphQL API v4 is probably the way to handle this if you're starting out in a new project, but if you're still using the REST API v3 you can get around the pagination issue by limiting the request to just 1 result per page. By setting that limit, the number of pages returned in the last link will be equal to the total.
For example, using python3 and the requests library:
import os
import urllib.parse

import requests


def commit_count(project, sha='master', token=None):
    """
    Return the number of commits to a project
    """
    token = token or os.environ.get('GITHUB_API_TOKEN')
    url = f'https://api.github.com/repos/{project}/commits'
    headers = {
        'Accept': 'application/json',
        'Content-Type': 'application/json',
    }
    # only send credentials when a token is available
    if token:
        headers['Authorization'] = f'token {token}'
    params = {
        'sha': sha,
        'per_page': 1,
    }
    resp = requests.request('GET', url, params=params, headers=headers)
    if (resp.status_code // 100) != 2:
        raise Exception(f'invalid github response: {resp.content}')
    # check the resp count, just in case there are 0 commits
    commit_count = len(resp.json())
    last_page = resp.links.get('last')
    # if there are no more pages, the count must be 0 or 1
    if last_page:
        # extract the query string from the last page url
        qs = urllib.parse.urlparse(last_page['url']).query
        # extract the page number from the query string
        commit_count = int(dict(urllib.parse.parse_qsl(qs))['page'])
    return commit_count
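The same parsing step can be tried offline against a raw Link header. A sketch using only the standard library; last_page_number is a hypothetical helper, and the header below is a made-up example (repository id and page numbers are invented for illustration).

```python
import re
from urllib.parse import parse_qs, urlparse


def last_page_number(link_header):
    """Return the page number of the rel="last" link, or None if absent."""
    if not link_header:
        return None
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="last"', part)
        if match:
            query = parse_qs(urlparse(match.group(1)).query)
            return int(query["page"][0])
    return None


# Made-up header in the shape GitHub returns for per_page=1
header = (
    '<https://api.github.com/repositories/1/commits?per_page=1&page=2>; rel="next", '
    '<https://api.github.com/repositories/1/commits?per_page=1&page=1543>; rel="last"'
)
print(last_page_number(header))  # prints 1543
```

With per_page=1 the last page number is the commit count itself, so no second request is needed.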
I used Python to create a generator which returns a list of contributors, sums up the total commit count, and then checks if it is valid. It returns True if the repo has fewer commits than the limit, and False if it has the same number or more. The only thing you have to fill in is the requests session that uses your credentials. Here's what I wrote for you:
from requests import session


def login():
    sess = session()
    # login here and return session with valid creds
    return sess


def generateList(link):
    # you need to login before you do anything
    sess = login()

    # because of the way that requests works, you must start out by creating an object to
    # imitate the response object. This will help you to cleanly while-loop through
    # github's pagination
    class response_imitator:
        links = {'next': {'url': link}}

    response = response_imitator()
    while 'next' in response.links:
        response = sess.get(response.links['next']['url'])
        for repo in response.json():
            yield repo


def check_commit_count(baseurl, user_name, repo_name, max_commit_count=None):
    # generateList() logs in on its own, so no session is needed here
    if max_commit_count is not None:
        totalcommits = 0

        # construct url to paginate
        url = baseurl + "repos/" + user_name + '/' + repo_name + "/stats/contributors"
        for stats in generateList(url):
            totalcommits += stats['total']

        if totalcommits >= max_commit_count:
            return False
        else:
            return True


def main():
    # what user do you want to check for commits
    user_name = "arcsector"
    # what repo do you want to check for commits
    repo_name = "EyeWitness"
    # github's base api url
    baseurl = "https://api.github.com/"
    # call function and show the result
    print(check_commit_count(baseurl, user_name, repo_name, 30))


if __name__ == "__main__":
    main()
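One caveat with the statistics endpoints: GitHub may answer 202 while it is still computing the data, in which case the request has to be repeated a little later. A hedged sketch of a retry wrapper; fetch_stats_with_retry and total_commits are hypothetical helpers, and fetch is assumed to be any callable returning a (status_code, parsed_json) tuple, for example a thin wrapper around sess.get.

```python
import time


def fetch_stats_with_retry(fetch, url, attempts=5, delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it stops answering 202, then return the body.

    Raises TimeoutError if every attempt comes back as 202.
    """
    for attempt in range(attempts):
        status, body = fetch(url)
        if status != 202:
            return body
        # wait before retrying, except after the final attempt
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError(f"statistics for {url} were still being computed")


def total_commits(stats):
    """Sum the per-contributor totals from a /stats/contributors response."""
    return sum(entry["total"] for entry in stats)
```

The attempts and delay defaults are arbitrary values chosen for illustration.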