How to use GitHub V3 API to get commit count for a repo?

Submitted by 一曲冷凌霜 on 2019-12-17 18:03:07

Question


I am trying to count commits for many large GitHub repos using the API, so I would like to avoid fetching the entire list of commits (for example: api.github.com/repos/jasonrudolph/keyboard/commits) and counting them.

If I had the hash of the first (initial) commit, I could use this technique to compare the first commit to the latest; it reports the total_commits in between, so I would need to add one. Unfortunately, I cannot see how to elegantly get the first commit using the API.
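A minimal sketch of the arithmetic behind that comparison, assuming the technique refers to the REST compare endpoint (`/repos/{owner}/{repo}/compare/{base}...{head}`) and that the repository has a single root commit. The helper names are hypothetical, and no request is made here:

```python
def compare_url(owner, repo, first_sha, head):
    """Build the compare URL from the first commit to the branch head."""
    return f"https://api.github.com/repos/{owner}/{repo}/compare/{first_sha}...{head}"


def commit_count_from_compare(payload):
    """Turn a decoded compare response into a total commit count.

    total_commits counts the commits after the base commit, so the base
    (first) commit itself has to be added back.
    """
    return payload["total_commits"] + 1
```

The count is only correct if `first_sha` really is the root of the history being counted, which is the part the question is still missing.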

The base repo URL does give me the created_at (for example: api.github.com/repos/jasonrudolph/keyboard), so I could get a reduced commit set by limiting the commits to those up to the creation date (for example: api.github.com/repos/jasonrudolph/keyboard/commits?until=2013-03-30T16:01:43Z) and using the earliest one (always listed last?) or maybe the one with an empty parent (I am not sure whether forked projects have initial parent commits).

Any better way to get the first commit hash for a repo?

Better yet, this whole thing seems convoluted for a simple statistic, and I wonder if I'm missing something. Any better ideas for using the API to get the repo commit count?

Edit: This somewhat similar question is trying to filter by certain files (" and within them to specific files."), so it has a different answer.


Answer 1:


You can consider using the GraphQL API v4 to get the commit count for multiple repositories at the same time using aliases. The following fetches the commit count for all branches of 3 distinct repositories (up to 100 branches per repo):

{
  gson: repository(owner: "google", name: "gson") {
    ...RepoFragment
  }
  martian: repository(owner: "google", name: "martian") {
    ...RepoFragment
  }
  keyboard: repository(owner: "jasonrudolph", name: "keyboard") {
    ...RepoFragment
  }
}

fragment RepoFragment on Repository {
  name
  refs(first: 100, refPrefix: "refs/heads/") {
    edges {
      node {
        name
        target {
          ... on Commit {
            id
            history(first: 0) {
              totalCount
            }
          }
        }
      }
    }
  }
}


RepoFragment is a fragment that avoids duplicating the query fields for each of those repos.
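When the list of repositories is not known in advance, the aliased part of the query can be generated. The following is a minimal sketch, not part of the GitHub API: `build_aliased_query` is a hypothetical helper, and the numbered aliases (`repo0`, `repo1`, ...) are an assumption chosen because repository names may contain characters such as `-` that are not valid in a GraphQL alias.

```python
import json


def build_aliased_query(repos, fragment_text):
    """Return a GraphQL document with one aliased `repository` field per repo.

    repos is a list of (owner, name) pairs; fragment_text must define
    a fragment named RepoFragment on Repository.
    """
    fields = []
    for index, (owner, name) in enumerate(repos):
        # json.dumps adds the double quotes and escapes the string value.
        fields.append(
            f"  repo{index}: repository(owner: {json.dumps(owner)}, name: {json.dumps(name)}) {{\n"
            "    ...RepoFragment\n"
            "  }"
        )
    return "{\n" + "\n".join(fields) + "\n}\n" + fragment_text
```

The resulting string would be sent as the `query` field of a JSON POST body to the GraphQL endpoint, with an authenticated token.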

If you only need the commit count on the default branch, it's more straightforward:

{
  gson: repository(owner: "google", name: "gson") {
    ...RepoFragment
  }
  martian: repository(owner: "google", name: "martian") {
    ...RepoFragment
  }
  keyboard: repository(owner: "jasonrudolph", name: "keyboard") {
    ...RepoFragment
  }
}

fragment RepoFragment on Repository {
  name
  defaultBranchRef {
    name
    target {
      ... on Commit {
        id
        history(first: 0) {
          totalCount
        }
      }
    }
  }
}
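Reading the counts back out of the response for the default-branch query could look like the sketch below. It assumes the decoded JSON mirrors the query shape (`data` → alias → `defaultBranchRef` → `target` → `history` → `totalCount`); `default_branch_counts` is a hypothetical helper name, and treating a missing repository or an empty one as 0 commits is a design choice, not API behavior.

```python
def default_branch_counts(response):
    """Map each alias in the decoded GraphQL response to its commit count."""
    counts = {}
    for alias, repo in response["data"].items():
        # repo is None when the repository could not be resolved;
        # defaultBranchRef is None when the repository has no commits yet.
        ref = repo.get("defaultBranchRef") if repo else None
        counts[alias] = ref["target"]["history"]["totalCount"] if ref else 0
    return counts
```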





Answer 2:


If you're looking for the total number of commits in the default branch, you might consider a different approach.

Use the Repo Contributors API to fetch a list of all contributors:

https://developer.github.com/v3/repos/#list-contributors

Each item in the list will contain a contributions field which tells you how many commits the user authored in the default branch. Sum those fields across all contributors and you should get the total number of commits in the default branch.

The list of contributors is often much shorter than the list of commits, so it should take fewer requests to compute the total number of commits in the default branch.
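A minimal sketch of that summation. `fetch_page` is a hypothetical callable, supplied by the caller, that requests one page of the contributors list (with `page` and `per_page` query parameters) and returns the decoded JSON list; keeping the HTTP call outside the function is an assumption made so the counting logic can be checked without network access.

```python
def count_commits_via_contributors(fetch_page, per_page=100):
    """Sum the `contributions` field over every page of contributors."""
    total = 0
    page = 1
    while True:
        contributors = fetch_page(page, per_page)
        total += sum(item["contributions"] for item in contributors)
        # A short (or empty) page means there is nothing left to fetch.
        if len(contributors) < per_page:
            return total
        page += 1
```

As far as I know, anonymous contributors are only listed when the request also passes `anon=1`, so without it the sum may fall short of the real commit count.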




Answer 3:


I just made a little script to do this. It may not work with large repositories, since it does not handle GitHub's rate limits. It also requires the Python requests package.

#!/usr/bin/env python3
import requests

# The optional token is embedded in the URL as basic-auth credentials (user:token).
GITHUB_API_BRANCHES = 'https://%(token)s@api.github.com/repos/%(namespace)s/%(repository)s/branches'
GITHUB_API_COMMITS = 'https://%(token)s@api.github.com/repos/%(namespace)s/%(repository)s/commits?sha=%(sha)s&page=%(page)i'


def github_commit_counter(namespace, repository, access_token=''):
    # Commit SHAs seen on any branch; de-duplicated at the end for the total.
    commit_store = list()

    branches = requests.get(GITHUB_API_BRANCHES % {
        'token': access_token,
        'namespace': namespace,
        'repository': repository,
    }).json()

    print('Branch'.ljust(47), 'Commits')
    print('-' * 55)

    for branch in branches:
        page = 1
        branch_commits = 0

        # Request successive pages until an empty one comes back.
        while True:
            commits = requests.get(GITHUB_API_COMMITS % {
                'token': access_token,
                'namespace': namespace,
                'repository': repository,
                'sha': branch['name'],
                'page': page
            }).json()

            page_commits = len(commits)

            for commit in commits:
                commit_store.append(commit['sha'])

            branch_commits += page_commits

            if page_commits == 0:
                break

            page += 1

        print(branch['name'].ljust(45), str(branch_commits).rjust(9))

    commit_store = set(commit_store)
    print('-' * 55)
    print('Total'.ljust(42), str(len(commit_store)).rjust(12))

# for private repositories, get your own token from
# https://github.com/settings/tokens
# github_commit_counter('github', 'gitignore', access_token='fnkr:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
github_commit_counter('github', 'gitignore')
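One way to cope with the rate limits mentioned above is to look at the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers before sending the next request. The helper below is a sketch under that assumption; `seconds_until_reset` is a hypothetical name and is not used by the script above.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the next request (0 if quota remains).

    headers is a mapping of response headers; the lookup is case-sensitive
    for a plain dict, whereas requests' response.headers ignores case.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0
    now = time.time() if now is None else now
    # X-RateLimit-Reset is the reset time in UTC epoch seconds.
    reset_at = int(headers.get("X-RateLimit-Reset", now))
    return max(0, reset_at - now)
```

Inside the paging loop, the script could call `time.sleep(seconds_until_reset(response.headers))` after each response, at the cost of keeping the response object around instead of calling `.json()` directly.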



Answer 4:


Simple solution: look at the page number. GitHub paginates for you, so you can calculate the number of commits from the Link header: take the last page number, subtract one (the last page has to be counted separately), multiply by the page size, then fetch the last page and add the number of items on it. That's a maximum of two API calls!
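The arithmetic can be sketched in Python as follows. Both helper names are hypothetical, and the parser assumes the Link header has the usual `<url>; rel="last"` form with a `page` query parameter in the URL:

```python
import re


def last_page_number(link_header):
    """Return the page number of the rel="last" link, or None if there is none."""
    for part in link_header.split(","):
        match = re.search(r'<([^>]*)>\s*;\s*rel="last"', part)
        if match:
            # [?&] keeps "per_page=" from being mistaken for "page=".
            page = re.search(r"[?&]page=(\d+)", match.group(1))
            if page:
                return int(page.group(1))
    return None


def total_from_pagination(last_page, per_page, items_on_last_page):
    """All pages before the last are full; only the last page is counted directly."""
    return (last_page - 1) * per_page + items_on_last_page
```

When the header has no `rel="last"` entry, everything fit on the first page and its length is already the total.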

Here is my implementation of grabbing the total number of commits for an entire organization using the octokit gem in Ruby:

@github = Octokit::Client.new access_token: key, auto_traversal: true, per_page: 100

Octokit.auto_paginate = true
repos = @github.org_repos('my_company', per_page: 100)

# * take the pagination number
# * get the last page
# * see how many items are on it
# * multiply the number of pages - 1 by the page size
# * and add the two together. Boom. Commit count in 2 api calls
def calc_total_commits(repos)
    total_sum_commits = 0

    repos.each do |e| 
        repo = Octokit::Repository.from_url(e.url)
        number_of_commits_in_first_page = @github.commits(repo).size
        repo_sum = 0
        if number_of_commits_in_first_page >= 100
            links = @github.last_response.rels

            unless links.empty?
                last_page_url = links[:last].href

                /.*page=(?<page_num>\d+)/ =~ last_page_url
                repo_sum += (page_num.to_i - 1) * 100 # we add the last page manually
                repo_sum += links[:last].get.data.size
            end
        else
            repo_sum += number_of_commits_in_first_page
        end
        puts "Commits for #{e.name} : #{repo_sum}"
        total_sum_commits += repo_sum
    end
    puts "TOTAL COMMITS #{total_sum_commits}"
end

And yes, I know the code is dirty; this was just thrown together in a few minutes.




Answer 5:


Using the GraphQL API v4 is probably the way to handle this if you're starting out in a new project, but if you're still using the REST API v3 you can get around the pagination issue by limiting the request to just 1 result per page. With that limit, the page number in the "last" link equals the total number of commits.

For example, using Python 3 and the requests library:

import os
import urllib.parse

import requests


def commit_count(project, sha='master', token=None):
    """
    Return the number of commits to a project
    """
    token = token or os.environ.get('GITHUB_API_TOKEN')
    url = f'https://api.github.com/repos/{project}/commits'
    headers = {
        'Accept': 'application/json',
        'Content-Type': 'application/json',
        'Authorization': f'token {token}',
    }
    params = {
        'sha': sha,
        'per_page': 1,
    }
    resp = requests.request('GET', url, params=params, headers=headers)
    if (resp.status_code // 100) != 2:
        raise Exception(f'invalid github response: {resp.content}')
    # check the resp count, just in case there are 0 commits
    commit_count = len(resp.json())
    last_page = resp.links.get('last')
    # if there are no more pages, the count must be 0 or 1
    if last_page:
        # extract the query string from the last page url
        qs = urllib.parse.urlparse(last_page['url']).query
        # extract the page number from the query string
        commit_count = int(dict(urllib.parse.parse_qsl(qs))['page'])
    return commit_count



Answer 6:


I used Python to create a generator that yields the list of contributors, sum up their commit totals, and then check the sum against a limit. The check returns True if the repo has fewer commits than the limit, and False if it has the same number or more. The only thing you have to fill in is the requests session that uses your credentials. Here's what I wrote:

from requests import session
def login():
    sess = session()

    # login here and return session with valid creds
    return sess

def generateList(link):
    # you need to login before you do anything
    sess = login()

    # requests exposes pagination through response.links, so start with an object
    # that imitates a response whose "next" link is the first URL. This lets a
    # single while-loop walk through GitHub's pagination.
    class response_imitator:
        links = {'next': {'url':link}}
    response = response_imitator()
    while 'next' in response.links:
        response = sess.get(response.links['next']['url'])
        for repo in response.json():
            yield repo

def check_commit_count(baseurl, user_name, repo_name, max_commit_count=None):
    # login first
    sess = login()
    if max_commit_count is not None:
        totalcommits = 0

        # construct url to paginate
        url = baseurl+"repos/" + user_name + '/' + repo_name + "/stats/contributors"
        for stats in generateList(url):
            totalcommits+=stats['total']

        if totalcommits >= max_commit_count:
            return False
        else:
            return True

def main():
    # what user do you want to check for commits
    user_name = "arcsector"

    # what repo do you want to check for commits
    repo_name = "EyeWitness"

    # github's base api url
    baseurl = "https://api.github.com/"

    # call function
    print(check_commit_count(baseurl, user_name, repo_name, 30))

if __name__ == "__main__":
    main()


Source: https://stackoverflow.com/questions/27931139/how-to-use-github-v3-api-to-get-commit-count-for-a-repo
