How to use GitHub V3 API to get commit count for a repo?

Asked by 小鲜肉 on 2020-12-05 03:26

I am trying to count commits for many large GitHub repos using the API, so I would like to avoid fetching the entire list of commits (this way, as an example:

7 answers
  • 2020-12-05 03:31

    If you're looking for the total number of commits in the default branch, you might consider a different approach.

    Use the Repo Contributors API to fetch a list of all contributors:

    https://developer.github.com/v3/repos/#list-contributors

    Each item in the list will contain a contributions field which tells you how many commits the user authored in the default branch. Sum those fields across all contributors and you should get the total number of commits in the default branch.

    The list of contributors is often much shorter than the list of commits, so it should take fewer requests to compute the total number of commits in the default branch.
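
    A minimal Python sketch of this approach, using only the standard library. The names sum_contributions and fetch_contributors are illustrative and not part of any GitHub client; per_page and page are GitHub's usual pagination parameters. Treat the result as an estimate: as far as I know, the endpoint leaves out anonymous contributors unless anon=1 is passed.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def sum_contributions(contributors):
    """Add up the `contributions` field of each contributor object."""
    return sum(item.get("contributions", 0) for item in contributors)


def fetch_contributors(owner, repo, per_page=100):
    """Yield contributor objects page by page until an empty page comes back."""
    page = 1
    while True:
        url = (
            f"{API_ROOT}/repos/{owner}/{repo}/contributors"
            f"?per_page={per_page}&page={page}"
        )
        request = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github.v3+json"}
        )
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:
            break
        yield from batch
        page += 1


# Offline check with a hand-written sample shaped like the API response
sample = [
    {"login": "alice", "contributions": 120},
    {"login": "bob", "contributions": 30},
]
print(sum_contributions(sample))  # 150
```

    Against the live API this would be called as sum_contributions(fetch_contributors(owner, repo)).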

  • 2020-12-05 03:33

    Here is a JavaScript example using Fetch based on snowe's approach

    Fetch example

    /**
     * @param {string} owner Owner of repo
     * @param {string} repo Name of repo
     * @returns {Promise<number>} Total number of commits on the repo's default branch
     */
    export const getTotalCommits = (owner, repo) => {
      const perPage = 100;
      const url = `https://api.github.com/repos/${owner}/${repo}/commits?per_page=${perPage}`;
      const options = { headers: { Accept: "application/vnd.github.v3+json" } };

      return fetch(url, options)
        .then((response) => {
          if (!response.ok) {
            throw new Error(`GitHub responded with status ${response.status}`);
          }
          // The Link header is absent when all commits fit on a single page
          const link = response.headers.get("link") || "";
          // Pick the page number out of the URL marked rel="last"
          const match = link.match(/[?&]page=(\d+)[^>]*>;\s*rel="last"/);
          if (!match) {
            return response.json().then((commits) => commits.length);
          }
          const pages = Number(match[1]);
          // Every page but the last is full, so only the last page is fetched
          return fetch(`${url}&page=${pages}`, options)
            .then((lastResponse) => lastResponse.json())
            .then((commits) => commits.length + (pages - 1) * perPage);
        })
        .catch((err) => {
          console.log(`ERROR: calling: ${url}`);
          console.log("See below for more info:");
          console.log(err);
        });
    };
    

    Usage

    getTotalCommits('facebook', 'react').then(commits => {
        console.log(commits);
    });
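
    The arithmetic in the last step can be checked offline. A small Python sketch (the function name total_from_last_page is made up for illustration): every page except the last one is full, so only the length of the last page has to be fetched.

```python
def total_from_last_page(last_page_number, last_page_length, per_page=100):
    """Total items when every page but the last holds `per_page` items."""
    if last_page_number < 1:
        raise ValueError("page numbers start at 1")
    return (last_page_number - 1) * per_page + last_page_length


# 161 pages of up to 100 commits, with 37 commits on the last page
print(total_from_last_page(161, 37))  # 16037
```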
    
  • 2020-12-05 03:37

    I just made a little script to do this. It may not work with large repositories since it does not handle GitHub's rate limits. Also it requires the Python requests package.

    #!/usr/bin/env python3
    import requests

    GITHUB_API_BRANCHES = 'https://api.github.com/repos/%(namespace)s/%(repository)s/branches'
    GITHUB_API_COMMITS = 'https://api.github.com/repos/%(namespace)s/%(repository)s/commits'


    def github_commit_counter(namespace, repository, access_token=''):
        # unique commit SHAs across all branches (branches share most commits)
        commit_store = set()
        repo = {'namespace': namespace, 'repository': repository}

        session = requests.Session()
        if access_token:
            # only send credentials when a token was given
            session.headers['Authorization'] = 'token ' + access_token

        # note: only the first page of branches (up to 100) is fetched
        resp = session.get(GITHUB_API_BRANCHES % repo, params={'per_page': 100})
        resp.raise_for_status()
        branches = resp.json()

        print('Branch'.ljust(47), 'Commits')
        print('-' * 55)

        for branch in branches:
            page = 1
            branch_commits = 0

            while True:
                resp = session.get(GITHUB_API_COMMITS % repo, params={
                    'sha': branch['name'],
                    'per_page': 100,
                    'page': page,
                })
                # fail loudly on errors such as an exceeded rate limit
                resp.raise_for_status()
                commits = resp.json()

                # an empty page means we are past the last commit
                if not commits:
                    break

                for commit in commits:
                    commit_store.add(commit['sha'])

                branch_commits += len(commits)
                page += 1

            print(branch['name'].ljust(45), str(branch_commits).rjust(9))

        print('-' * 55)
        print('Total'.ljust(42), str(len(commit_store)).rjust(12))

    # for private repositories, get your own token from
    # https://github.com/settings/tokens
    # github_commit_counter('github', 'gitignore', access_token='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
    github_commit_counter('github', 'gitignore')
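
    Since the script does not handle rate limits, here is a hedged sketch of one way to wait them out. It relies on the X-RateLimit-Remaining and X-RateLimit-Reset response headers sent by the GitHub REST API (the reset value is a UTC epoch timestamp in seconds). The helper name seconds_until_reset and the idea of sleeping until the reset are my own assumptions, not part of the original script.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, in seconds.

    `headers` is any mapping of response headers. Returns 0 when requests
    remain or when the rate-limit headers are missing.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0
    reset_at = int(headers.get("X-RateLimit-Reset", now))
    return max(0, reset_at - int(now))


# Offline example with hand-written header values
exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
print(seconds_until_reset(exhausted, now=1700000000))  # 60
```

    In the paging loop, one could call time.sleep(seconds_until_reset(response.headers)) on each response before requesting the next page.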
    
  • 2020-12-05 03:41

    You can consider using the GraphQL API v4 to get commit counts for multiple repositories at the same time using aliases. The following fetches the commit count for all branches of 3 distinct repositories (up to 100 branches per repo):

    {
      gson: repository(owner: "google", name: "gson") {
        ...RepoFragment
      }
      martian: repository(owner: "google", name: "martian") {
        ...RepoFragment
      }
      keyboard: repository(owner: "jasonrudolph", name: "keyboard") {
        ...RepoFragment
      }
    }
    
    fragment RepoFragment on Repository {
      name
      refs(first: 100, refPrefix: "refs/heads/") {
        edges {
          node {
            name
            target {
              ... on Commit {
                id
                history(first: 0) {
                  totalCount
                }
              }
            }
          }
        }
      }
    }
    


    RepoFragment is a fragment that avoids duplicating the query fields for each of those repos.

    If you only need the commit count on the default branch, it's more straightforward:

    {
      gson: repository(owner: "google", name: "gson") {
        ...RepoFragment
      }
      martian: repository(owner: "google", name: "martian") {
        ...RepoFragment
      }
      keyboard: repository(owner: "jasonrudolph", name: "keyboard") {
        ...RepoFragment
      }
    }
    
    fragment RepoFragment on Repository {
      name
      defaultBranchRef {
        name
        target {
          ... on Commit {
            id
            history(first: 0) {
              totalCount
            }
          }
        }
      }
    }
    

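
    To run the same default-branch query over an arbitrary list of repositories, the aliased query can be generated. A sketch in Python, standard library only. The function build_commit_count_query and the repo_N alias scheme are illustrative choices of mine, and sending the query (a POST to https://api.github.com/graphql with a token) is left out.

```python
import json

FRAGMENT = """
fragment RepoFragment on Repository {
  name
  defaultBranchRef {
    name
    target {
      ... on Commit {
        history(first: 0) {
          totalCount
        }
      }
    }
  }
}
"""


def build_commit_count_query(repos):
    """Build one GraphQL query with an alias per (owner, name) pair.

    Aliases are generated as repo_0, repo_1, ... because repository names
    may contain characters (such as "-") that are not valid in an alias.
    """
    fields = []
    for index, (owner, name) in enumerate(repos):
        # json.dumps quotes and escapes the strings for the query text
        fields.append(
            f"  repo_{index}: repository(owner: {json.dumps(owner)}, "
            f"name: {json.dumps(name)}) {{\n    ...RepoFragment\n  }}"
        )
    return "{\n" + "\n".join(fields) + "\n}\n" + FRAGMENT


query = build_commit_count_query([("google", "gson"), ("jasonrudolph", "keyboard")])
print(query)
# The request body for the GraphQL endpoint would be json.dumps({"query": query})
```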

  • 2020-12-05 03:46

    Using the GraphQL API v4 is probably the way to handle this if you're starting a new project, but if you're still using the REST API v3 you can get around the pagination issue by limiting the request to 1 result per page. With that limit, the page number in the "last" link of the response's Link header equals the total number of commits.

    For example, using python3 and the requests library:

    import os
    import urllib.parse

    import requests


    def commit_count(project, sha='master', token=None):
        """
        Return the number of commits to a project

        project is "owner/repo"; sha is the branch (or commit) to count from
        """
        token = token or os.environ.get('GITHUB_API_TOKEN')
        url = f'https://api.github.com/repos/{project}/commits'
        headers = {
            'Accept': 'application/json',
            'Content-Type': 'application/json',
        }
        # send credentials only when a token is available
        if token:
            headers['Authorization'] = f'token {token}'
        params = {
            'sha': sha,
            'per_page': 1,
        }
        resp = requests.request('GET', url, params=params, headers=headers)
        if (resp.status_code // 100) != 2:
            raise Exception(f'invalid github response: {resp.content}')
        # check the resp count, just in case there are 0 commits
        commit_count = len(resp.json())
        last_page = resp.links.get('last')
        # if there are no more pages, the count must be 0 or 1
        if last_page:
            # extract the query string from the last page url
            qs = urllib.parse.urlparse(last_page['url']).query
            # extract the page number from the query string
            commit_count = int(dict(urllib.parse.parse_qsl(qs))['page'])
        return commit_count
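
    To make the mechanism concrete, here is a small standard-library sketch of what the "last" link lookup does: it parses a Link header by hand and reads the page parameter of the rel="last" URL. The function name count_from_link_header and the sample header are illustrative; with requests, resp.links already does this parsing.

```python
import re
import urllib.parse


def count_from_link_header(link_header, first_page_length):
    """Return the commit count implied by a per_page=1 response.

    With one commit per page, the page number of the rel="last" link equals
    the total. Without such a link there is a single page, so the count is
    the length of that page (0 or 1).
    """
    for part in (link_header or "").split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="last"', part)
        if match:
            query = urllib.parse.urlparse(match.group(1)).query
            return int(urllib.parse.parse_qs(query)["page"][0])
    return first_page_length


sample = (
    '<https://api.github.com/repositories/1/commits?per_page=1&page=2>; rel="next", '
    '<https://api.github.com/repositories/1/commits?per_page=1&page=4321>; rel="last"'
)
print(count_from_link_header(sample, 1))  # 4321
print(count_from_link_header(None, 0))    # 0
```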
    
  • 2020-12-05 03:52

    I used Python to write a generator that yields the repo's contributors, then sum their commit counts and check the total against a limit. The check returns True if the repo has fewer commits than the limit, and False if it has the same number or more. The only thing you have to fill in is the requests session that uses your credentials. Here's what I wrote for you:

    from requests import session

    def login():
        sess = session()

        # login here and return session with valid creds
        return sess

    def generateList(link):
        # you need to login before you do anything
        sess = login()

        # requests exposes pagination through response.links, so start out with
        # an object that imitates a response whose "next" link is the first URL.
        # This will help you to cleanly while-loop through github's pagination
        class response_imitator:
            links = {'next': {'url': link}}
        response = response_imitator()
        while 'next' in response.links:
            response = sess.get(response.links['next']['url'])
            for item in response.json():
                yield item

    def check_commit_count(baseurl, user_name, repo_name, max_commit_count=None):
        # returns None when no limit is given
        if max_commit_count is not None:
            totalcommits = 0

            # construct url to paginate
            url = baseurl + "repos/" + user_name + '/' + repo_name + "/stats/contributors"
            for stats in generateList(url):
                totalcommits += stats['total']

            # True if below the limit, False if the same or greater
            return totalcommits < max_commit_count

    def main():
        # what user do you want to check for commits
        user_name = "arcsector"

        # what repo do you want to check for commits
        repo_name = "EyeWitness"

        # github's base api url
        baseurl = "https://api.github.com/"

        # call function and show the result
        print(check_commit_count(baseurl, user_name, repo_name, 30))

    if __name__ == "__main__":
        main()
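
    One caveat with /stats/contributors: GitHub computes repository statistics in the background and answers with status 202 until they are ready, in which case there is no contributor data to sum yet. A hedged sketch of a retry loop follows. The fetch callable, fetch_stats_with_retry, total_commits, and the retry and delay defaults are all hypothetical; the fetcher is injected so the logic can be exercised without network access.

```python
import time


def fetch_stats_with_retry(fetch, retries=5, delay=2.0, sleep=time.sleep):
    """Call `fetch()` until it stops returning HTTP 202.

    `fetch` is any zero-argument callable returning (status_code, parsed_json).
    Returns the parsed JSON of the first non-202 response, or raises
    TimeoutError once `retries` attempts have all returned 202.
    """
    for attempt in range(retries):
        status, body = fetch()
        if status != 202:
            return body
        if attempt < retries - 1:
            sleep(delay)  # give the background job time to finish
    raise TimeoutError("statistics were still being computed")


def total_commits(stats):
    """Sum the `total` field of each contributor entry."""
    return sum(entry["total"] for entry in stats)


# Offline demonstration with a fake fetcher: one 202, then data
responses = iter([(202, None), (200, [{"total": 12}, {"total": 30}])])
stats = fetch_stats_with_retry(lambda: next(responses), sleep=lambda _: None)
print(total_commits(stats))  # 42
```

    With a requests session, fetch could be a small function that does resp = sess.get(url) and returns (resp.status_code, resp.json() if resp.status_code == 200 else None).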
    