GitHub API: Repositories Contributed To

耶瑟儿~ 2020-12-07 09:45

Is there a way to get access to the data in the “Repositories contributed to” module on GitHub profile pages via the GitHub API? Ideally the entire list, not just the top fi

10 answers
  • 2020-12-07 10:17

    With the GraphQL API v4, you can now get these contributed repositories using:

    {
      viewer {
        repositoriesContributedTo(first: 100, contributionTypes: [COMMIT, ISSUE, PULL_REQUEST, REPOSITORY]) {
          totalCount
          nodes {
            nameWithOwner
          }
          pageInfo {
            endCursor
            hasNextPage
          }
        }
      }
    }
    

    If you have more than 100 contributed repositories (including your own), you will have to paginate by specifying after: "END_CURSOR_VALUE" in repositoriesContributedTo for the next request, using the endCursor returned by the previous one.
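
The pagination loop could be sketched in Python roughly as follows. This is a minimal sketch, not an official client: `run_query` and `github_run_query` are hypothetical helpers introduced here, the token is assumed to be a personal access token with suitable scopes, and the query is the one from this answer with an added `after: $cursor` argument.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, Optional

# Same query as in the answer, with a $cursor variable added for pagination.
QUERY = """
query($cursor: String) {
  viewer {
    repositoriesContributedTo(
      first: 100,
      after: $cursor,
      contributionTypes: [COMMIT, ISSUE, PULL_REQUEST, REPOSITORY]
    ) {
      nodes { nameWithOwner }
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""


def contributed_repos(run_query: Callable[[str, Dict], Dict]) -> Iterator[str]:
    """Yield nameWithOwner from every page of repositoriesContributedTo.

    run_query is a hypothetical callable that sends (query, variables)
    to the GraphQL endpoint and returns the decoded "data" object.
    """
    cursor: Optional[str] = None  # None requests the first page
    while True:
        data = run_query(QUERY, {"cursor": cursor})
        page = data["viewer"]["repositoriesContributedTo"]
        for node in page["nodes"]:
            yield node["nameWithOwner"]
        if not page["pageInfo"]["hasNextPage"]:
            break
        cursor = page["pageInfo"]["endCursor"]


def github_run_query(token: str) -> Callable[[str, Dict], Dict]:
    """Build a run_query callable that POSTs to GitHub's GraphQL endpoint."""
    def run_query(query: str, variables: Dict) -> Dict:
        body = json.dumps({"query": query, "variables": variables}).encode()
        request = urllib.request.Request(
            "https://api.github.com/graphql",
            data=body,
            headers={"Authorization": f"bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)["data"]
    return run_query
```

Usage would look like `for name in contributed_repos(github_run_query(token)): print(name)`. Passing the transport in as a callable keeps the paging logic testable without network access.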

  • 2020-12-07 10:18

    You can use the Search endpoint provided by the GitHub API. Your query should look something like this:

    https://api.github.com/search/repositories?q=%20+fork:true+user:username

    Setting the fork qualifier to true ensures that the query returns all of the user's repos, forks included.

    However, if you want to make sure the user not only forked a repository but also contributed to it, you have to iterate through every repo returned by the search request and check whether the user is among its contributors. That is painful, because GitHub returns only 100 contributors and there is no solution for that...
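
The search-then-check approach might look like the sketch below. It is only an illustration under stated assumptions: `search_url`, `get_json`, `is_contributor`, and `contributed_forks` are hypothetical names, requests are unauthenticated (so heavily rate-limited), only the first page of search results and of contributors is read, and the leading `%20+` from the URL above is dropped because the two qualifiers alone form the query.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, List

API = "https://api.github.com"


def search_url(username: str) -> str:
    """Build the search URL: all of a user's repos, forks included."""
    return f"{API}/search/repositories?q=fork:true+user:{username}&per_page=100"


def get_json(url: str):
    """Fetch a URL and decode its JSON body (no authentication)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def is_contributor(contributors: Iterable[Dict], username: str) -> bool:
    """True if username appears in one page of a /contributors response."""
    wanted = username.lower()
    return any((c.get("login") or "").lower() == wanted for c in contributors)


def contributed_forks(username: str, fetch: Callable = get_json) -> List[str]:
    """Return the user's repos whose first contributor page includes the user."""
    result = []
    for repo in fetch(search_url(username))["items"]:
        full_name = repo["full_name"]
        contributors = fetch(f"{API}/repos/{full_name}/contributors?per_page=100")
        if is_contributor(contributors, username):
            result.append(full_name)
    return result
```

The `fetch` parameter exists so the logic can be exercised with canned responses; a real run would issue one extra request per repository.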

  • 2020-12-07 10:18

    I wrote a Selenium Python script to do this:

    """
    Get all your repos contributed to for the past year.
    
    This uses Selenium and Chrome to login to github as your user, go through 
    your contributions page, and grab the repo from each day's contribution page.
    
    Requires python3, selenium, and Chrome with chromedriver installed.
    
    Change the username variable, and run like this:
    
    GITHUB_PASS="mypassword" python3 github_contributions.py
    """
    
    import os
    import time
    from pprint import pprint as pp
    from urllib.parse import urlsplit

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    username = 'jessejoe'
    password = os.environ['GITHUB_PASS']

    repos = []
    driver = webdriver.Chrome()
    driver.get('https://github.com/login')

    # Fill in and submit the login form.
    # find_element(By...) replaces the find_element_by_* helpers,
    # which newer Selenium releases no longer provide.
    driver.find_element(By.ID, 'login_field').send_keys(username)
    password_elem = driver.find_element(By.ID, 'password')
    password_elem.send_keys(password)
    password_elem.submit()

    # Wait indefinitely for 2-factor code
    if 'two-factor' in driver.current_url:
        print('2-factor code required, go enter it')
    while 'two-factor' in driver.current_url:
        time.sleep(1)

    driver.get('https://github.com/{}'.format(username))

    # Get all days that aren't colored gray (no contributions).
    # The selectors below reflect GitHub's profile markup when this was written.
    contrib_days = driver.find_elements(
        By.XPATH, "//*[@class='day' and @fill!='#eeeeee']")

    for day in contrib_days:
        day.click()
        # Wait until the activity panel is done loading
        WebDriverWait(driver, 10).until(
            lambda d: 'loading' not in d.find_element(
                By.CSS_SELECTOR, '.contribution-activity').get_attribute('class'))

        # Get all contribution URLs
        contribs = driver.find_elements(
            By.CSS_SELECTOR, '.contribution-activity a')
        for contrib in contribs:
            url = contrib.get_attribute('href')
            # Only care about repo owner and name from URL
            repo_path = urlsplit(url).path
            repo = '/'.join(repo_path.split('/')[0:3])
            if repo not in repos:
                repos.append(repo)
        # Have to click something else to remove pop-up on current day
        driver.find_element(By.CSS_SELECTOR, '.vcard-fullname').click()

    driver.quit()
    pp(repos)
    

    It uses Python and Selenium to automate a Chrome browser: it logs in to GitHub, goes to your contributions page, clicks each day, and grabs the repo name from any contributions. Since this page only shows one year's worth of activity, that's all you can get with this script.
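
The URL-to-repo step of the script can be looked at in isolation. The sketch below pulls that logic into a standalone function; `repo_from_url` and `unique_repos` are hypothetical names, and the example URLs are made up for illustration.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def repo_from_url(url: str) -> str:
    """Keep only the '/owner/name' prefix of a GitHub URL path."""
    path = urlsplit(url).path
    # '/owner/name/pull/1' splits into ['', 'owner', 'name', 'pull', '1']
    return "/".join(path.split("/")[0:3])


def unique_repos(urls: Iterable[str]) -> List[str]:
    """Deduplicate repos while preserving first-seen order."""
    return list(dict.fromkeys(repo_from_url(url) for url in urls))


print(unique_repos([
    "https://github.com/octocat/Hello-World/pull/1",
    "https://github.com/octocat/Hello-World/commits?author=octocat",
    "https://github.com/octocat/Spoon-Knife/issues/2",
]))
```

Using `dict.fromkeys` avoids the repeated list membership test while keeping the order in which repos were first seen.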

  • 2020-12-07 10:21

    I didn't see any way of doing it in the API. The closest I could find was to get the latest 300 public events for a user (300 is the limit, unfortunately) and then filter those for contributions to others' repositories.

    https://developer.github.com/v3/activity/events/#list-public-events-performed-by-a-user
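
Filtering those events might be sketched as follows. This is a rough illustration, not a complete solution: `fetch_public_events` and `repos_of_others` are hypothetical names, requests are unauthenticated, and treating only `PushEvent` and `PullRequestEvent` as contributions is an assumption made for this example.

```python
import json
import urllib.request
from typing import Dict, Iterable, List


def fetch_public_events(username: str) -> List[Dict]:
    """Fetch up to 300 public events (3 pages of 100) for a user."""
    events: List[Dict] = []
    for page in range(1, 4):
        url = (f"https://api.github.com/users/{username}/events/public"
               f"?per_page=100&page={page}")
        with urllib.request.urlopen(url) as response:
            batch = json.load(response)
        if not batch:
            break
        events.extend(batch)
    return events


def repos_of_others(events: Iterable[Dict], username: str) -> List[str]:
    """Return distinct repos, in first-seen order, that the user pushed to
    or opened pull requests on, excluding repos under the user's own name."""
    wanted = {"PushEvent", "PullRequestEvent"}  # assumed contribution types
    own_prefix = username.lower() + "/"
    seen: List[str] = []
    for event in events:
        name = event["repo"]["name"]  # formatted as "owner/repo"
        if event["type"] not in wanted:
            continue
        if name.lower().startswith(own_prefix) or name in seen:
            continue
        seen.append(name)
    return seen
```

A call such as `repos_of_others(fetch_public_events("some-user"), "some-user")` would cover only recent activity, since older events are not available from this endpoint.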

    We need to ask GitHub to implement this in their API.

  • 2020-12-07 10:24

    You'll probably get the last year or so via GitHub's GraphQL API, as shown in Bertrand Martel's answer.

    Everything that happened back to 2011 can be found in GitHub Archive, as stated in Kyle Kelley's answer. However, BigQuery's syntax and GitHub's API seem to have changed, and the examples shown there no longer work as of 08/2020.

    So here's how I found all repos I contributed to:

    SELECT distinct repo.name
    FROM (
      SELECT * FROM `githubarchive.year.2011` UNION ALL
      SELECT * FROM `githubarchive.year.2012` UNION ALL
      SELECT * FROM `githubarchive.year.2013` UNION ALL
      SELECT * FROM `githubarchive.year.2014` UNION ALL
      SELECT * FROM `githubarchive.year.2015` UNION ALL
      SELECT * FROM `githubarchive.year.2016` UNION ALL
      SELECT * FROM `githubarchive.year.2017` UNION ALL
      SELECT * FROM `githubarchive.year.2018`
    )
    WHERE (type = 'PushEvent' 
      OR type = 'PullRequestEvent')
      AND actor.login = 'YOUR_USER'
    

    Some of the repos returned have only a name, with no user or org, but I had to process the result manually afterwards anyway.
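
Since the query repeats one line per yearly table, it can be generated instead of typed out. The sketch below only builds the query text; `build_query` is a hypothetical helper, it assumes a `githubarchive.year.<year>` table exists for every year in the requested range, and it rewrites the two `type` comparisons as an equivalent `IN` list.

```python
import re


def build_query(login: str, first_year: int, last_year: int) -> str:
    """Build the UNION ALL query over the yearly GitHub Archive tables."""
    # Reject anything that is not a plain user login so the value can be
    # embedded in the SQL string (bot names with brackets are rejected too).
    if not re.fullmatch(r"[A-Za-z0-9-]+", login):
        raise ValueError(f"unexpected login: {login!r}")
    if first_year > last_year:
        raise ValueError("first_year must not be after last_year")
    tables = " UNION ALL\n".join(
        f"  SELECT * FROM `githubarchive.year.{year}`"
        for year in range(first_year, last_year + 1)
    )
    return (
        "SELECT DISTINCT repo.name\n"
        f"FROM (\n{tables}\n)\n"
        "WHERE type IN ('PushEvent', 'PullRequestEvent')\n"
        f"  AND actor.login = '{login}'"
    )


print(build_query("YOUR-USER", 2011, 2018))
```

The printed text can be pasted into the BigQuery console; running it from Python would additionally need a BigQuery client, which is left out here.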

  • 2020-12-07 10:26

    I ran into the same problem (see "GithubAPI: Get repositories a user has ever committed in").

    One actual hack I've found is a project called http://www.githubarchive.org/. They log all public events starting from 2011. Not ideal, but it can be helpful.

    So, for example, in your case:

    SELECT  payload_pull_request_head_repo_clone_url 
    FROM [githubarchive:github.timeline]
    WHERE payload_pull_request_base_user_login='outoftime'
    GROUP BY payload_pull_request_head_repo_clone_url;
    

    Gives, if I'm not mistaken, the list of repos you've pull requested to:

    https://github.com/jreidthompson/noaa.git
    https://github.com/kkrol89/sunspot.git
    https://github.com/rterbush/sunspot.git
    https://github.com/ottbot/cassandra-cql.git
    https://github.com/insoul/cequel.git
    https://github.com/mcordell/noaa.git
    https://github.com/hackhands/sunspot_rails.git
    https://github.com/lgierth/eager_record.git
    https://github.com/jnicklas/sunspot.git
    https://github.com/klclee/sunspot.git
    https://github.com/outoftime/cequel.git
    

    You can play with BigQuery here: bigquery.cloud.google.com. The data schema can be found here: https://github.com/igrigorik/githubarchive.org/blob/master/bigquery/schema.js
