Can you get the number of lines of code from a GitHub repository?

Posted by 谁说我不能喝 on 2019-11-28 14:55:51
Rory O'Kane

A shell script, cloc-git

You can use this shell script to count the number of lines in a remote Git repository with one command:

    #!/usr/bin/env bash
    git clone --depth 1 "$1" temp-linecount-repo &&
      printf "('temp-linecount-repo' will be deleted automatically)\n\n\n" &&
      cloc temp-linecount-repo &&
      rm -rf temp-linecount-repo

Installation

This script requires CLOC (“Count Lines of Code”) to be installed. cloc can probably be installed with your package manager – for example, brew install cloc with Homebrew. There is also a docker image published under mribeiro/cloc.

You can install the script by saving its code to a file cloc-git, running chmod +x cloc-git, and then moving the file to a folder in your $PATH such as /usr/local/bin.

Usage

The script takes one argument, which is any URL that git clone will accept. Examples are https://github.com/evalEmpire/perl5i.git (HTTPS) or git@github.com:evalEmpire/perl5i.git (SSH). You can get this URL from any GitHub project page by clicking “Clone or download”.

Example output:

    $ cloc-git https://github.com/evalEmpire/perl5i.git
    Cloning into 'temp-linecount-repo'...
    remote: Counting objects: 200, done.
    remote: Compressing objects: 100% (182/182), done.
    remote: Total 200 (delta 13), reused 158 (delta 9), pack-reused 0
    Receiving objects: 100% (200/200), 296.52 KiB | 110.00 KiB/s, done.
    Resolving deltas: 100% (13/13), done.
    Checking connectivity... done.
    ('temp-linecount-repo' will be deleted automatically)


         171 text files.
         166 unique files.
          17 files ignored.

    http://cloc.sourceforge.net v 1.62  T=1.13 s (134.1 files/s, 9764.6 lines/s)
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    Perl                           149           2795           1425           6382
    JSON                             1              0              0            270
    YAML                             2              0              0            198
    -------------------------------------------------------------------------------
    SUM:                           152           2795           1425           6850
    -------------------------------------------------------------------------------
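Because the script clones into the fixed name temp-linecount-repo, it stops if a non-empty directory of that name already exists, and the directory is left behind if cloc fails. As a rough sketch only (not part of the original answer), the same steps could be wrapped in a Node.js script that uses a fresh temporary directory and reads cloc's JSON output. The function names clocRemote and summarizeCloc are made up for illustration, and the sketch assumes git and cloc are on your PATH and that the JSON report contains a SUM entry.

```javascript
const { execFileSync } = require("child_process");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Pure helper (hypothetical name): pull the totals out of a parsed
// `cloc --json` report. Assumes the report has a "SUM" entry with
// nFiles / blank / comment / code fields.
function summarizeCloc(report) {
  const sum = report.SUM;
  if (!sum) {
    throw new Error("cloc report has no SUM entry (no recognized source files?)");
  }
  return { files: sum.nFiles, blank: sum.blank, comment: sum.comment, code: sum.code };
}

// Shallow-clone `url` into a fresh temporary directory, run cloc on it,
// and always remove the directory afterwards, even if a step fails.
function clocRemote(url) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "linecount-"));
  try {
    execFileSync("git", ["clone", "--depth", "1", url, dir], { stdio: "ignore" });
    const output = execFileSync("cloc", ["--json", "--quiet", dir], { encoding: "utf8" });
    return summarizeCloc(JSON.parse(output));
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Usage (needs network access, git and cloc):
//   console.log(clocRemote("https://github.com/evalEmpire/perl5i.git"));
```

The try/finally is the main point here: cleanup no longer depends on every earlier command succeeding.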

Alternatives

Run the commands manually

If you don’t want to bother saving and installing the shell script, you can run the commands manually. An example:

    $ git clone --depth 1 https://github.com/evalEmpire/perl5i.git
    $ cloc perl5i
    $ rm -rf perl5i

Linguist

If you want the results to match GitHub’s language percentages exactly, you can try installing Linguist instead of CLOC. According to its README, you need to gem install linguist and then run linguist. I couldn’t get it to work (issue #2223).

Inside a local clone of the repository, you can simply run something like

git ls-files | xargs wc -l 

which will give you the total count.
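One caveat with the pipeline above: with a very large file list, xargs may run wc several times and print more than one "total" line, and file names containing spaces can be split incorrectly. A minimal sketch of the same count in Node.js, assuming it is run against a local clone, might look like this. The function names are illustrative, and files that are tracked but missing from the working tree are not handled.

```javascript
const { execFileSync } = require("child_process");
const fs = require("fs");
const path = require("path");

// Count newline bytes in a buffer, which is what `wc -l` reports.
function countNewlines(buffer) {
  let count = 0;
  for (const byte of buffer) {
    if (byte === 0x0a) count += 1;
  }
  return count;
}

// Sum newline counts over every file tracked by Git in `repoDir`.
function countTrackedLines(repoDir) {
  // -z separates paths with NUL bytes so unusual file names survive intact.
  const listing = execFileSync("git", ["ls-files", "-z"], {
    cwd: repoDir,
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024
  });
  const files = listing.split("\0").filter(Boolean);
  let total = 0;
  for (const file of files) {
    const fullPath = path.join(repoDir, file);
    // Skip anything that is not a regular file (symlinks, submodule directories).
    if (!fs.lstatSync(fullPath).isFile()) continue;
    total += countNewlines(fs.readFileSync(fullPath));
  }
  return total;
}

// Usage, from inside a clone:
//   console.log(countTrackedLines("."));
```

Like wc -l, this counts every tracked file, including binary files and documentation, so it is a line count rather than a count of lines of code.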

Or use this tool: http://line-count.herokuapp.com/

There is an extension for the Google Chrome browser, GLOC, which works for public and private repos.

It counts the number of lines of code of a project from:

  • project detail page
  • search results page
  • trending page
  • etc.

Lewis

If you go to the graphs/contributors page, you can see a list of all the contributors to the repo and how many lines they've added and removed.

Unless I'm missing something, subtracting the aggregate number of lines deleted from the aggregate number of lines added among all contributors should yield the total number of lines of code in the repo. (EDIT: it turns out I was missing something after all. Take a look at orbitbot's comment for details.)

UPDATE:

This data is also available in GitHub's API. So I wrote a quick script to fetch the data and do the calculation:

    'use strict';

    // replace jquery/jquery with the repo you're interested in
    fetch('https://api.github.com/repos/jquery/jquery/stats/contributors')
        .then(response => response.json())
        .then(contributors => contributors
            .map(contributor => contributor.weeks
                .reduce((lineCount, week) => lineCount + week.a - week.d, 0)))
        .then(lineCounts => lineCounts.reduce((lineTotal, lineCount) => lineTotal + lineCount))
        .then(lines => window.alert(lines));

Just paste it in a Chrome DevTools snippet, change the repo and click run.

Disclaimer (thanks to lovasoa):

Take the results of this method with a grain of salt, because for some repos (sorich87/bootstrap-tour) it results in negative values, which might indicate there's something wrong with the data returned from GitHub's API.

UPDATE:

Looks like this method to calculate total line numbers isn't entirely reliable. Take a look at orbitbot's comment for details.

You can clone just the latest commit using git clone --depth 1 <url> and then perform your own analysis using Linguist, the same software GitHub uses. That's the only way I know of to get lines of code.

Another option is to use the API to list the languages the project uses. It doesn't give them in lines but in bytes. For example...

    $ curl https://api.github.com/repos/evalEmpire/perl5i/languages
    {
      "Perl": 274835
    }

Take that with a grain of salt, though: the project also includes YAML and JSON, which the website acknowledges but the API does not.
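Since the languages endpoint reports bytes rather than lines, the most you can derive from it directly is each language's share of the total. A small sketch of that arithmetic follows; languageShares is a made-up helper name, and the input is assumed to be the already-parsed JSON object from the endpoint.

```javascript
// Convert the bytes-per-language object returned by the
// /repos/{owner}/{repo}/languages endpoint into percentage shares.
function languageShares(bytesByLanguage) {
  const total = Object.values(bytesByLanguage).reduce((sum, bytes) => sum + bytes, 0);
  const shares = {};
  for (const [language, bytes] of Object.entries(bytesByLanguage)) {
    // Guard against an empty repository, where the total is zero.
    shares[language] = total === 0 ? 0 : (bytes / total) * 100;
  }
  return shares;
}

// Using the perl5i response shown above:
console.log(languageShares({ Perl: 274835 })); // { Perl: 100 }
```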

Finally, you can use code search to ask which files match a given language. This example asks which files in perl5i are Perl: https://api.github.com/search/code?q=language:perl+repo:evalEmpire/perl5i (the search API does not give you lines, and you have to ask for each file's size separately using the url returned for that file).

Hubro

Not currently possible on GitHub.com or through its APIs

I have talked to customer support and confirmed that this cannot be done on github.com. They have passed the suggestion along to the GitHub team though, so hopefully it will be possible in the future. If so, I'll be sure to edit this answer.

Meanwhile, Rory O'Kane's answer is a brilliant alternative based on cloc and a shallow repo clone.

You can use the GitHub API to get the SLOC, as in the following function:

    function getSloc(repo, tries) {

        //repo is the repo's path with a leading slash, e.g. "/jquery/jquery"
        if (!repo) {
            return Promise.reject(new Error("No repo provided"));
        }

        //GitHub's API may return an empty object the first time it is accessed
        //We can try several times then stop
        //(pass tries as a positive integer, otherwise a persistent failure retries forever)
        if (tries === 0) {
            return Promise.reject(new Error("Too many tries"));
        }

        let url = "https://api.github.com/repos" + repo + "/stats/code_frequency";

        //each entry is [timestamp, additions, deletions]; deletions are negative
        return fetch(url)
            .then(x => x.json())
            .then(x => x.reduce((total, changes) => total + changes[1] + changes[2], 0))
            .catch(err => getSloc(repo, tries - 1));
    }
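The function above retries immediately and treats every failure the same way. As a hedged variation (not the extension's actual code), the sketch below separates the arithmetic from the network call, waits between attempts, and only retries on HTTP 202, the status GitHub's statistics endpoints use while the data is still being computed. The names netLines and getSlocWithRetry, the "owner/name" argument format, and the default of 5 tries with a 2 second delay are all assumptions chosen for illustration.

```javascript
// Sum weekly [timestamp, additions, deletions] rows into a net line count.
// Deletions are reported as negative numbers, so plain addition is enough.
function netLines(weeks) {
  return weeks.reduce((total, [, additions, deletions]) => total + additions + deletions, 0);
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// repo is "owner/name" (no leading slash, unlike getSloc above).
// tries and delayMs are illustrative defaults, not values from GitHub's docs.
async function getSlocWithRetry(repo, tries = 5, delayMs = 2000) {
  const url = "https://api.github.com/repos/" + repo + "/stats/code_frequency";
  for (let attempt = 0; attempt < tries; attempt++) {
    const response = await fetch(url);
    if (response.status === 200) {
      return netLines(await response.json());
    }
    if (response.status !== 202) {
      // Anything else (rate limit, not found, ...) is not worth retrying here.
      throw new Error("GitHub API returned HTTP " + response.status);
    }
    // 202: the statistics are still being computed, so wait and try again.
    await sleep(delayMs);
  }
  throw new Error("Statistics not ready after " + tries + " tries");
}

// Usage (needs network access and a runtime with a global fetch):
//   getSlocWithRetry("jquery/jquery").then(console.log, console.error);
```

The result is additions minus deletions over the repository's history, so the earlier caveats about that method's reliability apply here as well.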

Personally, I made a Chrome extension that shows the number of SLOC on both the GitHub project list and the project detail page. You can also set your personal access token to access private repositories and bypass the API rate limit.

You can download it here: https://chrome.google.com/webstore/detail/github-sloc/fkjjjamhihnjmihibcmdnianbcbccpnn

The source code is available here: https://github.com/martianyi/github-sloc

Firefox add-on Github SLOC

I wrote a small Firefox add-on that prints the number of lines of code on GitHub project pages: Github SLOC

If the question is "can you quickly get the NUMBER OF LINES of a GitHub repo", the answer is no, as stated by the other answers.

However, if the question is "can you quickly check the SCALE of a project", I usually gauge a project by looking at its size. Of course the size will include deltas from all active commits, but it is a good metric as the order of magnitude is quite close.

E.g.

How big is the "docker" project?

In your browser, enter api.github.com/repos/ORG_NAME/PROJECT_NAME, for example api.github.com/repos/docker/docker

In the response hash, you can find the size attribute:

    {
        ...
        size: 161432,
        ...
    }

This should give you an idea of the relative scale of the project. The number seems to be in KB, but when I checked on my computer the cloned repository was actually smaller, although the order of magnitude is consistent (161432 KB ≈ 161 MB, whereas du -s -h docker reported 65 MB).
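To avoid doing the unit conversion by hand, the lookup can be sketched in a few lines of JavaScript. This assumes, as the answer does, that size is in kilobytes and uses decimal units (1 MB = 1000 KB) to match the arithmetic above. formatRepoSize and fetchRepoSize are hypothetical helper names.

```javascript
// Format the repository `size` field, assuming it is expressed in kilobytes.
function formatRepoSize(sizeKb) {
  if (sizeKb < 1000) return sizeKb + " KB";
  if (sizeKb < 1000 * 1000) return (sizeKb / 1000).toFixed(1) + " MB";
  return (sizeKb / (1000 * 1000)).toFixed(1) + " GB";
}

// Fetch the repository metadata and return its size as a readable string.
async function fetchRepoSize(owner, repo) {
  const response = await fetch("https://api.github.com/repos/" + owner + "/" + repo);
  if (!response.ok) {
    throw new Error("GitHub API returned HTTP " + response.status);
  }
  const body = await response.json();
  return formatRepoSize(body.size);
}

// The value quoted above for docker/docker:
console.log(formatRepoSize(161432)); // prints: 161.4 MB

// Usage (needs network access):
//   fetchRepoSize("docker", "docker").then(console.log, console.error);
```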
