Basically I want to get the number of lines-of-code in the repository after each commit.
The only (really crappy) ways I have found is to use git filter-branch
The first thing that jumps to mind is the possibility of your git history having a nonlinear history. You might have difficulty determining a sensible sequence of commits.
Having said that, it seems like you could keep a log of commit ids and the corresponding lines of code in that commit. In a post-commit hook, starting from the HEAD revision, work backwards (branching to multiple parents if necessary) until all paths reach a commit that you've already seen before. That should give you the total lines of code for each commit id.
Does that help any? I have a feeling that I've misunderstood something about your question.
http://github.com/ITikhonov/git-loc worked right out of the box for me.
You might also consider gitstats, which generates this graph as an html file.
You may get both added and removed lines with git log, like:
git log --shortstat --reverse --pretty=oneline
From this, you can write a similar script to the one you did using this info. In python:
#!/usr/bin/python
"""
Display the per-commit size of the current git branch.
"""
import subprocess
import re
import sys
def main(argv):
git = subprocess.Popen(["git", "log", "--shortstat", "--reverse",
"--pretty=oneline"], stdout=subprocess.PIPE)
out, err = git.communicate()
total_files, total_insertions, total_deletions = 0, 0, 0
for line in out.split('\n'):
if not line: continue
if line[0] != ' ':
# This is a description line
hash, desc = line.split(" ", 1)
else:
# This is a stat line
data = re.findall(
' (\d+) files changed, (\d+) insertions\(\+\), (\d+) deletions\(-\)',
line)
files, insertions, deletions = ( int(x) for x in data[0] )
total_files += files
total_insertions += insertions
total_deletions += deletions
print "%s: %d files, %d lines" % (hash, total_files,
total_insertions - total_deletions)
if __name__ == '__main__':
sys.exit(main(sys.argv))