Quantifying the amount of change in a git diff?

不思量自难忘° 2020-12-13 17:38

I use git for a slightly unusual purpose: it stores my text as I write fiction. (I know, I know... geeky.)

I am trying to keep track of productivity, and want to meas

9 answers
  •  难免孤独
    2020-12-13 18:30

    The other answers fail for use cases where you need to exclude moved text (e.g., if I move a function in code, or a paragraph in LaTeX, further down the document, I don't want to count all of that as changes).

    For that, you can also count the duplicated lines in the diff, and exclude a commit from your tally if it contains too many duplicates.

    For example, building on the other answers, I can do:

    git diff "$sha~1..$sha" | grep -e '^+[^+]' -e '^-[^-]' | sed -e 's/^.//' | sort | uniq -d | wc -w | xargs
    

    This counts the words on changed lines that appear more than once in the diff (typically text that was removed in one place and added back in another), where sha is your commit.
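
    To see why this flags moved text, here is a minimal sketch that runs the same pipeline on a hand-written diff instead of real `git diff` output. The function names `count_duplicate_words` and `sample_diff`, and the sample text itself, are made up for illustration.

```shell
# Hypothetical helper: count words on changed lines that occur more than
# once in a unified diff read from stdin (the pipeline above, minus git).
count_duplicate_words() {
    grep -e '^+[^+]' -e '^-[^-]' | sed -e 's/^.//' | sort | uniq -d | wc -w | xargs
}

# Hand-written diff: one sentence moved, one sentence genuinely added.
sample_diff() {
    printf '%s\n' \
        '--- a/story.txt' \
        '+++ b/story.txt' \
        '@@ -1,3 +1,4 @@' \
        '-The rain had stopped by morning.' \
        ' She opened the window.' \
        '+A brand new sentence.' \
        '+The rain had stopped by morning.'
}

sample_diff | count_duplicate_words   # prints 6
```

    The moved sentence shows up once as a `-` line and once as a `+` line, so after the prefix is stripped `uniq -d` keeps one copy of it and `wc -w` reports its six words. The genuinely new sentence appears only once and is not counted.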

    You can do this for all the commits within the last day (since 6 am) by:

    # sed '$ d' drops the last (oldest) commit from the list
    for sha in $(git rev-list --since="6am" master | sed -e '$ d'); do
        added=$(git diff --word-diff=porcelain "$sha~1..$sha" | grep -e '^+[^+]' | wc -w | xargs)
        deleted=$(git diff --word-diff=porcelain "$sha~1..$sha" | grep -e '^-[^-]' | wc -w | xargs)
        duplicated=$(git diff "$sha~1..$sha" | grep -e '^+[^+]' -e '^-[^-]' | sed -e 's/^.//' | sort | uniq -d | wc -w | xargs)
        echo "$added, $deleted, $duplicated"
    done
    

    Prints: added, deleted, duplicates
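
    The first two columns come from counting words on the `+` and `-` lines of the word diff. As a rough sketch, the same counting can be tried on hand-written input laid out like `--word-diff=porcelain` output; the sample text and the helper names below are assumptions for illustration, not real git output.

```shell
# Hypothetical helpers: count words on added / removed lines of a
# porcelain-style word diff read from stdin.
count_added_words()   { grep -e '^+[^+]' | wc -w | xargs; }
count_deleted_words() { grep -e '^-[^-]' | wc -w | xargs; }

# Hand-written sample: one prefix character per line (' ' unchanged,
# '-' removed, '+' added), with '~' marking a line break in the source.
sample_word_diff() {
    printf '%s\n' \
        '--- a/story.txt' \
        '+++ b/story.txt' \
        '@@ -1 +1 @@' \
        ' She opened the ' \
        '-old wooden' \
        '+new' \
        ' window.' \
        '~'
}

sample_word_diff | count_added_words     # prints 1
sample_word_diff | count_deleted_words   # prints 2
```

    The `[^+]` and `[^-]` parts of the patterns are what keep the `+++` and `---` file header lines out of the counts.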

    (I use the line diff for duplicates because the word diff sometimes tries to be too clever and treats moved text as if it had been edited. Working on whole lines also avoids counting a single repeated word as a duplicate.)

    Or, if you want to be sophisticated about it, you can exclude commits entirely if there is more than 80% duplication, and sum up the rest:

    total=0
    for sha in $(git rev-list --since="6am" master | sed -e '$ d'); do
        added=$(git diff --word-diff=porcelain "$sha~1..$sha" | grep -e '^+[^+]' | wc -w | xargs)
        deleted=$(git diff --word-diff=porcelain "$sha~1..$sha" | grep -e '^-[^-]' | wc -w | xargs)
        duplicated=$(git diff "$sha~1..$sha" | grep -e '^+[^+]' -e '^-[^-]' | sed -e 's/^.//' | sort | uniq -d | wc -w | xargs)
        if [ "$added" -eq 0 ]; then
            # Pure deletion: count the deleted words (this also avoids dividing by zero below)
            changed=$deleted
        elif [ "$(echo "$duplicated/$added > 0.8" | bc -l)" -eq 1 ]; then
            # More than 80% duplication: treat the commit as moved text and count nothing
            changed=0
        else
            changed=$((added+deleted))
        fi
        total=$((total+changed))
        echo "added: $added, deleted: $deleted, duplicated: $duplicated, changes counted: $changed"
    done
    echo "Total changed: $total"
    

    This script is available here: https://github.com/MilesCranmer/git-stats.

    This prints out:

    ➜  bifrost_paper git:(master) ✗ count_changed_words "6am" 
    
    added: 38, deleted: 76, duplicated: 3, changes counted: 114
    added: 14, deleted: 19, duplicated: 0, changes counted: 33
    added: 1113, deleted: 1112, duplicated: 1106, changes counted: 0
    added: 1265, deleted: 1275, duplicated: 1225, changes counted: 0
    added: 4207, deleted: 4208, duplicated: 4391, changes counted: 0
    Total changed: 147
    

    The commits where I am just moving things around are obvious, so those changes are not counted. Everything else is summed to give the total number of changed words.
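
    The decision rule from the loop can also be isolated into a small function, which makes it easy to check against the sample output above. This is only a sketch: `counted_changes` is a hypothetical name (not part of git-stats), and it swaps the `bc` comparison for the equivalent integer test `5 * duplicated > 4 * added`.

```shell
# Hypothetical helper: apply the 80% duplication rule to one commit.
# Arguments: added deleted duplicated (word counts). Prints the words counted.
counted_changes() {
    added=$1 deleted=$2 duplicated=$3
    if [ "$added" -eq 0 ]; then
        # Nothing added: count the deletions only
        echo "$deleted"
    elif [ $((duplicated * 5)) -gt $((added * 4)) ]; then
        # duplicated/added > 0.8: treat as moved text
        echo 0
    else
        echo $((added + deleted))
    fi
}

counted_changes 38 76 3          # prints 114
counted_changes 1113 1112 1106   # prints 0
```

    Integer arithmetic keeps the check within plain POSIX shell, at the cost of hard-coding the 0.8 threshold as the ratio 4/5.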
