Evaluation time for `git describe` in smudge filter

Question

After a successful conversion of an old SVN repository into Git with svn2git, I have been tasked with reproducing the $Revision$ keyword expansion (or a close approximation of it).

So I ...

- added an annotated tag `svn-r` for SVN's rev0
- in .git/attributes, added

  ```
  * filter=revsion
  ```

- in .git/config, added

  ```
  [filter "revsion"]
      smudge = /bin/sed -e 's/\\$Revision\\$/$Revision: '$(GIT_EXEC_PATH=/usr/lib/git-core/ /usr/bin/git describe --match svn-r)'$/g'
      clean = /bin/sed -e 's/\\$Revision: [^$]*\\$/$Revision$/g'
  ```

... and it works, but is doing the wrong thing.

Whenever I do a checkout, it expands $Revision$ to the git describe output for the previous HEAD (the one before the checkout). So when I am on master~1 and run git checkout master, I get the expansion for master~1 and not for master.

Just to make sure that the early evaluation was not the fault of the $(...) in .git/config, I also tried moving this code into its own script, but to no avail.
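For reference, a standalone script of the kind described above might look like the following minimal Python 3 sketch. The script name `revision-filter.py` and its `smudge`/`clean` arguments are hypothetical, not part of the original setup. It reproduces the two sed expressions. Because it still calls git describe while git checkout is running, it would show exactly the same behavior: HEAD still names the old commit while the filter runs.

```python
#!/usr/bin/env python3
"""
Hypothetical revision-filter.py: a script version of the sed-based
smudge/clean pair.  Git pipes each file through stdin/stdout.
"""
import re
import subprocess
import sys

COLLAPSED = b"$Revision$"
EXPANDED = re.compile(rb"\$Revision: [^$]*\$")


def describe():
    # Same command as the sed version.  During "git checkout" HEAD still
    # names the *previous* commit, so this inherits the same problem.
    out = subprocess.check_output(["git", "describe", "--match", "svn-r"])
    return out.strip()


def smudge(data, rev):
    # Expand every bare $Revision$ to "$Revision: <rev>$".
    return data.replace(COLLAPSED, b"$Revision: " + rev + b"$")


def clean(data):
    # Collapse any expanded "$Revision: ...$" back to the bare keyword.
    return EXPANDED.sub(COLLAPSED, data)


def main(mode):
    if mode not in ("smudge", "clean"):
        return "usage: revision-filter.py smudge|clean"
    data = sys.stdin.buffer.read()
    result = smudge(data, describe()) if mode == "smudge" else clean(data)
    sys.stdout.buffer.write(result)
    return 0


# Only act when git invokes the script with a single mode argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

It would be wired up under `[filter "revsion"]` with, for example, `smudge = revision-filter.py smudge` and `clean = revision-filter.py clean`.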

Hence my question: is there a way to make the git describe run by a smudge filter describe the commit being checked out, rather than the previous HEAD?

Answer 1:

TL;DR: a (tested) solution

Try this post-checkout hook (now tested, albeit lightly; I put it in my scripts repository on GitHub as well):

```python
#! /usr/bin/env python

"""
post-checkout hook to re-smudge files
"""

from __future__ import print_function

import collections
import os
import subprocess
import sys

def run(cmd):
    """
    Run command and collect its stdout.  If it exits nonzero,
    raise CalledProcessError, a la subprocess.check_call().
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    stdout, _ = proc.communicate()
    status = proc.wait()
    if status != 0:
        raise subprocess.CalledProcessError(status, cmd)
    return stdout

def git_ls_files(*args):
    """
    Run git ls-files with given arguments, plus -z; break up
    returned byte string into list of files.  Note, in Py3k this
    will be a list of byte-strings!
    """
    output = run(['git', 'ls-files', '-z'] + list(args))
    # -z produces NUL termination, not NUL separation: discard last entry
    return output.split(b'\0')[:-1]

def recheckout(files):
    """
    Force Git to re-extract the given files from the index.
    Since Git insists on doing nothing when the files exist
    in the work-tree, we first *remove* them.

    To avoid blowing up on very long argument lists, do these
    1000 files at a time or up to 10k bytes of argument at a
    time, whichever occurs first.  Note that we may go over
    the 10k limit by the length of whatever file is long, so
    it's a sloppy limit and we don't need to be very accurate.
    """
    files = collections.deque(files)
    while files:
        todo = [b'git', b'checkout', b'--']
        # should add 1 to account for b'\0' between arguments in os exec:
        # argbytes = reduce(lambda x, y: x + len(y) + 1, todo, 0)
        # but let's just be very sloppy here
        argbytes = 0
        while files and len(todo) < 1000 and argbytes < 10000:
            path = files.popleft()
            todo.append(path)
            argbytes += len(path) + 1
            os.remove(path)
        # files is now empty, or todo has reached its limit:
        # run the git checkout command
        run(todo)

def warn_about(files):
    """
    Make a note to the user that some file(s) have not been
    re-checked-out as they are modified in the work-tree.
    """
    if len(files) == 0:
        return
    print("Note: the following files have been carried over and may")
    print("not match what you would expect for a clean checkout:")
    # If this is py3k, each path is a bytes and we need a string.
    if type(b'') == type(''):
        printable = lambda path: path
    else:
        # git ls-files -z emits raw (normally UTF-8) path bytes
        printable = lambda path: path.decode('utf-8', 'replace')
    for path in files:
        print('\t{}'.format(printable(path)))

def main():
    """
    Run, as called by git post-checkout hook.  We get three arguments
    that are very simple, so no need for argparse.

    We only want to do something when:
     - the flag argument, arg 3, is 1
     - the two other arguments differ

    What we do is re-checkout the *unmodified* files, to
    force them to re-run through any defined .gitattributes
    filter.
    """
    argv = sys.argv[1:]
    if len(argv) != 3:
        return 'error: hook must be called with three arguments'
    if argv[2] != '1':
        return 0
    if argv[0] == argv[1]:
        return 0
    allfiles = git_ls_files()
    modfiles = git_ls_files('-m')
    # Keep only paths that exist as files in the work-tree: this skips
    # submodule directories and entries absent from the work-tree
    # (e.g. sparse checkout), which os.remove() would fail on.
    unmodified = [path for path in set(allfiles) - set(modfiles)
                  if os.path.isfile(path)]
    recheckout(unmodified)
    warn_about(modfiles)
    return 0

if __name__ == '__main__':
    try:
        sys.exit(main())
    except KeyboardInterrupt:
        sys.exit('\nInterrupted')
```

To improve performance, you can modify it to operate only on files that are likely to use $Revision$ (your attribute defines this as "all files" so I used that here).
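One way to narrow the set is to ask `git check-attr` which paths actually carry `filter=revsion` (the filter name as spelled in the question's config), so only those are removed and re-extracted. The minimal Python 3 sketch below uses two hypothetical helpers, `parse_check_attr()` and `files_with_filter()`, which are not part of the hook above. It relies on the NUL-delimited output of `git check-attr -z`, which emits path, attribute and value for each input path.

```python
import subprocess


def parse_check_attr(output):
    """
    Parse "git check-attr -z" output: NUL-terminated fields in groups of
    three (path, attribute, value).  Returns a list of (path, value).
    """
    fields = output.split(b"\0")[:-1]  # drop the empty piece after the last NUL
    return [(fields[i], fields[i + 2]) for i in range(0, len(fields), 3)]


def files_with_filter(paths, filter_name=b"revsion"):
    """
    Return only those paths (bytes) whose "filter" attribute equals
    filter_name.  Paths go in on stdin, NUL-separated, so the argument
    list length is not a concern here.
    """
    proc = subprocess.run(
        ["git", "check-attr", "-z", "--stdin", "filter"],
        input=b"".join(path + b"\0" for path in paths),
        stdout=subprocess.PIPE,
        check=True,
    )
    return [path for path, value in parse_check_attr(proc.stdout)
            if value == filter_name]
```

In `main()`, this could be applied as `recheckout(files_with_filter(unmodified))`. Paths without the attribute report the value `unspecified` and are dropped.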

Long

I thought about this problem a bit this morning. As you have observed, it is simply that git checkout has not yet updated the HEAD reference at the time it is populating the index and work-tree while changing commits. Ultimately, it seems too annoying to attempt to compute what git checkout is about to set HEAD to. You might instead use a post-checkout hook.

It's not clear yet whether this should be used instead of the smudge filter or in addition to it, but I think "in addition to" is correct. You almost certainly still want the clean filter to operate as usual.

In any case, a post-checkout hook gets:

> ... three parameters: the ref of the previous HEAD, the ref of the new HEAD (which may or may not have changed), and a flag indicating whether the checkout was a branch checkout (changing branches, flag=1) or a file checkout (retrieving a file from the index, flag=0). This hook cannot affect the outcome of git checkout.

(There is a bug in git checkout and/or the documentation here. The last sentence says "cannot affect the outcome", but that's not true in two ways:

1. The exit status of the hook becomes the exit status of git checkout, so the checkout appears to have failed if the hook exits nonzero.
2. The hook can overwrite work-tree files.

It's the latter that I intend to use here.)

> It is also run after git clone, unless the --no-checkout (-n) option is used. The first parameter given to the hook is the null-ref, the second the ref of the new HEAD and the flag is always 1. Likewise for git worktree add unless --no-checkout is used.
>
> This hook can be used to perform repository validity checks, auto-display differences from the previous HEAD if different, or set working dir metadata properties.
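These cases can be told apart from the three arguments alone. The sketch below is a hypothetical helper; `classify()` and its return labels are made up for illustration. It treats an all-zero first argument as the null-ref, which covers object names of any length (40 hex digits for SHA-1, 64 for SHA-256).

```python
def classify(argv):
    """
    Classify a post-checkout hook call from its three arguments
    (previous HEAD, new HEAD, flag):
      - "file":        flag is "0" (file checkout from the index)
      - "initial":     flag is "1" and previous HEAD is the all-zero
                       null-ref (after clone or worktree add)
      - "branch":      flag is "1" and the two commits differ
      - "same-commit": flag is "1" but HEAD's commit did not change
    """
    if len(argv) != 3:
        raise ValueError("post-checkout expects exactly three arguments")
    old, new, flag = argv
    if flag == "0":
        return "file"
    if set(old) == {"0"}:
        return "initial"
    return "branch" if old != new else "same-commit"
```

The hook in this answer effectively re-smudges for "initial" and "branch" and returns early for the other two.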

Your goal is to make the smudge filter run after HEAD is updated. Looking at the source code for builtin/checkout.c, we find that for the "change commits" case, git checkout first populates the index and work-tree, then updates the HEAD ref, then runs the post-checkout hook with the two hash IDs (the first one will be the special null hash in some cases) and the flag set to 1.

File checkouts, which by definition don't change commits, run the hook with the flag set to 0. The two hash IDs will always match, which is why the flag test is almost certainly unnecessary.

Doing the file checkouts will re-run the smudge filter. Since HEAD has now been updated, $Revision$ will expand the way you want. The obvious drawback is that every work-tree file must be updated twice! There is another issue: Git does nothing when asked to check out a file that already exists, unmodified, in the work-tree. The Python code above works around this by first removing the supposedly unmodified files, forcing git checkout to re-extract them from the index into the work-tree.
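The remove-then-checkout loop in `recheckout()` mixes two jobs: batching the paths and touching the work-tree. The sketch below factors the batching into a hypothetical pure function, `batches()`, so the limits can be checked without a repository. One small difference: the original counts the three command words `git`, `checkout` and `--` toward its 1000-entry limit, while this version counts only paths.

```python
from collections import deque


def batches(paths, max_count=1000, max_bytes=10000):
    """
    Yield lists of paths, each small enough for one "git checkout --"
    command: at most max_count paths, and no new path is added once the
    running byte count (len(path) + 1 per path) reaches max_bytes.  As in
    the hook, the byte limit is sloppy: the last path may overshoot it.
    """
    queue = deque(paths)
    while queue:
        batch = []
        nbytes = 0
        while queue and len(batch) < max_count and nbytes < max_bytes:
            path = queue.popleft()
            batch.append(path)
            nbytes += len(path) + 1
        yield batch
```

With this, `recheckout()` would reduce to removing each path in a batch and calling `run([b'git', b'checkout', b'--'] + batch)` once per batch.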

Source: https://stackoverflow.com/questions/49667015/evaluation-time-for-git-describe-in-smuge-filter

Tags

git

git-filter