Unexpected behavior with “git commit .” when pre-commit hook modifies staged files

Question

In my experience, git commit -a and git commit . have behaved identically. However, I recently created a pre-commit hook that automatically formats my source code, and now git commit . has an unexpected side effect: after the commit finishes, the committed file shows up as modified both in the working directory and in the index. This doesn't happen with git commit -a. I'm trying to understand what git commit . does behind the scenes to cause this, and whether there is a way to handle it properly within my pre-commit hook script.

pre-commit hook:

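# Locate the repository root so the formatter can be found from any directory.
git_toplevel=$(git rev-parse --show-toplevel)

# Reformat every staged file (added, copied, modified, renamed, or type-changed).
git --no-pager diff -z --cached --name-only --diff-filter=ACMRT | "$git_toplevel/meta/reformat.bash" -s files
# Re-stage whatever now differs between the index and the work-tree.
git --no-pager diff -z --name-only --diff-filter=ACMRT | xargs -0 --no-run-if-empty git add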

Currently using git version 1.8.3.1 but am seeing the same behavior in more recent versions.

Here is the sequence of commands after adding a single space at the beginning of a line:

[]$ git status
# On branch eroller/format-clean-filter
# Your branch is ahead of 'origin/eroller/format-clean-filter' by 1 commit.
#   (use "git push" to publish your local commits)
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   src/host/cnv/denovo/denovo_cnv.cpp
#
no changes added to commit (use "git add" and/or "git commit -a")

[]$ git diff
diff --git a/src/host/cnv/denovo/denovo_cnv.cpp b/src/host/cnv/denovo/denovo_cnv.cpp
index 7cfb8dc..14058e3 100644
--- a/src/host/cnv/denovo/denovo_cnv.cpp
+++ b/src/host/cnv/denovo/denovo_cnv.cpp
@@ -28,7 +28,7 @@ using namespace std;
 namespace cnv {
 namespace denovo {

-SegmentsBySample LoadCallsForSamples(const vector<string>& callFiles, const ReferenceDictionary& reference)
+ SegmentsBySample LoadCallsForSamples(const vector<string>& callFiles, const ReferenceDictionary& reference)
 {
   function<SegmentsBySample::value_type(const string&)> loadCalls = [&](string callFile) {
     return LoadCalls(callFile, reference);

[]$ git commit -m 'test' .

[]$ git status
# On branch eroller/format-clean-filter
# Your branch is ahead of 'origin/eroller/format-clean-filter' by 2 commits.
#   (use "git push" to publish your local commits)
#
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       modified:   src/host/cnv/denovo/denovo_cnv.cpp
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   src/host/cnv/denovo/denovo_cnv.cpp
#

[]$ git diff
diff --git a/src/host/cnv/denovo/denovo_cnv.cpp b/src/host/cnv/denovo/denovo_cnv.cpp
index 14058e3..7cfb8dc 100644
--- a/src/host/cnv/denovo/denovo_cnv.cpp
+++ b/src/host/cnv/denovo/denovo_cnv.cpp
@@ -28,7 +28,7 @@ using namespace std;
 namespace cnv {
 namespace denovo {

- SegmentsBySample LoadCallsForSamples(const vector<string>& callFiles, const ReferenceDictionary& reference)
+SegmentsBySample LoadCallsForSamples(const vector<string>& callFiles, const ReferenceDictionary& reference)
 {
   function<SegmentsBySample::value_type(const string&)> loadCalls = [&](string callFile) {
     return LoadCalls(callFile, reference);

[]$ git diff --cached
diff --git a/src/host/cnv/denovo/denovo_cnv.cpp b/src/host/cnv/denovo/denovo_cnv.cpp
index 7cfb8dc..14058e3 100644
--- a/src/host/cnv/denovo/denovo_cnv.cpp
+++ b/src/host/cnv/denovo/denovo_cnv.cpp
@@ -28,7 +28,7 @@ using namespace std;
 namespace cnv {
 namespace denovo {

-SegmentsBySample LoadCallsForSamples(const vector<string>& callFiles, const ReferenceDictionary& reference)
+ SegmentsBySample LoadCallsForSamples(const vector<string>& callFiles, const ReferenceDictionary& reference)
 {
   function<SegmentsBySample::value_type(const string&)> loadCalls = [&](string callFile) {
     return LoadCalls(callFile, reference);

UPDATE: Using the very thorough answer from @torek (thanks!), I decided to give an error in the pre-commit hook if the user tries to use git commit . or git commit [--only] -- <files>. Here is the check in my pre-commit script:

if [[ $GIT_INDEX_FILE != *"/index" ]] && [[ $GIT_INDEX_FILE != *"/index.lock" ]] ; then
  echo "Error: pre-commit reformatting using unsupported index file ($GIT_INDEX_FILE)." >&2
  echo "       Are you using 'git commit [--only] -- <files>' to bypass staging?" >&2
  echo "       Use git commit -a or stage your files before committing using git add -- <files>" >&2
  echo "       Use '--no-verify' to bypass reformatting (not recommended)" >&2
  exit 1
fi

Answer 1:

The fundamental problem here is that Git makes commits not from the work-tree but from the index, which is why you need to git add files in the first place. But "the index" is a sort of white lie, because there can be more index files than just the one standard one. (The index is also called the staging area or the cache, depending on which part of Git is doing the calling.)

The index, by which I mean the one standard one, is a file in .git named index. If you inspect your .git directory you will find such a file. In the past, there really was only this one file. In modern Git (2.5 on up), the picture is considerably cloudier due to added work-trees: there's actually one index file per work-tree, so that .git/index is only the index for the main work-tree. There's an auxiliary index per added work-tree. That's not quite what I mean to get at here, though; it just shows how the assumption that there is one single index is already fraying at the edges. Admittedly, you're using Git 1.8.3.1 (which is really quite old) but it, too, is more complex than the nice simple white-lie "one index" setup.

When you use git commit -a, Git makes a new, extra index. When you use git commit ., you're invoking git commit --only . (see the documentation for details), and Git makes two new extra indexes (indices?).

All parts of Git have the ability to redirect the rest of Git to use a different, non-standard index, and these various options to git commit make use of this feature. Note that git commit -a is equivalent to git commit --include followed by the names of any files that need adding. The really tricky case is the one you're using, git commit --only.

Once you start multiplying index files, things get confusing!

Remember that the index is, in essence, the proposed next commit. If there's only one index (for this work-tree, if we're talking Git 2.5 or later), there's only one proposed next commit. That's not too difficult, we just have to consider that there are three copies of every file. Let's pick a file such as README.md:

HEAD:README.md is the currently committed version of README.md. You can't change it. (You can move HEAD itself, but the committed copy of README.md is inside the commit, as found by the commit's hash ID, and won't change.)

The name HEAD:README.md only works inside Git. That name accesses this frozen, Git-ified, freeze-dried copy of the file; this copy will never change. You can see it with git show HEAD:README.md, for instance.
:README.md is the copy of README.md in the index. It was originally the same as HEAD:README.md but if you ran git add README.md, it might be different now.

The name :README.md also only works inside Git. That name accesses this replaceable, but Git-ified (freeze-dried format) copy of the file, as stored in the index. You can replace this any time with git add.
Finally, README.md is an ordinary (non-Git-ified) file. It's not in Git! It's not in the index! It's in your work-tree, where you can see it and work on it, using all your normal computer tools. Git really doesn't use this file for anything, it just overwrites it or removes it when you check out some other commit. The only thing Git does with it, other than check it with git status and such, is let you use git add to copy it back into the index, overwriting what was there before (and freeze-drying it in the process).
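
The three copies can be seen side by side in a throwaway repository. This is only a minimal sketch: the temporary directory, the demo identity, and the file contents are assumptions made for illustration.

```shell
#!/bin/sh
set -e

# Throwaway repository; the identity below exists only for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "committed" > README.md
git add README.md
git commit -q -m "initial"

echo "staged" > README.md
git add README.md              # copies the work-tree file into the index
echo "work-tree" > README.md   # edits only the ordinary file

head_copy=$(git show HEAD:README.md)   # frozen copy inside the commit
index_copy=$(git show :README.md)      # replaceable copy in the index
tree_copy=$(cat README.md)             # ordinary file in the work-tree

echo "HEAD:      $head_copy"
echo "index:     $index_copy"
echo "work-tree: $tree_copy"
```

Each of the three names yields different content here, because the file was changed once after committing and again after staging.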

Running git status runs two git diffs:

The first compares the HEAD commit to the index, i.e., what's in the current commit vs what's in the proposed next commit. Anything different here is listed as staged for commit. Anything that's the same, Git just quietly says nothing.
The second git diff compares the index to the work-tree, i.e., what's in the proposed commit, vs what you could copy into the index. Anything different here is listed as not staged for commit. Anything that's the same, again, Git quietly says nothing.
(Then there's a final pass to check for files in the work-tree that aren't in the index at all. Git will whine about these, saying that they are untracked, unless you list them in a .gitignore. Being listed in .gitignore doesn't change whether there is a copy of the file in the index, it just changes whether Git whines.)
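
The same three passes can be approximated by hand with plumbing-style commands. This is a rough sketch in a scratch repository (file names and the demo identity are invented for illustration), not a description of how git status is implemented internally.

```shell
#!/bin/sh
set -e

# Scratch repository; names and identity are assumptions for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'v1\n' > staged.txt
printf 'v1\n' > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "initial"

printf 'v2, staged\n' > staged.txt
git add staged.txt                        # now differs: HEAD vs index
printf 'v2, not staged\n' > unstaged.txt  # now differs: index vs work-tree
printf 'new\n' > untracked.txt            # not in the index at all

# First diff: HEAD commit vs index ("Changes to be committed").
to_be_committed=$(git diff --cached --name-only)
# Second diff: index vs work-tree ("Changes not staged for commit").
not_staged=$(git diff --name-only)
# Final pass: work-tree files absent from the index, minus ignored ones.
untracked=$(git ls-files --others --exclude-standard)

echo "staged:     $to_be_committed"
echo "not staged: $not_staged"
echo "untracked:  $untracked"
```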

When you run git commit, Git packages up whatever is in the index, and uses that to make the new commit ... unless you use --only, --include, or -a.

Indices out the wazoo

With git commit --only, Git makes three index files:

One is the standard one. It's untouched at the start. That's the normal .git/index.
One is a copy of that one, with the --only files git added to it. It's in .git/index.lock at some point. Maybe it's always here! If so, that would offer a way to handle the case I outline below. But there's no documentation that promises this.
The third is a fresh one made by first extracting HEAD, then git adding the --only files to it.

If you did not git add anything before you ran git commit --only, the second and third index files match, because adding the --only files to the regular index has the same effect as making a new temporary index from HEAD and adding the --only files to it. But otherwise all three files might be different!

Git then makes the new commit from the third index. If the new commit succeeds, Git replaces the regular index with the second index (this replacement happens via a rename system call). Otherwise Git goes back to the normal index. (Note that nothing happens to the work-tree at all.)
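
The visible effect of the second and third index can be observed without looking at the index files themselves. In this sketch (scratch repository, invented file names and identity), a.txt is staged beforehand and b.txt is committed with --only: the commit should contain only b.txt, while a.txt should remain staged afterwards.

```shell
#!/bin/sh
set -e

# Scratch repository; names and identity are assumptions for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'a v1\n' > a.txt
printf 'b v1\n' > b.txt
git add a.txt b.txt
git commit -q -m "initial"

printf 'a v2, staged earlier\n' > a.txt
git add a.txt                          # standard index now holds a.txt v2
printf 'b v2, never staged\n' > b.txt

git commit -q -m "only b" --only b.txt

# What the new commit changed (built from HEAD plus the --only files).
committed=$(git diff --name-only HEAD~1 HEAD)
# What is still staged in the index that is in place after the commit.
still_staged=$(git diff --cached --name-only)

echo "in the commit: $committed"
echo "still staged:  $still_staged"
```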

If you use git commit --include or git commit -a, Git makes only one extra index, so that you have:

the standard index in .git/index, with whatever you had added so far; and
an extra index in a temporary file: this starts as a copy of the standard index, but then Git adds the listed files, or other modified files, to that index.

Then Git starts the commit process. If it all goes well, when Git is done, Git renames the temporary index so that it becomes the standard index. If things go badly, Git removes the temporary index and the standard index remains unchanged. Again, nothing happens to the work-tree.

Introducing pre-commit hooks

Git runs your pre-commit hook after preparing any extra index files. The special environment variable $GIT_INDEX_FILE names the index that Git will use to make the new commit. So there are three cases, two of which are not too bad and one of which is terrible:

You're doing a normal commit. GIT_INDEX_FILE names the normal index, and everything is normal.
You're doing a git commit --include or git commit -a and GIT_INDEX_FILE names the second index; there's no third index; if the commit completes, Git will rename the second index.
You're doing a git commit --only and GIT_INDEX_FILE names the third index. There's no easy way to find the second index, the one that will be in place after the commit, if the commit succeeds!
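
To see which case applies on a given Git version, a logging hook can record the variable for each kind of commit. This is a sketch under assumptions: the hook body, the log location, and the file names are made up for the demo, and the exact paths printed depend on the Git version, so no expected output is shown.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
log=$(mktemp)   # kept outside the repository so it is never committed
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical logging hook: records which index Git hands to the hook.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<EOF
#!/bin/sh
echo "\$GIT_INDEX_FILE" >> "$log"
EOF
chmod +x .git/hooks/pre-commit

printf 'v1\n' > file.txt
git add file.txt
git commit -q -m "normal commit"                  # case 1: standard index

printf 'v2, edited\n' > file.txt
git commit -q -a -m "commit -a"                   # case 2: one extra index

printf 'v3, edited again\n' > file.txt
git commit -q -m "commit --only" --only file.txt  # case 3: two extra indexes

cat "$log"
```

The log should contain one line per commit. The check in the question's update accepts values ending in /index or /index.lock and rejects anything else.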

Your job, should you choose to make changes to the files stored in the index, is to make them to the index that Git will use to commit. To do that, you can use git add if you like, as this will copy files from the work-tree to the index named in $GIT_INDEX_FILE.

The first problem, though, is that you must not look at the files in the work-tree. They are irrelevant! They may contain something entirely different from what's in the index. This is particularly true during git commit --only.

The second and bigger problem is that if you've updated the third index that git commit --only is using, you should also update the second index that git commit --only is using. This part is tricky, because there is no easy way to find it, other than to assume it is in .git/index.lock. While that might work I won't advise it here.

I really have no suggestions for this—any sneaky method you find may break, as the code to deal with this third index (which the current 2.21-ish Git calls the "false index") has changed a lot between 1.8 and modern Git. The usual best-practice recommendation is not to do any special formatting in a Git hook at all. Instead, have the Git hook merely check whether the index copy of the file is correctly formatted: if so, proceed with the commit, and if not, abort the commit. Leave the rest to the user.
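
A check-only hook along those lines might look like the sketch below. The formatter is a hypothetical stand-in (it merely strips trailing whitespace), the function names are invented, and paths containing newlines or characters that Git would quote are assumed not to occur.

```shell
#!/bin/sh
# Check-only pre-commit sketch: it reads the index but never modifies it.

# Hypothetical stand-in formatter (strips trailing whitespace).
# Swap in any real formatter that reads stdin and writes stdout.
format_stdin() {
  sed -e 's/[[:space:]]*$//'
}

# Returns 0 if every staged file is already formatted, 1 otherwise.
check_staged_formatting() {
  status=0
  staged=$(git diff --cached --name-only --diff-filter=ACMRT)
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    # ":path" names the copy in the index, i.e. what will be committed.
    current=$(git rev-parse ":$path")
    # Hash the formatted content without writing it to the object store.
    formatted=$(git cat-file blob ":$path" | format_stdin | git hash-object --stdin)
    if [ "$current" != "$formatted" ]; then
      echo "Not formatted in the index: $path" >&2
      status=1
    fi
  done <<EOF
$staged
EOF
  return "$status"
}
```

In a real hook the last line would be something like check_staged_formatting || exit 1. Because every Git command here honors $GIT_INDEX_FILE and nothing is written, it should not matter which of the index files Git is using for the commit.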

One more alternative

An alternative that I have seen and used is to check the actual setting of $GIT_INDEX_FILE. If it's set to .git/index, the user is using git commit without any special settings. Another trick in this same pre-commit hook (which invokes clang-format and autopep8) is to compare the index and work-tree for files that would be formatted, and refuse to run if they don't match.
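
The second trick, refusing to run when the index and work-tree copies disagree, could be sketched as follows. The function name is invented, and the same assumption about ordinary path names applies; this is not the actual hook mentioned above.

```shell
#!/bin/sh
# Returns 0 only when every staged file is identical in the index and in the
# work-tree, so reformatting the work-tree file and re-adding it cannot pull
# unstaged edits into the commit.
index_matches_worktree() {
  status=0
  staged=$(git diff --cached --name-only --diff-filter=ACMRT)
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    # git diff without --cached compares the index to the work-tree.
    if ! git diff --quiet -- "$path"; then
      echo "Staged and work-tree copies differ: $path" >&2
      status=1
    fi
  done <<EOF
$staged
EOF
  return "$status"
}
```

A formatting hook would call index_matches_worktree || exit 1 before touching any file, alongside the $GIT_INDEX_FILE check.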

Source: https://stackoverflow.com/questions/55582892/unexpected-behavior-with-git-commit-when-pre-commit-hook-modifies-staged-fil

Tags

git

code-formatting

git-commit

pre-commit-hook