Why does “git stash push” cause “Unmerged paths: … both modified: …”?

Posted by 折月煮酒 on 2020-11-29 03:55:52

Question


Q1: How can I reproduce this scenario? (I unsuccessfully tried to reproduce it)

Q2: What does this Unmerged paths:... both modified:... state mean?


I did a git stash (push) and got into an Unmerged paths: ... both modified: ... state, and I don't know why / what it means.


My steps:

  1. modified a single file
  2. (I may have done a git add -p, and then might have modified the working tree copy (I don't remember))
  3. $ git stash -m 'my message'
  4. $ git status
Unmerged paths:
  (use "git restore --staged <file>..." to unstage)
  (use "git add <file>..." to mark resolution)
    both modified:   questions/templates/flashcard.html

no changes added to commit (use "git add" and/or "git commit -a")

Answer 1:


You're starting from a false premise: git stash push itself does not cause the problem. It merely suffers from the problem, once the problem exists.

(I unsuccessfully tried to reproduce it)

That is not surprising. 😀

What does this Unmerged paths ... mean?

It means that some action that used Git's merge engine started a merge, and was unable to complete the merge due to merge conflicts. If you're looking for a list of things that invoke Git's merge engine, it's rather long:

  • git merge can do it, of course;
  • git cherry-pick and git revert—which are implemented with a common bit of code—can do it;
  • git rebase can invoke git cherry-pick and therefore can do it;
  • git stash apply invokes git merge-recursive—the inner part of git merge—directly, and can therefore do it, and git stash pop runs git stash apply;

and many other Git operations might invoke one of these, e.g., git pull runs git fetch, followed by either git merge or git rebase. Given that you're fussing about with git stash, I would guess it was an earlier git stash apply, perhaps invoked via git stash pop, that got you into the problematic state.
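For Q1, here is one possible way to reproduce the state. It is only a sketch, assuming the culprit was an earlier git stash pop; the file name, its contents, and the temporary repository are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical reproduction in a throwaway repository.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

printf 'line one\n' > flashcard.html
git add flashcard.html
git commit -q -m 'initial'

# Stash an uncommitted edit...
printf 'stashed edit\n' > flashcard.html
git stash push -q -m 'my message'

# ...then commit a different edit to the same line.
printf 'committed edit\n' > flashcard.html
git commit -q -a -m 'conflicting change'

# Popping the stash runs the merge engine, which stops on a conflict.
git stash pop >/dev/null 2>&1
pop_status=$?

# "UU" in the porcelain output corresponds to "both modified".
porcelain=$(git status --porcelain)
echo "stash pop exit status: $pop_status"
echo "$porcelain"
```

A later git stash push run in this repository would start from the already-unmerged index, which matches the "suffers from the problem" description above.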

A quick-ish deep dive into "unmerged paths"

The best way to comprehend merge conflicts and the unmerged path state is to start with git merge itself. Imagine the standard two different users, Alice and Bob, start from some common starting point, i.e., some commit in a Git repository that both have cloned:

...--G--H   <-- main (HEAD)

The uppercase letters here stand in for the hash IDs of individual commits; H is merely the last commit on the main branch.

Alice, in her copy of the repository, creates a new branch named alice and begins working. She makes two new commits, each of which gets a new, unique hash ID. In Alice's repository these look like this:

          I--J   <-- alice (HEAD)
         /
...--G--H   <-- main

Meanwhile Bob does the same thing, but of course gets different (still-unique, but different from Alice's) hash IDs, so these become:

...--G--H   <-- main
         \
          K--L   <-- bob (HEAD)

You, or Alice or Bob—someone anyway—now put all these commits into one repository, using the same names (alice and bob) to identify the same commit hash IDs (J and L respectively) to get:

          I--J   <-- alice
         /
...--G--H   <-- main (HEAD)
         \
          K--L   <-- bob

Or, maybe you use slightly different names, like alice/alice or bob/bob or origin/alice or whatever. The names don't matter: it's having, and being able to find, the commits, with their unique hash IDs, that matters.

In the drawing just above we're currently on branch main, which isn't where we want to be, so let's run git checkout alice or git checkout bob as needed, to make the current commit be commit J or L. Let's use commit J, via the name alice. (If Alice is the one doing the merge, she probably has something like bob/bob as the name of Bob's commit and is already / still on commit J, so if she's doing the merge, she doesn't even need to run git checkout here. But we're doing it, so we use git checkout alice.)

          I--J   <-- alice (HEAD)
         /
...--G--H
         \
          K--L   <-- bob

The name main still exists; I'm just going to stop drawing it now, as it is about to get in the way. We now run:

git merge bob

Git will:

  • locate our current commit (easy, it's J);
  • use the name bob to locate the other commit (L); and
  • use both commits to work backwards to find out what common, shared commit Alice and Bob started with. This is the merge base commit and it is crucial to the merging process—the merge-as-a-verb operation performed by Git's merge engine.

In this case, the merge base is commit H. So Git finds H and supplies these three commits: H (merge base), J (current or --ours commit), and L (other or --theirs commit) to the appropriate Git merge strategy.[1] The merge strategy is responsible for executing the merge: doing the merge-as-a-verb part.

If all goes well, Git finishes the merge on its own and makes a new commit:

          I--J
         /    \
...--G--H      M   <-- alice (HEAD)
         \    /
          K--L   <-- bob

New commit M is a merge commit because it has two parents, J and L, instead of the usual one (which would be J if this weren't a merge commit). But this only happens if Git is able to complete the merge on its own.
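The Alice-and-Bob history can be rebuilt in a scratch repository to check both the merge base and the two parents of M. This is a sketch only: the file names are invented, and the commit messages simply reuse the letters from the drawings.

```shell
#!/bin/sh
# Scratch repository; alice and bob are the branch names from the text,
# everything else (file names, contents) is made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo 'shared' > shared.txt
git add shared.txt
git commit -q -m 'H'
H=$(git rev-parse HEAD)

# Alice's commits I and J.
git checkout -q -b alice "$H"
echo 'a1' > alice.txt; git add alice.txt; git commit -q -m 'I'
echo 'a2' >> alice.txt; git commit -q -a -m 'J'

# Bob's commits K and L.
git checkout -q -b bob "$H"
echo 'b1' > bob.txt; git add bob.txt; git commit -q -m 'K'
echo 'b2' >> bob.txt; git commit -q -a -m 'L'

git checkout -q alice
base=$(git merge-base alice bob)   # should be commit H
git merge -q -m 'M' bob            # no conflicts, so Git commits M itself
parents=$(git log -1 --format=%P HEAD | wc -w | tr -d ' ')

echo "merge base is H: $base"
echo "number of parents of M: $parents"
```

Because Alice and Bob touched different files here, the merge completes without any help.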


[1] The strategy, -s recursive by default at the time of writing (Git 2.34 and later default to -s ort instead), is the implementation of the merge engine. There is more than one strategy, so there is actually more than one merge engine. Each of the various strategies does something at least a little bit different (there's no point in having two identical strategies, after all), but we'll ignore all except for the simple case of recursive. The recursive and resolve strategies do the same thing as long as there is exactly one merge base. If there is more than one merge base, they become different, but that case is fairly rare.


Git's index

At this point, we need a sidebar to discuss Git's index. The index, in Git, is a pretty central thing. It's crucial to making any new commit, for instance. This important entity has this very generic, meaningless name, "index". That was probably a mistake, and there's a newer and better-for-many-purposes name, which is the staging area.[2] This refers to how the index is most often used: to hold the proposed next commit.

When you use git checkout to extract some commit, Git fills in both its own index—the staging area—and your working tree or work-tree from the files stored in that commit. Each commit holds a full snapshot of every file, in a compressed, read-only, Git-only, de-duplicated format. You can't actually work on or with these files, so Git has to extract them. Getting them out into your work-tree would suffice, and is what other non-Git version control systems do; but those other VCSes aren't Git. Git gets them out into both your work-tree and its own index.

The index copies are pre-compressed and pre-de-duplicated (and being de-duplicated, are actually just a reference to the existing copy anyway). They're ready to go into a new commit. As you fuss with your work-tree copies of the files, nothing happens to the index copies. That's why you have to run git add all the time. The git add step reads the work-tree copy, compresses and de-duplicates it, and stuffs the result into Git's index. Now the updated file is ready to be committed.

So, at all times, Git's index holds your proposed next commit. You set it up initially with your git checkout. Then you change your work-tree files, which doesn't do anything useful on its own. But then you git add, which replaces the index copies of the files you tell it to update. Git's index still holds your proposed next commit, it's just that now, that proposed next commit doesn't match the one you checked out earlier.
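To watch the index lag behind the work-tree until git add runs, something like the following can be tried in a scratch repository. It is a sketch with an invented file name; git show :path prints the index copy of a file.

```shell
#!/bin/sh
# Scratch repository; file.txt and its contents are made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo 'version 1' > file.txt
git add file.txt
git commit -q -m 'first'

echo 'version 2' > file.txt               # edit the work-tree copy only
before=$(git show :file.txt)              # index copy is still the old one
git add file.txt                          # copy the work-tree version into the index
after=$(git show :file.txt)               # index copy now matches the work-tree
staged=$(git diff --cached --name-only)   # what differs between index and HEAD

echo "index before git add: $before"
echo "index after git add:  $after"
echo "staged for next commit: $staged"
```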

So that's the normal, everyday work of Git's index: it holds your proposed next commit. You update files in it, adding new ones if you like, removing existing ones if you like; this updates your proposed next commit. You get the files to add to it or update in it from your work-tree, where you have ordinary files that you can use ordinary programs (including your editors) on. Git fills in your work-tree for you, and fills in its index for itself, and then you update Git's index in this roundabout fashion. Commands like git add -p merely update an index copy of a file a piece at a time, rather than taking the whole work-tree file as a replacement.[3]


[2] There's even a third name, the cache, but this name is not used very often any more. You mostly see it in flags: git rm --cached, git diff --cached, and so on. Some of these let you use --staged but every Git command is different here. It's not very consistent. This is one of Git's weaknesses: an inconsistent and often confusing command line.

[3] The actual implementation of git add -p is:

  • extract index copy to a temporary file;
  • diff the temporary file and the work-tree file to get a series of diff hunks;
  • apply diff hunk(s) to the temporary file as directed;
  • git add the entire, now-modified temporary file into the index under the original file's name; and
  • remove the temporary file.

At least, that's the case for the existing Perl-based git add -p. There's an ongoing effort to rewrite git add -p in C, which may do more of this in memory and/or try to be more efficient—the Perl implementation updates the index copy after each hunk is applied, when it might be better to wait—but the same principles will hold: you're just taking the index copy, making a change to it, and then stuffing that updated file back into the index, compressing and de-duplicating the content to get it into the index.
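The same principle can be imitated by hand with plumbing commands. To be clear, this is not the code of git add -p, only a rough stand-in under stated assumptions: sed plays the role of "apply the selected hunk", and the file name and contents are invented. The result is an index copy that contains one change while the work-tree copy contains two.

```shell
#!/bin/sh
# Scratch repository; file.txt and its contents are made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

printf 'one\ntwo\nthree\n' > file.txt
git add file.txt
git commit -q -m 'first'

# The work-tree copy gets two changes.
printf 'ONE\ntwo\nTHREE\n' > file.txt

tmp=$(mktemp)
git show :file.txt > "$tmp"                 # extract the index copy to a temporary file
sed '1s/one/ONE/' "$tmp" > "$tmp.new" \
  && mv "$tmp.new" "$tmp"                   # stand-in for applying just one chosen change
blob=$(git hash-object -w "$tmp")           # compress and store the result as a blob
git update-index --cacheinfo 100644,"$blob",file.txt   # point the index entry at it
rm -f "$tmp"                                # remove the temporary file

index_copy=$(git show :file.txt)
worktree_copy=$(cat file.txt)
printf 'index copy:\n%s\n' "$index_copy"
printf 'work-tree copy:\n%s\n' "$worktree_copy"
```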


Git's index during merges

To make all of the above work, Git's index needs to hold one entry per work-tree file. More precisely, it holds one entry per tracked file, and it's the presence of this entry in Git's index that makes the file a "tracked file". A new commit uses whatever is in Git's index at the time you run git commit, so the new commit contains exactly those files that are in Git's index.

To do a merge, however, we need to look at three commits, rather than just one. The three inputs to the merge are:

  1. the merge base commit;
  2. the current (--ours) commit;
  3. the other (--theirs) commit.

To handle this, the merge process expands the index. All the files that are in the index are now considered to be in "slot zero", with each entry having four numbered slots. Slot zero is the "all done" slot. Slot 1 is for the merge base, slot 2 is for --ours, and slot 3 is for --theirs.[4] The current commit is, at the beginning of a merge, already in Git's index; it's just in the wrong slot: slot zero, instead of slot 2. So the start of the process moves these to slot 2, or reads the current commit into slot 2, however you'd like to look at it.[5] Meanwhile, it reads the merge base commit into slot 1, and the --theirs commit into slot 3.

What this means is that the index now holds three copies of every file.[6] The merge process proceeds to decide what to do with each file:

  • If all three copies of a file are identical, no one changed the file. Use any copy of the file as the merge result.

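  • If two copies match (base and ours, or base and theirs), only the other person changed the file. Use the changed file as the merge result. (If ours and theirs match each other but differ from the base, both people made the same change, and either copy will do.)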

  • If all three copies differ, this file requires actual merging.

If a file doesn't require any actual merging, Git can just take the right index slot and renumber that as "slot zero", erasing the other slots. That file is now merged. If the work-tree file needs replacing, Git can replace the work-tree file at the same time.

If a file does require actual merging, Git compares the merge base version (in slot 1) against each of the other two files (in slots 2 and 3), line-by-line. These comparisons produce a set of changes. Git tries to combine the two sets of changes. If Alice changed line 3, and Bob didn't, Git can take Alice's change. If Bob changed line 42, and Alice didn't, Git can take Bob's change. All the non-conflicting changes, that touch different lines in the merge base file, can be piled together.
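The line 3 and line 42 case can be tried out directly. This is a sketch in a scratch repository with an invented 50-line file; the branch names are the ones from the drawings.

```shell
#!/bin/sh
# Scratch repository; numbers.txt and its contents are made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

i=1
while [ "$i" -le 50 ]; do echo "line $i"; i=$((i + 1)); done > numbers.txt
git add numbers.txt
git commit -q -m 'H'
H=$(git rev-parse HEAD)

# Alice changes line 3 only.
git checkout -q -b alice "$H"
sed '3s/.*/alice changed line 3/' numbers.txt > numbers.tmp && mv numbers.tmp numbers.txt
git commit -q -a -m 'J'

# Bob changes line 42 only.
git checkout -q -b bob "$H"
sed '42s/.*/bob changed line 42/' numbers.txt > numbers.tmp && mv numbers.tmp numbers.txt
git commit -q -a -m 'L'

git checkout -q alice
git merge -q -m 'M' bob
merge_status=$?
line3=$(sed -n '3p' numbers.txt)
line42=$(sed -n '42p' numbers.txt)

echo "merge exit status: $merge_status"
echo "$line3"
echo "$line42"
```

All three copies of the file differ, so Git has to do a real line-by-line merge, but the two sets of changes are far apart and combine cleanly.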

If Bob and Alice touched the same lines,[7] though, then:

  • either they made the same change, so Git can just take one copy of it; or
  • they made different changes. Git doesn't know which one(s) to use!

For the last case, Git declares a merge conflict. Git leaves all three copies of the file in the index, in their nonzero staging slots. Git writes to the work-tree file its best effort at merging the files—using the combined changes where Git was able to combine them—and conflict markers and the conflicting sections where Git was not able to combine the changes.
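To see the three nonzero slots and the conflict markers for yourself, a one-line file that both sides change differently is enough. This is a sketch with invented contents; git ls-files --stage prints the slot number as its third field.

```shell
#!/bin/sh
# Scratch repository; file.txt and its contents are made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo 'base text' > file.txt
git add file.txt
git commit -q -m 'H'
H=$(git rev-parse HEAD)

git checkout -q -b alice "$H"
echo 'alice text' > file.txt
git commit -q -a -m 'J'

git checkout -q -b bob "$H"
echo 'bob text' > file.txt
git commit -q -a -m 'L'

git checkout -q alice
git merge -m 'M' bob >/dev/null 2>&1     # stops with a merge conflict

# Third field of each ls-files --stage line is the slot number.
stages=$(echo $(git ls-files --stage file.txt | awk '{ print $3 }'))
slot1=$(git show :1:file.txt)            # merge base copy
slot2=$(git show :2:file.txt)            # --ours copy
slot3=$(git show :3:file.txt)            # --theirs copy
markers=$(grep -c '^<<<<<<<' file.txt)   # conflict markers in the work-tree file

echo "slots present: $stages"
echo "slot 1: $slot1 / slot 2: $slot2 / slot 3: $slot3"
echo "conflict marker blocks: $markers"
```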

Your job, as the person watching over Git as it runs its merge process, is now to resolve the conflict by producing, by any means you like, the correct merge result and stuffing that into Git's index. One way to do that is by editing the ordinary file, with the conflict markers in it. Resolve the conflict there, then run git add path. This tells Git to erase all three higher-numbered slots, compress and de-duplicate the work-tree copy of the file, and stuff that into index slot #0 as usual. This particular merge conflict is now resolved.
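The edit-then-git-add route looks roughly like this. It is a sketch in a scratch repository: the file and its contents are invented, and overwriting the file with echo stands in for resolving the conflict in an editor.

```shell
#!/bin/sh
# Scratch repository; file.txt and its contents are made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo 'base text' > file.txt
git add file.txt
git commit -q -m 'H'
H=$(git rev-parse HEAD)

git checkout -q -b alice "$H"
echo 'alice text' > file.txt
git commit -q -a -m 'J'

git checkout -q -b bob "$H"
echo 'bob text' > file.txt
git commit -q -a -m 'L'

git checkout -q alice
git merge -m 'M' bob >/dev/null 2>&1     # stops with a merge conflict

unmerged_before=$(git ls-files -u | wc -l | tr -d ' ')   # entries in slots 1-3

# Stand-in for hand-editing: write the resolved content, markers gone.
echo 'alice and bob text' > file.txt
git add file.txt                                          # collapse to slot 0

unmerged_after=$(git ls-files -u | wc -l | tr -d ' ')
stage_after=$(git ls-files --stage file.txt | awk '{ print $3 }')

echo "unmerged entries before: $unmerged_before, after: $unmerged_after"
echo "slot after git add: $stage_after"
```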

Another way to do this is to run git mergetool, which uses lower-level Git commands to extract all three files from the index, and run some third-party program (your editor, some merge tool, whatever) on them. This third-party program has to resolve the conflicts—perhaps with your input—and write the result to your work-tree, where git mergetool will read it using git add. So as you can see, git mergetool is an awful lot like resolving it by hand—it just gives you an easy way to run something on all three files, instead of on Git's best-effort-with-conflict-markers copy.


[4] The actual implementation uses separate entries, each with a slot number in it, rather than one single entry per file with four slot numbers. However, the on-disk index format could be changed in the future, and has been changed in the past. What you're really promised is what is described in the documentation for git ls-files --stage and git update-index.

[5] This glosses over the fact that it's potentially possible for the index's contents to differ from the current commit's contents. The git merge command, and many of the others, by default check first to make sure that this isn't the case, and fail (stop executing with an error message) if the index does not match the current commit. They also check that the index and work-tree match. If all three match, things are much safer, because if anything goes wrong, all the files you care about are safely stored in the current commit.

The git stash apply code is not one of these safety-checking commands. An apply that goes wrong can result in a mess that is very hard to recover from. I advise avoiding this situation whenever possible, so that you don't even need to wonder whether it's "read commit into slot 2" or "write 2 into the slot numbers of existing index entries".

[6] This glosses over the fact that some files might not exist in some of the commits. For instance, if Alice created an all-new file, that file doesn't exist in slots 1 and 3, only in slot 2. If Bob deleted an existing file, that file exists only in slots 1 and 2, and not in 3.

We've also completely ignored complicated conflicts that can occur here, where, e.g., Alice modifies a file, and Bob deletes that same file, or Alice and Bob both create a new file with the same name, but different content. Git detects some—though not all—file rename cases, and those can produce conflicts as well. Rename detection and some of these conflicts are special cases, while the remaining cases can be seen directly via the index slot entries.

[7] For safety purposes and to handle ordering issues at the end of the file, Git considers two changes that "touch" to be in conflict. For instance, if Alice changed line 14 but not line 15, and Bob added a line after line 14 and before line 15, these two changes abut and Git declares a merge conflict here.
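The abutting-changes case can be checked with a small experiment. This is a sketch with an invented 20-line file, in which Alice rewrites line 14 and Bob inserts a new line between lines 14 and 15.

```shell
#!/bin/sh
# Scratch repository; file.txt and its contents are made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

i=1
while [ "$i" -le 20 ]; do echo "line $i"; i=$((i + 1)); done > file.txt
git add file.txt
git commit -q -m 'H'
H=$(git rev-parse HEAD)

# Alice changes line 14 but not line 15.
git checkout -q -b alice "$H"
sed '14s/.*/alice changed line 14/' file.txt > file.tmp && mv file.tmp file.txt
git commit -q -a -m 'J'

# Bob adds a line after line 14 and before line 15.
git checkout -q -b bob "$H"
awk '{ print } NR == 14 { print "bob added a line" }' file.txt > file.tmp && mv file.tmp file.txt
git commit -q -a -m 'L'

git checkout -q alice
git merge -m 'M' bob >/dev/null 2>&1
merge_status=$?
status=$(git status --porcelain file.txt)

echo "merge exit status: $merge_status"
echo "$status"
```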


Cherry-pick, revert, and other operations use the merge engine

While the above is for git merge itself—where we had a merge base commit H and two branch-tip commits J and L—many other Git operations will use the merge engine. To make this work, they simply assign some commit to act as the merge base. They pick the current commit as the current commit (always), and they assign some other commit as the --theirs commit.

For cherry-pick, the merge base commit is the parent commit of the commit you tell Git to cherry-pick, and the --theirs commit is the commit you tell Git to cherry-pick. Using the parent of the commit as the merge base produces the desired effect: we take their change, as defined by diff-ing the parent and the commit, and add it to our current commit.
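One way to check which commits a cherry-pick feeds to the merge engine is to force a conflict and then compare the index slots with the commits involved. This is a sketch in a scratch repository; the branch names feature and target, the file name, and the contents are all invented.

```shell
#!/bin/sh
# Scratch repository; branch names, file.txt and its contents are made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo 'base text' > file.txt
git add file.txt
git commit -q -m 'H'
H=$(git rev-parse HEAD)

git checkout -q -b feature "$H"
echo 'feature, first draft' > file.txt
git commit -q -a -m 'P'                  # parent of the commit we will pick
echo 'feature, second and final draft' > file.txt
git commit -q -a -m 'C'                  # the commit we will cherry-pick

git checkout -q -b target "$H"
echo 'target text' > file.txt
git commit -q -a -m 'T'

git cherry-pick feature >/dev/null 2>&1  # conflicts, leaving slots 1-3 filled
pick_status=$?

slot1=$(git show :1:file.txt)            # merge base slot
slot2=$(git show :2:file.txt)            # --ours slot
slot3=$(git show :3:file.txt)            # --theirs slot
parent_copy=$(git show feature^:file.txt)
picked_copy=$(git show feature:file.txt)

echo "slot 1: $slot1"
echo "slot 2: $slot2"
echo "slot 3: $slot3"
```

Slot 1 should hold the copy from the picked commit's parent, not from commit H, even though H is where the two branches actually diverged.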

For git stash apply, the merge base commit is the parent commit of the work-tree commit in the stash being applied. The git stash push or git stash save command created two or three commits, of which one or two or all three can be used later during the apply step. Most stashes make two commits and the standard git stash apply uses only one of those two (plus its parent). For details, see, e.g., How to recover from "git stash save --all"?
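The shape of an ordinary stash can be inspected directly. This is a sketch with an invented file in a scratch repository; it looks only at the plain two-commit case, where the work-tree commit has the then-current commit as its first parent and the index commit as its second.

```shell
#!/bin/sh
# Scratch repository; file.txt and its contents are made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo 'base text' > file.txt
git add file.txt
git commit -q -m 'first'
head_at_stash=$(git rev-parse HEAD)

echo 'stashed edit' > file.txt
git stash push -q -m 'my message'

# The stash ref names the work-tree commit; count its parents.
nparents=$(git log -1 --format=%P 'stash@{0}' | wc -w | tr -d ' ')
# Its first parent is the merge base that git stash apply will use.
stash_base=$(git rev-parse 'stash@{0}^1')
# Its second parent is the commit holding the saved index.
index_parent=$(git rev-parse 'stash@{0}^2^')

echo "parents of the work-tree commit: $nparents"
echo "merge base for apply: $stash_base"
```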

Conclusion

Now we know what:

    both modified:   questions/templates/flashcard.html

means: that currently, in Git's index, there are three copies of the file named questions/templates/flashcard.html. The copy in slot #1 is from the merge base; the copy in slot #2 is from the commit that was current (subject to the complications mentioned in footnote 5 above); and the copy in slot #3 is from the commit being used as the other commit. All three copies differ from one another.

You can view the actual file contents with:

git show :1:questions/templates/flashcard.html

which shows the merge base copy,

git show :2:questions/templates/flashcard.html

which shows the --ours slot-2 copy, and:

git show :3:questions/templates/flashcard.html

which shows the --theirs slot-3 copy. Git will have written, to your work-tree questions/templates/flashcard.html file, its best effort at merging these.

When you are in this state, where the index holds any nonzero-numbered entries, none of the operations that write a new commit can function, because Git can only turn the index into a commit when all entries are in staging slot #0. To fix the problem, you must use git add or git rm to update Git's index.[8]
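A short experiment shows the commit being refused and then accepted. This is a sketch in a scratch repository with invented contents; the echo that overwrites the file stands in for a real hand-made resolution.

```shell
#!/bin/sh
# Scratch repository; file.txt and its contents are made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo 'base text' > file.txt
git add file.txt
git commit -q -m 'H'
H=$(git rev-parse HEAD)

git checkout -q -b alice "$H"
echo 'alice text' > file.txt
git commit -q -a -m 'J'

git checkout -q -b bob "$H"
echo 'bob text' > file.txt
git commit -q -a -m 'L'

git checkout -q alice
git merge -m 'M' bob >/dev/null 2>&1     # stops with a merge conflict

# With entries in slots 1-3, committing is refused.
git commit -q -m 'too early' >/dev/null 2>&1
blocked_status=$?

# Resolve, move the file back to slot 0, and try again.
echo 'resolved text' > file.txt
git add file.txt
git commit -q -m 'M'
resolved_status=$?

echo "commit while unmerged: exit $blocked_status"
echo "commit after git add:  exit $resolved_status"
```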


[8] You can also use git update-index, but this requires getting into the details of how the index entries are stored.



Source: https://stackoverflow.com/questions/64450033/why-does-git-stash-push-cause-unmerged-paths-both-modified
