How does 'git merge' work in detail?


夕颜 2020-11-28 02:45

I want to know the exact algorithm (or something close to it) behind 'git merge'. Answers to at least these sub-questions would be helpful:

How does git detect the con

5 answers

情深已故 (original poster)

2020-11-28 03:14

I'm interested too. I don't know the answer, but...

A complex system that works is invariably found to have evolved from a simple system that worked

I think git's merging is highly sophisticated and will be very difficult to understand - but one way to approach this is from its precursors, and to focus on the heart of your concern. That is, given two files that don't have a common ancestor, how does git merge work out how to merge them, and where conflicts are?

Let's try to find some precursors. From git help merge-file:

git merge-file is designed to be a minimal clone of RCS merge; that is,
       it implements all of RCS merge's functionality which is needed by
       git(1).

From wikipedia: http://en.wikipedia.org/wiki/Git_%28software%29 -> http://en.wikipedia.org/wiki/Three-way_merge#Three-way_merge -> http://en.wikipedia.org/wiki/Diff3 -> http://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf

That last link is a PDF of a paper describing the diff3 algorithm in detail. It's only 12 pages long, and the algorithm itself takes only a couple of pages, but it is a full-on mathematical treatment. That might seem a bit too formal, but if you want to understand git's merge, you'll need to understand the simpler version first. I haven't checked yet, but with a name like diff3, you'll probably also need to understand diff (which uses a longest common subsequence algorithm). However, there may be a more intuitive explanation of diff3 out there, if you search for one.
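For readers who have not met the longest common subsequence (LCS) idea mentioned above, here is a minimal sketch of the textbook dynamic-programming version, applied to lists of lines. It is only an illustration of the concept: the function name lcs and the sample data are made up for this example, and real diff tools use faster algorithms than this quadratic table.

```python
def lcs(a, b):
    """Return one longest common subsequence of the sequences a and b."""
    # table[i][j] holds the length of an LCS of a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover the subsequence.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # skipping a[i] loses nothing
        else:
            j += 1  # skipping b[j] loses nothing
    return out


# Hypothetical sample data: lines kept by both versions form the LCS;
# everything else is what a diff would report as deleted or added.
old = ["a", "b", "c", "d"]
new = ["a", "x", "c", "d", "e"]
common = lcs(old, new)
```

Lines of old that are missing from common would show up as deletions in a diff, and lines of new that are missing from common as additions.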

Now, I just did an experiment comparing diff3 and git merge-file. They take the same three input files (version1, oldversion, version2) and mark conflicts the same way, with <<<<<<< version1, =======, >>>>>>> version2 (diff3 also has ||||||| oldversion), showing their common heritage.

I used an empty file for oldversion, and near-identical files for version1 and version2 with just one extra line added to version2.

Result: git merge-file identified the single changed line as the conflict; but diff3 treated the whole two files as a conflict. Thus, sophisticated as diff3 is, git's merge is even more sophisticated, even for this simplest of cases.
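One way to picture the difference is that the two sides of a conflict can themselves be compared, so that only the lines that really differ stay inside the markers. The sketch below imitates that effect with Python's difflib. It is an assumption-laden illustration, not git's actual code: the function name refine_conflict, the labels, and the sample lines are invented here.

```python
import difflib


def refine_conflict(ours, theirs, ours_name="version1", theirs_name="version2"):
    """Combine two line lists, keeping shared lines as-is and wrapping only
    the stretches where they differ in conflict markers."""
    out = []
    matcher = difflib.SequenceMatcher(a=ours, b=theirs, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Both sides agree on these lines, so there is nothing to resolve.
            out.extend(ours[i1:i2])
        else:
            # The sides differ here (replace, delete or insert): mark it.
            out.append("<<<<<<< " + ours_name)
            out.extend(ours[i1:i2])
            out.append("=======")
            out.extend(theirs[j1:j2])
            out.append(">>>>>>> " + theirs_name)
    return out


# Hypothetical stand-ins for the experiment: version2 has one extra line.
version1 = ["line one", "line two", "line three"]
version2 = ["line one", "line two", "THIS IS A BIT DIFFERENT", "line three"]
merged = refine_conflict(version1, version2)
print("\n".join(merged))
```

With these inputs only the extra line ends up between the markers, which is the same shape as the git merge-file result described above.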

Here are the actual results (I used @twalberg's answer for the text). Note the options needed (see the respective manpages).

$ git merge-file -p fun1.txt fun0.txt fun2.txt

You might be best off looking for a description of a 3-way merge algorithm. A
high-level description would go something like this:

    Find a suitable merge base B - a version of the file that is an ancestor of
both of the new versions (X and Y), and usually the most recent such base
(although there are cases where it will have to go back further, which is one
of the features of gits default recursive merge) Perform diffs of X with B and
Y with B.  Walk through the change blocks identified in the two diffs. If both
sides introduce the same change in the same spot, accept either one; if one
introduces a change and the other leaves that region alone, introduce the
change in the final; if both introduce changes in a spot, but they don't match,
mark a conflict to be resolved manually.
<<<<<<< fun1.txt
=======
THIS IS A BIT DIFFERENT
>>>>>>> fun2.txt

The full algorithm deals with this in a lot more detail, and even has some
documentation (/usr/share/doc/git-doc/technical/trivial-merge.txt for one,
along with the git help XXX pages, where XXX is one of merge-base, merge-file,
merge, merge-one-file and possibly a few others). If that's not deep enough,
there's always source code...

$ diff3 -m fun1.txt fun0.txt fun2.txt

<<<<<<< fun1.txt
You might be best off looking for a description of a 3-way merge algorithm. A
high-level description would go something like this:

    Find a suitable merge base B - a version of the file that is an ancestor of
both of the new versions (X and Y), and usually the most recent such base
(although there are cases where it will have to go back further, which is one
of the features of gits default recursive merge) Perform diffs of X with B and
Y with B.  Walk through the change blocks identified in the two diffs. If both
sides introduce the same change in the same spot, accept either one; if one
introduces a change and the other leaves that region alone, introduce the
change in the final; if both introduce changes in a spot, but they don't match,
mark a conflict to be resolved manually.

The full algorithm deals with this in a lot more detail, and even has some
documentation (/usr/share/doc/git-doc/technical/trivial-merge.txt for one,
along with the git help XXX pages, where XXX is one of merge-base, merge-file,
merge, merge-one-file and possibly a few others). If that's not deep enough,
there's always source code...
||||||| fun0.txt
=======
You might be best off looking for a description of a 3-way merge algorithm. A
high-level description would go something like this:

    Find a suitable merge base B - a version of the file that is an ancestor of
both of the new versions (X and Y), and usually the most recent such base
(although there are cases where it will have to go back further, which is one
of the features of gits default recursive merge) Perform diffs of X with B and
Y with B.  Walk through the change blocks identified in the two diffs. If both
sides introduce the same change in the same spot, accept either one; if one
introduces a change and the other leaves that region alone, introduce the
change in the final; if both introduce changes in a spot, but they don't match,
mark a conflict to be resolved manually.
THIS IS A BIT DIFFERENT

The full algorithm deals with this in a lot more detail, and even has some
documentation (/usr/share/doc/git-doc/technical/trivial-merge.txt for one,
along with the git help XXX pages, where XXX is one of merge-base, merge-file,
merge, merge-one-file and possibly a few others). If that's not deep enough,
there's always source code...
>>>>>>> fun2.txt
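The high-level description quoted in the test files (diff each side against the base, walk the change blocks, take one-sided changes, flag mismatched ones) can be sketched in a few lines of Python. This is a simplified toy built on difflib, not the algorithm from the diff3 paper and not git's implementation; the names merge3 and _match_map and the "ours"/"theirs" labels are assumptions made for the example.

```python
import difflib


def _match_map(base, side):
    """Map each base line index that survives in `side` to its index there."""
    mapping = {}
    matcher = difflib.SequenceMatcher(a=base, b=side, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def merge3(base, ours, theirs):
    """Toy three-way merge of line lists; returns merged lines with markers."""
    in_ours, in_theirs = _match_map(base, ours), _match_map(base, theirs)
    # Sync points: base lines kept by BOTH sides, plus an end-of-file sentinel.
    sync = [(i, in_ours[i], in_theirs[i])
            for i in range(len(base)) if i in in_ours and i in in_theirs]
    sync.append((len(base), len(ours), len(theirs)))

    out = []
    b = o = t = 0  # cursors into base, ours, theirs
    for sb, so, st in sync:
        chunk_b, chunk_o, chunk_t = base[b:sb], ours[o:so], theirs[t:st]
        if chunk_o == chunk_t:
            out.extend(chunk_o)   # untouched, or the same change on both sides
        elif chunk_o == chunk_b:
            out.extend(chunk_t)   # only theirs changed this region
        elif chunk_t == chunk_b:
            out.extend(chunk_o)   # only ours changed this region
        else:
            out.append("<<<<<<< ours")
            out.extend(chunk_o)
            out.append("=======")
            out.extend(chunk_t)
            out.append(">>>>>>> theirs")
        if sb < len(base):
            out.append(base[sb])  # the line all three versions share
        b, o, t = sb + 1, so + 1, st + 1
    return out


clean = merge3(["a", "b", "c"], ["a", "B", "c"], ["a", "b", "c", "d"])
clash = merge3(["a", "b", "c"], ["a", "X", "c"], ["a", "Y", "c"])
no_base = merge3([], ["p", "q"], ["p", "q", "r"])
```

Note what happens with an empty base, as in the experiment: there are no shared base lines to synchronise on, so this toy puts both files in full inside one conflict, much like the diff3 output above.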

If you are truly interested in this, it's a bit of a rabbit hole. To me, it seems as deep as regular expressions, the longest common subsequence algorithm of diff, context-free grammars, or relational algebra. If you want to get to the bottom of it, I think you can, but it will take some determined study.
