What exactly does git's “rebase --preserve-merges” do (and why?)

滥情空心 2020-11-22 04:37

Git's documentation for the rebase command is quite brief:

--preserve-merges
    Instead of ignoring merges, try to recreate them.

This uses the --interact         


        
2 answers
  •  花落未央
    2020-11-22 05:28

    Git 2.18 (Q2 2018) considerably improves on the --preserve-merges option by adding a new option, --rebase-merges.

    "git rebase" learned "--rebase-merges" to transplant the whole topology of commit graph elsewhere.

    (Note: Git 2.22, Q2 2019, actually deprecates --preserve-merges, and Git 2.25, Q1 2020, stops advertising it in the "git rebase --help" output)

    See commit 25cff9f, commit 7543f6f, commit 1131ec9, commit 7ccdf65, commit 537e7d6, commit a9be29c, commit 8f6aed7, commit 1644c73, commit d1e8b01, commit 4c68e7d, commit 9055e40, commit cb5206e, commit a01c2a5, commit 2f6b1d1, commit bf5c057 (25 Apr 2018) by Johannes Schindelin (dscho).
    See commit f431d73 (25 Apr 2018) by Stefan Beller (stefanbeller).
    See commit 2429335 (25 Apr 2018) by Phillip Wood (phillipwood).
    (Merged by Junio C Hamano -- gitster -- in commit 2c18e6a, 23 May 2018)

    pull: accept --rebase-merges to recreate the branch topology

    Similar to the preserve mode simply passing the --preserve-merges option to the rebase command, the merges mode simply passes the --rebase-merges option.

    This will allow users to conveniently rebase non-trivial commit topologies when pulling new commits, without flattening them.


    git rebase man page now has a full section dedicated to rebasing history with merges.

    Extract:

    There are legitimate reasons why a developer may want to recreate merge commits: to keep the branch structure (or "commit topology") when working on multiple, inter-related branches.

    In the following example, the developer works on a topic branch that refactors the way buttons are defined, and on another topic branch that uses that refactoring to implement a "Report a bug" button.
    The output of git log --graph --format=%s -5 may look like this:

    *   Merge branch 'report-a-bug'
    |\
    | * Add the feedback button
    * | Merge branch 'refactor-button'
    |\ \
    | |/
    | * Use the Button class for all buttons
    | * Extract a generic Button class from the DownloadButton one
    

    The developer might want to rebase those commits to a newer master while keeping the branch topology, for example when the first topic branch is expected to be integrated into master much earlier than the second one, say, to resolve merge conflicts with changes to the DownloadButton class that made it into master.

    This rebase can be performed using the --rebase-merges option.


    See commit 1644c73 for a small example:

    rebase-helper --make-script: introduce a flag to rebase merges

    The sequencer just learned new commands intended to recreate branch structure (similar in spirit to --preserve-merges, but with a substantially less-broken design).

    Let's allow the rebase--helper to generate todo lists making use of these commands, triggered by the new --rebase-merges option.
    For a commit topology like this (where the HEAD points to C):

    - A - B - C (HEAD)
        \   /
          D
    

    the generated todo list would look like this:

    # branch D
    pick 0123 A
    label branch-point
    pick 1234 D
    label D
    
    reset branch-point
    pick 2345 B
    merge -C 3456 D # C
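
    To make the label / reset / merge commands concrete, here is a minimal Python sketch that replays the todo list above on paper. It is a toy model, not Git's sequencer: the function name replay_todo, the starting commit name "base" and the convention of naming each rewritten commit after its subject plus a prime are all assumptions made for illustration.

```python
def replay_todo(todo, onto="base"):
    """Replay pick/label/reset/merge lines; return {new_commit: [parents]}."""
    head = onto      # commit the next pick will be applied on top of
    labels = {}      # label name -> commit it was attached to
    graph = {}       # rewritten commit -> list of its parents
    for raw in todo:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or pure comment
        body, _, comment = line.partition(" # ")
        cmd, *args = body.split()
        if cmd == "pick":                 # pick <hash> <subject>
            new = " ".join(args[1:]) + "'"
            graph[new] = [head]
            head = new
        elif cmd == "label":              # remember where HEAD is now
            labels[args[0]] = head
        elif cmd == "reset":              # jump back to a labelled commit
            head = labels[args[0]]
        elif cmd == "merge":              # merge -C <hash> <label> # <subject>
            new = comment + "'"
            graph[new] = [head, labels[args[-1]]]
            head = new
        else:
            raise ValueError(f"unsupported todo command: {cmd}")
    return graph


todo = [
    "# branch D",
    "pick 0123 A",
    "label branch-point",
    "pick 1234 D",
    "label D",
    "",
    "reset branch-point",
    "pick 2345 B",
    "merge -C 3456 D # C",
]
graph = replay_todo(todo)
for commit, parents in graph.items():
    print(commit, "<-", ", ".join(parents))
```

    In this model the merge commit C' ends up with the two parents B' and D', and both of those sit on top of A', which is the A - B - C / A - D - C shape of the original graph. Every parent is named by an explicit label or by the current position, so moving a pick line to another place in the list changes the result in a predictable way.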
    

    How does it differ from --preserve-merges?
    Commit 8f6aed7 explains:

    Once upon a time, this here developer thought: wouldn't it be nice if, say, Git for Windows' patches on top of core Git could be represented as a thicket of branches, and be rebased on top of core Git in order to maintain a cherry-pick'able set of patch series?

    The original attempt to answer this was: git rebase --preserve-merges.

    However, that experiment was never intended as an interactive option, and it only piggy-backed on git rebase --interactive because that command's implementation looked already very, very familiar: it was designed by the same person who designed --preserve-merges: yours truly.

    And by "yours truly", the author refers to himself: Johannes Schindelin (dscho), who is the main reason (with a few other heroes -- Hannes, Steffen, Sebastian, ...) that we have Git for Windows (even though back in the day -- 2009 -- that was not easy).
    He has been working at Microsoft since Sept. 2015, which makes sense considering Microsoft now heavily uses Git and needs his services.
    That trend actually started in 2013, with TFS. Since then, Microsoft has come to manage the largest Git repository on the planet! And in Oct. 2018, Microsoft acquired GitHub.

    You can see Johannes speak in this video for Git Merge 2018 in April 2018.

    Some time later, some other developer (I am looking at you, Andreas! ;-)) decided that it would be a good idea to allow --preserve-merges to be combined with --interactive (with caveats!) and the Git maintainer (well, the interim Git maintainer during Junio's absence, that is) agreed, and that is when the glamor of the --preserve-merges design started to fall apart rather quickly and unglamorously.

    Here Johannes is talking about Andreas Schwab from SUSE.
    You can see some of their discussions back in 2012.

    The reason? In --preserve-merges mode, the parents of a merge commit (or for that matter, of any commit) were not stated explicitly, but were implied by the commit name passed to the pick command.

    This made it impossible, for example, to reorder commits.
    Not to mention to move commits between branches or, deity forbid, to split topic branches into two.

    Alas, these shortcomings also prevented that mode (whose original purpose was to serve Git for Windows' needs, with the additional hope that it may be useful to others, too) from serving Git for Windows' needs.

    Five years later, when it became really untenable to have one unwieldy, big hodge-podge patch series of partly related, partly unrelated patches in Git for Windows that was rebased onto core Git's tags from time to time (earning the undeserved wrath of the developer of the ill-fated git-remote-hg series that first obsoleted Git for Windows' competing approach, only to be abandoned without maintainer later) was really untenable, the "Git garden shears" were born: a script, piggy-backing on top of the interactive rebase, that would first determine the branch topology of the patches to be rebased, create a pseudo todo list for further editing, transform the result into a real todo list (making heavy use of the exec command to "implement" the missing todo list commands) and finally recreate the patch series on top of the new base commit.

    (The Git garden shears script is referenced in this patch in commit 9055e40)

    That was in 2013.
    And it took about three weeks to come up with the design and implement it as an out-of-tree script. Needless to say, the implementation needed quite a few years to stabilize, all the while the design itself proved itself sound.

    With this patch, the goodness of the Git garden shears comes to git rebase -i itself.
    Passing the --rebase-merges option will generate a todo list that can be understood readily, and where it is obvious how to reorder commits.
    New branches can be introduced by inserting label commands and calling merge <label>.
    And once this mode will have become stable and universally accepted, we can deprecate the design mistake that was --preserve-merges.


    Git 2.19 (Q3 2018) improves the new --rebase-merges option by making it work with --exec.

    The "--exec" option to "git rebase --rebase-merges" placed the exec commands at wrong places, which has been corrected.

    See commit 1ace63b (09 Aug 2018), and commit f0880f7 (06 Aug 2018) by Johannes Schindelin (dscho).
    (Merged by Junio C Hamano -- gitster -- in commit 750eb11, 20 Aug 2018)

    rebase --exec: make it work with --rebase-merges

    The idea of --exec is to append an exec call after each pick.

    Since the introduction of fixup!/squash! commits, this idea was extended to apply to "pick, possibly followed by a fixup/squash chain", i.e. an exec would not be inserted between a pick and any of its corresponding fixup or squash lines.

    The current implementation uses a dirty trick to achieve that: it assumes that there are only pick/fixup/squash commands, and then inserts the exec lines before any pick but the first, and appends a final one.

    With the todo lists generated by git rebase --rebase-merges, this simple implementation shows its problems: it produces the exact wrong thing when there are label, reset and merge commands.

    Let's change the implementation to do exactly what we want: look for pick lines, skip any fixup/squash chains, and then insert the exec line. Lather, rinse, repeat.

    Note: we take pains to insert before comment lines whenever possible, as empty commits are represented by commented-out pick lines (and we want to insert a preceding pick's exec line before such a line, not afterward).

    While at it, also add exec lines after merge commands, because they are similar in spirit to pick commands: they add new commits.
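
    The difference between the two insertion strategies can be sketched in a few lines of Python. This is an illustrative model of the behaviour described in the commit message, not Git's C implementation: the function names naive_add_exec and add_exec and the command "make test" are assumptions, and the todo list is the small --rebase-merges example shown earlier.

```python
def naive_add_exec(todo, cmd):
    """Old trick: exec before every pick except the first, plus one at the end."""
    out, seen_pick = [], False
    for line in todo:
        if line.startswith("pick "):
            if seen_pick:
                out.append(f"exec {cmd}")
            seen_pick = True
        out.append(line)
    out.append(f"exec {cmd}")
    return out


def add_exec(todo, cmd):
    """New rule: exec after each pick (and its fixup/squash chain) and each merge."""
    out, pending = [], False
    for line in todo:
        word = line.split(" ", 1)[0]
        # Flush the pending exec as soon as the fixup/squash chain is over,
        # which also places it before any following comment line.
        if pending and word not in ("fixup", "squash"):
            out.append(f"exec {cmd}")
            pending = False
        out.append(line)
        if word in ("pick", "merge"):
            pending = True
    if pending:
        out.append(f"exec {cmd}")
    return out


todo = [
    "pick 0123 A",
    "label branch-point",
    "pick 1234 D",
    "label D",
    "reset branch-point",
    "pick 2345 B",
    "merge -C 3456 D # C",
]
print("\n".join(naive_add_exec(todo, "make test")))
print("---")
print("\n".join(add_exec(todo, "make test")))
```

    With the naive version, the exec meant for D only runs after "reset branch-point", so it tests the wrong commit, and B is never tested on its own. The second version puts one exec directly after each of A, D, B and the merge.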


    Git 2.22 (Q2 2019) fixes the usage of the refs/rewritten/ hierarchy to store a rebase's intermediate states, which inherently makes the hierarchy per worktree.

    See commit b9317d5, commit 90d31ff, commit 09e6564 (07 Mar 2019) by Nguyễn Thái Ngọc Duy (pclouds).
    (Merged by Junio C Hamano -- gitster -- in commit 917f2cd, 09 Apr 2019)

    Make sure refs/rewritten/ is per-worktree

    a9be29c (sequencer: make refs generated by the label command worktree-local, 2018-04-25, Git 2.19) adds refs/rewritten/ as per-worktree reference space.
    Unfortunately (my bad) there are a couple places that need update to make sure it's really per-worktree.

    • add_per_worktree_entries_to_dir() is updated to make sure ref listing look at per-worktree refs/rewritten/ instead of per-repo one.

    • common_list[] is updated so that git_path() returns the correct location. This includes "rev-parse --git-path".

    This mess is created by me.
    I started trying to fix it with the introduction of refs/worktree, where all refs will be per-worktree without special treatments.
    Unfortunate refs/rewritten came before refs/worktree so this is all we can do.


    With Git 2.24 (Q4 2019), "git rebase --rebase-merges" learned to drive different merge strategies and pass strategy specific options to them.

    See commit 476998d (04 Sep 2019) by Elijah Newren (newren).
    See commit e1fac53, commit a63f990, commit 5dcdd74, commit e145d99, commit 4e6023b, commit f67336d, commit a9c7107, commit b8c6f24, commit d51b771, commit c248d32, commit 8c1e240, commit 5efed0e, commit 68b54f6, commit 2e7bbac, commit 6180b20, commit d5b581f (31 Jul 2019) by Johannes Schindelin (dscho).
    (Merged by Junio C Hamano -- gitster -- in commit 917a319, 18 Sep 2019)


    With Git 2.25 (Q1 2020), the logic used to tell worktree local and repository global refs apart is fixed, to facilitate the preserve-merge.

    See commit f45f88b, commit c72fc40, commit 8a64881, commit 7cb8c92, commit e536b1f (21 Oct 2019) by SZEDER Gábor (szeder).
    (Merged by Junio C Hamano -- gitster -- in commit db806d7, 10 Nov 2019)

    path.c: don't call the match function without value in trie_find()

    Signed-off-by: SZEDER Gábor

    'logs/refs' is not a working tree-specific path, but since commit b9317d55a3 (Make sure refs/rewritten/ is per-worktree, 2019-03-07, v2.22.0-rc0) 'git rev-parse --git-path' has been returning a bogus path if a trailing '/' is present:

    $ git -C WT/ rev-parse --git-path logs/refs --git-path logs/refs/
    /home/szeder/src/git/.git/logs/refs
    /home/szeder/src/git/.git/worktrees/WT/logs/refs/
    

    We use a trie data structure to efficiently decide whether a path belongs to the common dir or is working tree-specific.

    As it happens b9317d55a3 triggered a bug that is as old as the trie implementation itself, added in 4e09cf2acf ("path: optimize common dir checking", 2015-08-31, Git v2.7.0-rc0 -- merge listed in batch #2).

    • According to the comment describing trie_find(), it should only call the given match function 'fn' for a "/-or-\0-terminated prefix of the key for which the trie contains a value".
      This is not true: there are three places where trie_find() calls the match function, but one of them is missing the check for value's existence.

    • b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten' and 'logs/refs/worktree', next to the already existing 'logs/refs/bisect'.
      This resulted in a trie node with the path 'logs/refs/', which didn't exist before, and which doesn't have a value attached.
      A query for 'logs/refs/' finds this node and then hits that one callsite of the match function which doesn't check for the value's existence, and thus invokes the match function with NULL as value.
    • When the match function check_common() is invoked with a NULL value, it returns 0, which indicates that the queried path doesn't belong to the common directory, ultimately resulting the bogus path shown above.

    Add the missing condition to trie_find() so it will never invoke the match function with a non-existing value.

    check_common() will then no longer have to check that it got a non-NULL value, so remove that condition.

    I believe that there are no other paths that could cause similar bogus output.

    AFAICT the only other key resulting in the match function being called with a NULL value is 'co' (because of the keys 'common' and 'config').

    However, as they are not in a directory that belongs to the common directory the resulting working tree-specific path is expected.
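
    The bug described above can be reproduced with a small Python model of a compressed trie. This is a simplified sketch and not Git's path.c: the names Node, add and build and the check_value switch are assumptions for illustration, values are reduced to 1 (common dir) or 0 (per-worktree), the key 'logs' is assumed to be the entry that marks the directory as common, and slash normalization is left out.

```python
class Node:
    def __init__(self, contents="", value=None):
        self.contents = contents   # compressed run of characters
        self.value = value         # None means "no value attached"
        self.children = {}         # next character -> Node


def add(node, key, value):
    c, i = node.contents, 0
    while i < len(c) and i < len(key) and c[i] == key[i]:
        i += 1
    if i < len(c):
        # Split: the unmatched tail moves to a child, this node loses its value.
        tail = Node(c[i + 1:], node.value)
        tail.children = node.children
        node.contents, node.value, node.children = c[:i], None, {c[i]: tail}
    rest = key[i:]
    if not rest:
        node.value = value
    elif rest[0] in node.children:
        add(node.children[rest[0]], rest[1:], value)
    else:
        node.children[rest[0]] = Node(rest[1:], value)


def check_common(value):
    # Simplified match function: a missing value reads as "not common".
    return 0 if value is None else value


def trie_find(node, key, fn, check_value=True):
    c = node.contents
    if not key:
        return fn(node.value) if node.value is not None and not c else -1
    if key[:len(c)] != c:
        return -1
    key = key[len(c):]
    if not key:
        # End of key: the call site that used to lack the value check.
        if node.value is not None or not check_value:
            return fn(node.value)
        return -1
    child = node.children.get(key[0])
    result = trie_find(child, key[1:], fn, check_value) if child else -1
    if result >= 0 or key[0] != "/":
        return result
    return fn(node.value) if node.value is not None else -1


def build(entries):
    root = Node()
    for key, value in entries:
        add(root, key, value)
    return root


old = build([("logs", 1), ("logs/refs/bisect", 0)])
new = build([("logs", 1), ("logs/refs/bisect", 0),
             ("logs/refs/rewritten", 0), ("logs/refs/worktree", 0)])
for key in ("logs/refs", "logs/refs/"):
    print(key,
          "buggy:", trie_find(new, key, check_common, check_value=False),
          "fixed:", trie_find(new, key, check_common))
```

    In this model only the query with the trailing slash differs between the two variants: the valueless 'refs/' node created by the split answers 0 (not common) when the check is missing, whereas with the check the lookup falls back to the 'logs' entry and reports the path as common. With the old key set there is no such node, so the missing check never triggers.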
