Using Git, show all commits that exist *only* on one specific branch, and not *any* others

小蘑菇 2020-12-07 08:17

Given a branch, I'd like to see a list of commits that exist only on that branch. In this question we discuss ways to see which commits are on one branch

8 answers
  • 2020-12-07 08:57

    This is not exactly a real answer, but I need access to formatting, and a lot of space. I'll try to describe the theory behind what I consider the two best answers: the accepted one and the (at least currently) top-ranked one. But in fact, they answer different questions.

    Commits in Git are very often "on" more than one branch at a time. Indeed, that's much of what the question is about. Given:

    ...--F--G--H   <-- master
             \
              I--J   <-- develop
    

    where the uppercase letters stand in for actual Git hash IDs, we're often looking for only commit H or only commits I-J in our git log output. Commits up through G are on both branches, so we'd like to exclude them.

    (Note that in graphs drawn like this, newer commits are towards the right. The names select the single right-most commit on that line. Each of those commits has a parent commit, which is the commit to their left: the parent of H is G, and the parent of J is I. The parent of I is G again. The parent of G is F, and F has a parent that simply isn't shown here: it's part of the ... section.)

    For this particularly simple case, we can use:

    git log master..develop    # note: two dots
    

    to view I-J, or:

    git log develop..master    # note: two dots
    

    to view H only. The right-side name, after the two dots, tells Git: yes, these commits. The left-side name, before the two dots, tells Git: no, not these commits. Git starts at the end—at commit H or commit J—and works backwards. For (much) more about this, see Think Like (a) Git.
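
    The two-dot forms can be tried on a throwaway repository. What follows is only a minimal sketch, assuming git is installed; the temporary directory, the demo identity, and the empty commits whose subjects are the letters from the graph are illustrative choices, not something taken from the answers being discussed.

```shell
set -e
# Throwaway repository; nothing outside this temp directory is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
# Hypothetical identity so commits work without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# F--G--H   <-- master
for subject in F G H; do git commit -q --allow-empty -m "$subject"; done
# Branch off at G and add I--J   <-- develop
git checkout -q -b develop master~1
for subject in I J; do git commit -q --allow-empty -m "$subject"; done

only_develop=$(git log --format=%s master..develop)   # reachable from develop, not master
only_master=$(git log --format=%s develop..master)    # reachable from master, not develop
printf 'master..develop:\n%s\n' "$only_develop"
printf 'develop..master:\n%s\n' "$only_master"
```

    The first listing should contain J and I, the second only H; F and G never appear because both names reach them.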

    The way the original question is phrased, the desire is to find commits that are reachable from one particular name, but not from any other name in that same general category. That is, if we have a more complex graph:

                   O--P   <-- name5
                  /
                 N   <-- name4
                /
    ...--F--G--H--I---M   <-- name1
             \       /
              J-----K   <-- name2
               \
                L   <-- name3
    

    we could pick out one of these names, such as name4 or name3, and ask: which commits can be found by that name, but not by any of the other names? If we pick name3 the answer is commit L. If we pick name4, the answer is no commits at all: the commit that name4 names is commit N but commit N can be found by starting at name5 and working backwards.
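
    To experiment with this question, the graph above can be rebuilt in a scratch repository. This is a sketch under stated assumptions: git is installed, commits F and earlier are left out, the commits are empty, and only_on is a hypothetical helper name, not a Git command. It treats local branch names (refs/heads) as "the names in that category".

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/name1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { git commit -q --allow-empty -m "$1"; }

commit G; commit H                                     # shared history
git checkout -q -b name2 name1~1; commit J; commit K   # J--K forks from G
git checkout -q -b name3 name2~1; commit L             # L forks from J
git checkout -q -b name4 name1;   commit N             # N forks from H
git checkout -q -b name5 name4;   commit O; commit P   # O--P continue from N
git checkout -q name1; commit I
git merge -q --no-ff -m M name2                        # M: first parent I, second parent K

# Hypothetical helper: commits reachable from branch $1 but from no other branch.
only_on() {
    # grep -x matches whole lines, so only the exact ref name is dropped.
    others=$(git for-each-ref --format='%(refname)' refs/heads |
        grep -Fvx "refs/heads/$1" || true)
    git log --format=%s "refs/heads/$1" --not $others
}

only_name3=$(only_on name3)
only_name4=$(only_on name4)
only_name1=$(only_on name1)
printf 'name3: %s\nname4: %s\nname1: %s\n' "$only_name3" "$only_name4" "$only_name1"
```

    Here name3 yields L, name4 yields nothing (N is also reachable from name5), and name1 yields the merge M plus commit I.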

    The accepted answer works with remote-tracking names, rather than branch names, and allows you to designate one (the one spelled origin/merge-only) as the selected name and look at all other names in that namespace. It also avoids showing merges. Without that, if we pick name1 as the "interesting name" and say show me commits that are reachable from name1 but not any other name, we'd see merge commit M as well as regular commit I; with merges suppressed, we see only commit I.

    The most popular answer is quite different. It's all about traversing the commit graph without following both legs of a merge, and without showing any of the commits that are merges. If we start with name1, for instance, we won't show M (it's a merge), but assuming the first parent of merge M is commit I, we won't even look at commits J and K. We'll end up showing commit I, and also commits H, G, F, and so on—none of these are merge commits and all are reachable by starting at M and working backwards, visiting only the first parent of each merge commit.

    The most-popular answer is pretty well suited to, for instance, looking at master when master is intended to be a merge-only branch. If all "real work" was done on side branches which were subsequently merged into master, we will have a pattern like this:

    I---------M---------N   <-- master
     \       / \       /
      o--o--o   o--o--o
    

    where all the un-letter-named o commits are ordinary (non-merge) commits and M and N are merge commits. Commit I is the initial commit: the very first commit ever made, and the only one that should be on master that isn't a merge commit. If git log --first-parent --no-merges master shows any commit other than I, we have a situation like this:

    I---------M----*----N   <-- master
     \       / \       /
      o--o--o   o--o--o
    

    where we want to see commit * that was made directly on master, not by merging some feature branch.

    In short, the popular answer is great for looking at master when master is meant to be merge-only, but is not as great for other situations. The accepted answer works for these other situations.
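
    The merge-only check can be reproduced on a scratch repository. This is a rough sketch, assuming git is installed; the branch names feature1 and feature2, the subject "direct" standing in for the * commit, and the use of empty commits are all made up for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { git commit -q --allow-empty -m "$1"; }

commit I                                           # initial commit on master
git checkout -q -b feature1; commit a; commit b    # first side branch
git checkout -q master
git merge -q --no-ff -m M feature1                 # merge commit M
commit direct                                      # the * commit, made directly on master
git checkout -q -b feature2 master~1; commit c; commit d   # second side branch, forked at M
git checkout -q master
git merge -q --no-ff -m N feature2                 # merge commit N

# Follow only first parents from master and hide the merges themselves.
suspects=$(git log --first-parent --no-merges --format=%s master)
printf '%s\n' "$suspects"
```

    The output should list only "direct" and I: the side-branch commits are never walked, and M and N are walked but not shown.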

    Are remote-tracking names like origin/master branch names?

    Some parts of Git say they're not:

    git checkout master
    ...
    git status
    

    says on branch master, but:

    git checkout origin/master
    ...
    git status
    

    says HEAD detached at origin/master. I prefer to agree with git checkout / git switch: origin/master is not a branch name because you cannot get "on" it.

    The accepted answer uses remote-tracking names origin/* as "branch names":

    git log --no-merges origin/merge-only \
        --not $(git for-each-ref --format="%(refname)" refs/remotes/origin |
        grep -Fv refs/remotes/origin/merge-only)
    

    The middle line, which invokes git for-each-ref, iterates over the remote-tracking names for the remote named origin.

    The reason this is a good solution to the original problem is that we're interested here in someone else's branch names, rather than our branch names. But that means we've defined branch as something other than our branch names. That's fine: just be aware that you're doing this, when you do it.

    git log traverses some part(s) of the commit graph

    What we're really searching for here are series of what I have called daglets: see What exactly do we mean by "branch"? That is, we're looking for fragments within some subset of the overall commit graph.

    Whenever we have Git look at a branch name like master, a tag name like v2.1, or a remote-tracking name like origin/master, we tend to want to have Git tell us about that commit and every commit that we can get to from that commit: starting there, and working backwards.

    In mathematics, this is referred to as walking a graph. Git's commit graph is a Directed Acyclic Graph or DAG, and this kind of graph is particularly suited for walking. When walking such a graph, one will visit each graph vertex that is reachable via the path being used. The vertices in the Git graph are the commits, with the edges being arcs—one-way links—going from each child to each parent. (This is where Think Like (a) Git comes in. The one-way nature of arcs means that Git must work backwards, from child to parent.)

    The two main Git commands for graph-walking are git log and git rev-list. These commands are extremely similar (in fact they're mostly built from the same source files), but their output is different: git log produces output for humans to read, while git rev-list produces output meant for other Git programs to read.[1] Both commands do this kind of graph walk.

    The graph walk they do is specifically: given some set of starting point commits (perhaps just one commit, perhaps a bunch of hash IDs, perhaps a bunch of names that resolve to hash IDs), walk the graph, visiting commits. Particular directives, such as --not or a prefix ^, or --ancestry-path, or --first-parent, modify the graph walk in some way.

    As they do the graph walk, they visit each commit. But they only print some selected subset of the walked commits. Directives such as --no-merges or --before <date> tell the graph-walking code which commits to print.

    In order to do this visiting, one commit at a time, these two commands use a priority queue. You run git log or git rev-list and give it some starting point commits. They put those commits into the priority queue. For instance, a simple:

    git log master
    

    turns the name master into a raw hash ID and puts that one hash ID into the queue. Or:

    git log master develop
    

    turns both names into hash IDs and—assuming these are two different hash IDs—puts both into the queue.

    The priority of the commits in this queue is determined by still more arguments. For instance, the argument --author-date-order tells git log or git rev-list to use the author timestamp, rather than the committer timestamp. The default is to use the committer timestamp and pick the newest-by-date commit: the one with the highest numerical date. So with master develop, assuming these resolve to two different commits, Git will show whichever one came later first, because that will be at the front of the queue.

    In any case, the revision walking code now runs in a loop:

    • While there are commits in the queue:
      • Remove the first queue entry.
      • Decide whether to print this commit at all. For instance, --no-merges: print nothing if it is a merge commit; --before: print nothing if its date does not come before the designated time. If printing isn't suppressed, print the commit: for git log, show its log; for git rev-list, print its hash ID.
      • Put some or all of this commit's parent commits into the queue (as long as a parent isn't there now, and has not been visited already[2]). The normal default is to put in all parents. Using --first-parent suppresses all but the first parent of each merge.

    (Both git log and git rev-list can do history simplification with or without parent rewriting at this point as well, but we'll skip over that here.)

    For a simple chain, like start at HEAD and work backwards when there are no merge commits, the queue always has one commit in it at the top of the loop. There's one commit, so we pop it off and print it and put its (single) parent into the queue and go around again, and we follow the chain backwards until we reach the very first commit, or the user gets tired of git log output and quits the program. In this case, none of the ordering options matter: there is only ever one commit to show.

    When there are merges and we follow both parents—both "legs" of the merge—or when you give git log or git rev-list more than one starting commit, the sorting options matter.
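
    The effect of the default committer-date priority with two starting points can be seen in a small experiment. This is a sketch, assuming git is installed; commit_at is a hypothetical helper, and the fixed timestamps are arbitrary values chosen so that the two branches alternate in time.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: empty commit with subject $1 and both timestamps set to $2.
commit_at() {
    GIT_AUTHOR_DATE="$2" GIT_COMMITTER_DATE="$2" git commit -q --allow-empty -m "$1"
}

commit_at A '2020-01-01 00:00:01 +0000'   # shared root
git checkout -q -b develop
commit_at B '2020-01-01 00:00:02 +0000'   # develop: A--B
git checkout -q master
commit_at C '2020-01-01 00:00:03 +0000'   # master:  A--C
git checkout -q develop
commit_at D '2020-01-01 00:00:04 +0000'   # develop: A--B--D

# Two starting commits (C and D) go into the queue; the newest is taken first.
interleaved=$(git log --format=%s master develop)
printf '%s\n' "$interleaved"
```

    The commits come out as D, C, B, A: the two branches are interleaved by committer date rather than shown one branch after the other, and the shared commit A appears once.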

    Last, consider the effect of --not or ^ in front of a commit specifier. There are several ways to write these:

    git log master --not develop
    

    or:

    git log ^develop master
    

    or:

    git log develop..master
    

    all mean the same thing. The --not is like the prefix ^ except that it applies to more than one name:

    git log ^branch1 ^branch2 branch3
    

    means not branch1, not branch2, yes branch3; but:

    git log --not branch1 branch2 branch3
    

    means not branch1, not branch2, not branch3, and you have to use a second --not to turn it off:

    git log --not branch1 branch2 --not branch3
    

    which is a bit awkward. The two "not" directives are combined via XOR, so if you really want, you can write:

    git log --not branch1 branch2 ^branch3
    

    to mean not branch1, not branch2, yes branch3, if you want to obfuscate.

    These all work by affecting the graph walk. As git log or git rev-list walks the graph, it makes sure not to put into the priority queue any commit that is reachable from any of the negated references. (In fact, they affect the starting setup too: negated commits can't go into the priority queue right from the command line, so git log master ^master shows nothing, for instance.)
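
    That these spellings select the same commits can be checked with git rev-list on a scratch repository. The following is a sketch under the assumption that git is installed; the two-branch history (F--G--H on master, I--J forked from G on develop) and the empty commits are illustrative only.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
for subject in F G H; do git commit -q --allow-empty -m "$subject"; done
git checkout -q -b develop master~1
for subject in I J; do git commit -q --allow-empty -m "$subject"; done

with_not=$(git rev-list master --not develop)
with_caret=$(git rev-list '^develop' master)
with_dots=$(git rev-list develop..master)
toggled=$(git rev-list --not develop --not master)   # second --not turns negation back off
nothing=$(git rev-list master '^master')             # a commit excluded by its own name

printf 'master only: %s\n' "$with_dots"
printf 'master ^master: [%s]\n' "$nothing"
```

    All four non-empty results should be the single hash of commit H, and the last one should be empty.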

    All of the fancy syntax described in the gitrevisions documentation makes use of this, and you can expose this with a simple call to git rev-parse. For instance:

    $ git rev-parse origin/pu...origin/master     # note: three dots
    b34789c0b0d3b137f0bb516b417bd8d75e0cb306
    fc307aa3771ece59e174157510c6db6f0d4b40ec
    ^b34789c0b0d3b137f0bb516b417bd8d75e0cb306
    

    The three-dot syntax means commits reachable from either left or right side, but excluding commits reachable from both. In this case the origin/master commit, b34789c0b, is itself reachable from origin/pu (fc307aa37...) so the origin/master hash appears twice, once with a negation, but in fact Git achieves the three-dot syntax by putting in two positive references—the two non-negated hash IDs—and one negative one, represented by the ^ prefix.

    Similarly:

    $ git rev-parse master^^@
    2c42fb76531f4565b5434e46102e6d85a0861738
    2f0a093dd640e0dad0b261dae2427f2541b5426c
    

    The ^@ syntax means all the parents of the given commit, and master^ itself—the first parent of the commit selected by branch-name master—is a merge commit, so it has two parents. These are the two parents. And:

    $ git rev-parse master^^!
    0b07eecf6ed9334f09d6624732a4af2da03e38eb
    ^2c42fb76531f4565b5434e46102e6d85a0861738
    ^2f0a093dd640e0dad0b261dae2427f2541b5426c
    

    The ^! suffix means the commit itself, but none of its parents. In this case, master^ is 0b07eecf6.... We already saw both parents with the ^@ suffix; here they are again, but this time, negated.
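
    The same expansions can be observed on a scratch repository where the hashes are known. This is a sketch, assuming git is installed; the branch name side and the tiny history (A, then B on side and C on master, merged as M) are invented for the example, and here master itself is the merge, so the suffixes are applied to master rather than master^.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { git commit -q --allow-empty -m "$1"; }

commit A
git checkout -q -b side; commit B
git checkout -q master;  commit C
git merge -q --no-ff -m M side          # master is now merge M with parents C and B

m=$(git rev-parse master)
p1=$(git rev-parse 'master^1')          # first parent (C)
p2=$(git rev-parse 'master^2')          # second parent (B)

parents=$(git rev-parse 'master^@')     # all parents of master
only_self=$(git rev-parse 'master^!')   # master itself, parents negated
symmetric=$(git rev-parse master...side)   # three dots: both tips plus negated merge base

printf '^@:\n%s\n^!:\n%s\n...:\n%s\n' "$parents" "$only_self" "$symmetric"
```

    Because side is already merged into master, the merge base of the two is the side commit itself, so its hash shows up twice in the three-dot output, once negated, much like the origin/master hash in the example above.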


    [1] Many Git programs literally run git rev-list with various options, and read its output, to know what commits and/or other Git objects to use.

    [2] Because the graph is acyclic, it's possible to guarantee that none have been visited already, if we add the constraint never show a parent before showing all of its children to the priority. --date-order, --author-date-order, and --topo-order add this constraint. The default sort order, which has no name, doesn't. If the commit timestamps are screwy (if for instance some commits were made "in the future" by a computer whose clock was off), this could in some cases lead to odd-looking output.


    If you made it this far, you now know a lot about git log

    Summary:

    • git log is about showing some selected commits while walking some or all of some part of the graph.
    • The --no-merges argument, found in both the accepted and the currently-top-ranked answers, suppresses showing some commits that are walked.
    • The --first-parent argument, from the currently-top-ranked-answer, suppresses walking some parts of the graph, during the graph-walk itself.
    • The --not prefix to command line arguments, as used in the accepted answer, suppresses ever visiting some parts of the graph at all, right from the start.

    We get the answers we like, to two different questions, using these features.

  • 2020-12-07 09:04

    @Prakash's answer works. Just for clarity ...

    git checkout feature-branch
    git log master..HEAD
    

    lists the commits on feature-branch but not the upstream branch (typically your master).
