what gets “cloned” and “pushed” during git clone and git push

爷,独闯天下 submitted on 2021-02-08 05:58:15

Question


When I run a command such as

git push

or

git push origin master

and my repo looks like

      B--C--D <- master
     /
    A--E--F <- foo-branch

and origin just looks like

A <- master

Does the push include commits E and F? I understand that it typically does not include foo-branch, but do all the commits still get pushed?

Likewise, when I do

git clone <some-remote-repo>

I know I typically get one branch (seems to be usually master), but do I also have local copies of commits for other branches, even if I don't get the pointers to their heads?


Answer 1:


It's partly transport-dependent: git has "dumb transports" (such as using http to transfer one object at a time) and "smart transports" (using the git:// or ssh:// protocols, where two gits negotiate with each other, then—provided that the receiver indicates that it's OK—the sender builds a "thin pack").

It's also partly command-dependent: for instance, if you ask for a "shallow" clone, or a single branch, you generally get less than if you do a "normal" clone. And, when you run git push, you can choose which particular commit IDs, if any, you deliver to the remote repository, and what branch name(s) you'd like them to use.

Let's ignore the shallow and single-branch clones for now, though.

Given your example of:

  B--C--D  <- master
 /
A--E--F    <- foo-branch

and git push origin master (whose refspec is presumably equivalent to master:master, i.e., you have not configured an unusual push), where your remote origin currently has commit A (it doesn't matter what branch label(s) it has for A, only that it has A) and assuming a smart protocol, the handshake and transfer protocol starts out pretty much like this:

(your git) "what options do you support? I have thin-packs etc"
(their git) "I have thin-packs and ofs-delta and so on"
(your git) "ok, send me all your refs and their SHA-1s"
(their git) "refs/heads/master is <SHA-1 of A>"
(their git) "that's all I have"

At this point, your git knows what commits are required to get all the commits to the remote: these are the commits that would be listed if you ran, in your repository, git rev-list master ^A (fill in the actual SHA-1 of A, of course). There is no need to exclude additional SHA-1s as the remote origin has nothing but the one branch, whose tip is commit A.
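To make that set difference concrete, here is a minimal Python sketch that models the question's commit graph as a dict of parent links and computes roughly what git rev-list master ^A would list. The names PARENTS, reachable and rev_list are hypothetical illustrations, not git internals, and the result is a set, whereas real git prints commits in a defined order.

```python
# The question's history: each commit maps to its parent commits.
PARENTS = {
    "A": [],
    "B": ["A"], "C": ["B"], "D": ["C"],  # master
    "E": ["A"], "F": ["E"],              # foo-branch
}
REFS = {"master": "D", "foo-branch": "F"}


def reachable(tips, parents):
    """Every commit reachable from the given tips by following parent links."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def rev_list(include, exclude, parents):
    """Rough model of `git rev-list <include> ^<exclude>`: reachable from include, not from exclude."""
    return reachable(include, parents) - reachable(exclude, parents)


to_send = rev_list([REFS["master"]], ["A"], PARENTS)
print(sorted(to_send))  # ['B', 'C', 'D']
```

E and F never enter the walk, because nothing reachable from master leads to them.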

The way this works internally is that git push runs git pack-objects (with --thin), which then runs git rev-list, passing it the commit IDs you've asked to push, with exclusions (--not or prefix ^) for all the commit IDs their git sent you (again in our case that's just the one commit-ID A). See the documentation for git rev-list, paying particular attention to the --objects-edge option (or --objects-edge-aggressive when working with shallow clones).

Your git rev-list therefore outputs the ID of commit D, plus the IDs of its tree and all of that tree's subtrees and blobs, unless it concludes (via the negated IDs, in this case the ^A that excludes commit A) that the remote git must already have them. It then outputs the ID of commit C and its tree, with the same "unless" condition, and so on. Note that commit A has a source tree associated with it; and suppose commit C has the same tree—for instance, suppose commit C is a revert of B. In this case there's no need to send C's tree: the remote must have it because the remote has commit A.
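The object-level version of the same walk can be sketched in Python as follows. It assumes made-up object names (tA, readme-v1 and so on) and flat trees with no subtrees, and it computes the receiver's full "have" set, which is simpler than git's actual edge-marking approach. It shows the point above: because C reverts B, C records A's tree again, and that tree is not sent.

```python
# Hypothetical objects for the example: commit -> (tree, parents), tree -> blobs.
COMMITS = {
    "A": ("tA", []),
    "B": ("tB", ["A"]),
    "C": ("tA", ["B"]),  # C reverts B, so it records exactly A's tree again
    "D": ("tD", ["C"]),
}
TREES = {
    "tA": ["readme-v1", "big-file-v1"],
    "tB": ["readme-v1", "big-file-v2"],
    "tD": ["readme-v2", "big-file-v2"],
}


def reachable_commits(tips):
    """Commits reachable from tips by following parent links."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(COMMITS[commit][1])
    return seen


def objects_of(commits):
    """The commits themselves plus each one's tree and the blobs in that tree."""
    objs = set()
    for commit in commits:
        tree = COMMITS[commit][0]
        objs.add(commit)
        objs.add(tree)
        objs.update(TREES[tree])
    return objs


def objects_to_send(include, exclude):
    """Objects reachable from include that the receiver must already have via exclude are dropped."""
    wanted = reachable_commits(include) - reachable_commits(exclude)
    have = objects_of(reachable_commits(exclude))
    return objects_of(wanted) - have


print(sorted(objects_to_send(["D"], ["A"])))
# ['B', 'C', 'D', 'big-file-v2', 'readme-v2', 'tB', 'tD']
```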

(This object-finding can be optimized via bitmaps. There's a GitHub blog post, I think, describing the development of these bitmaps, which were a solution to the rather slow process of traversing large commit graphs to find which objects must already be in some remote repository based on some branch tip IDs. This helps GitHub enormously because the fetch process across a smart protocol is symmetric with that of push: we simply swap the send and receive roles.)
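The idea behind reachability bitmaps can be sketched with Python integers used as bitsets: each object gets a bit position, each bitmapped commit stores the set of objects reachable from it, and "what to send" becomes want AND NOT have, with no graph walk. This is only a conceptual model with made-up contents; as far as I know, git stores EWAH-compressed bitmaps in a .bitmap file next to a pack, and only for selected commits, filling in the rest with a partial walk.

```python
# Hypothetical object order within a pack; each object's index is its bit position.
OBJECTS = ["A", "tA", "blob1", "B", "tB", "blob2", "C", "D", "tD", "blob3"]
BIT = {name: i for i, name in enumerate(OBJECTS)}


def bitmap(names):
    """Pack a collection of object names into an integer bitset."""
    bits = 0
    for name in names:
        bits |= 1 << BIT[name]
    return bits


def names(bits):
    """Unpack a bitset back into object names, in pack order."""
    return [name for name, i in BIT.items() if bits >> i & 1]


# Precomputed "everything reachable from this commit" bitmaps (made-up contents).
REACH = {
    "A": bitmap(["A", "tA", "blob1"]),
    "D": bitmap(OBJECTS),  # D reaches every object in this tiny pack
}


def objects_to_send(want, have):
    """OR together the wants, OR together the haves, then keep want AND NOT have."""
    want_bits = 0
    for commit in want:
        want_bits |= REACH[commit]
    have_bits = 0
    for commit in have:
        have_bits |= REACH[commit]
    return want_bits & ~have_bits


print(names(objects_to_send(["D"], ["A"])))
# ['B', 'tB', 'blob2', 'C', 'D', 'tD', 'blob3']
```

Swapping which side is "want" and which is "have" gives the fetch direction, which is why the same bitmaps help both push and fetch.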

In any case, the output from your git rev-list feeds your git pack-objects --thin. This provides all the object IDs to take (commit D, its tree if needed, and any needed subtrees and blobs; commit C and needed objects; commit B and needed objects), and also IDs specifically not to take: commit A and its objects, and if there were commits before A, those and their objects. The pack-objects step makes a delta-compressed pack in which the "take these objects" objects are compressed against the "don't take these other objects" objects.

As a super-simplified example, suppose that the tree for A includes a 10 MB file whose last line is "The end". Suppose that the tree for B has a file that's almost the same, except the words "The end" are removed. Git can compress this file into the instructions "start with blob <id-of-file>, then remove the last line." These instructions are much less than 10 MB long and are allowed in the "thin pack".
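A toy version of "start with blob X, then remove the last line" can be written with two kinds of instructions, copy a byte range from the base and insert literal bytes, which is the general shape of git's delta encoding. This sketch only finds a shared prefix and suffix; real git encodes its opcodes in a compact binary form and finds matches with a hash index over the base. The function names are hypothetical, and a 900 KB stand-in replaces the 10 MB file.

```python
def make_delta(base: bytes, target: bytes) -> list:
    """Toy delta: copy a shared prefix and suffix from base, insert whatever differs."""
    limit = min(len(base), len(target))
    prefix = 0
    while prefix < limit and base[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the prefix in the shorter input.
    while suffix < limit - prefix and base[-1 - suffix] == target[-1 - suffix]:
        suffix += 1
    ops = []
    if prefix:
        ops.append(("copy", 0, prefix))  # copy base[0:prefix]
    middle = target[prefix:len(target) - suffix]
    if middle:
        ops.append(("insert", middle))  # literal bytes the base lacks
    if suffix:
        ops.append(("copy", len(base) - suffix, suffix))
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target by replaying copy/insert instructions against base."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        else:
            out += op[1]
    return bytes(out)


# A smaller stand-in for the 10 MB file whose last line is "The end".
base = b"some line of text\n" * 50_000 + b"The end\n"
target = base[: -len(b"The end\n")]
ops = make_delta(base, target)
print(ops)  # [('copy', 0, 900000)]
```

The whole 900 KB target is described by one short instruction, as long as the receiver already has the base.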

It's this "thin pack" that is sent over the Internet-phone connection (or whatever datawire connects the two git instances). The receiver then "thickens" the pack into normal git packs (normal packs do not allow delta-compression against an object that is not already in the pack).
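"Thickening" can be sketched as: for every delta in the received pack whose base is not itself in the pack, append a full copy of that base from the receiver's own object store, so the pack becomes self-contained. The dict-based pack layout and object IDs below are hypothetical; in real git this job is done by git index-pack --fix-thin on the receiving side.

```python
def thicken(thin_pack: dict, local_objects: dict) -> dict:
    """
    thin_pack maps object ID -> ("full", data) or ("delta", base_id, ops).
    local_objects maps object ID -> data the receiver already stores.
    Returns a pack in which every delta base is also inside the pack.
    """
    pack = dict(thin_pack)
    for entry in thin_pack.values():
        if entry[0] == "delta" and entry[1] not in pack:
            base_id = entry[1]
            # The base was only referenced, not sent: append a full copy from local storage.
            pack[base_id] = ("full", local_objects[base_id])
    return pack


def is_self_contained(pack: dict) -> bool:
    """True when no delta refers to a base outside the pack."""
    return all(entry[0] != "delta" or entry[1] in pack for entry in pack.values())


# Hypothetical IDs: the sender delta-compressed blob-B against blob-A, which the receiver has.
local = {"blob-A": b"...10 MB of text...\nThe end\n"}
thin = {"blob-B": ("delta", "blob-A", [("copy", 0, 20)])}
thick = thicken(thin, local)
print(is_self_contained(thin), is_self_contained(thick))  # False True
```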


OK, that's quite long, but it boils down to: your git won't send F (because you didn't ask it to), nor E (because you're not sending F), nor will it look at the two trees attached to those two commits. But this does depend on the exact command you use, and whether you're using a smart protocol.

If you run git clone without --single-branch, your clone operation starts by calling up the remote as usual, and getting a list of all that remote's references (just like push!). To see these, use git ls-remote:

From git://git.kernel.org/pub/scm/git/git.git
aa826b651ae3012d1039453b36ed6f1eab939ef9    HEAD
fdca2bed90a7991f2a3afc6a463e45acb03487ac    refs/heads/maint
aa826b651ae3012d1039453b36ed6f1eab939ef9    refs/heads/master
595b96af80404335de2a8c292cee81ed3da24d29    refs/heads/next
60feb01a0d7c7d54849c233d2824880c57ff9e94    refs/heads/pu
7af04ad560ab8edb07b498d442780a6a794162b0    refs/heads/todo
d5aef6e4d58cfe1549adef5b436f3ace984e8c86    refs/tags/gitgui-0.10.0
3d654be48f65545c4d3e35f5d3bbed5489820930    refs/tags/gitgui-0.10.0^{}

[hundreds more snipped]

Your git then requests just about everything from the remote. (In this case the "just about" is unnecessary, but if they present you with refs/ other than heads/ and tags/ you might not get those. You also get some control over what tags your git brings over. The details here are a bit messy, but in most normal repositories, a clone will bring over all the tags.)
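The ref list can be modeled as plain text to parse and filter. The Python sketch below reads lines in the ls-remote format shown above (the refs/notes/example line and its all-ones SHA-1 are hypothetical, added to show a namespace outside heads/ and tags/), skips the peeled ^{} lines that give the object an annotated tag points to, and keeps only branch heads and tags, as a normal clone typically would. The helper names are illustrative, not git's.

```python
LS_REMOTE = """\
From git://git.kernel.org/pub/scm/git/git.git
aa826b651ae3012d1039453b36ed6f1eab939ef9    HEAD
fdca2bed90a7991f2a3afc6a463e45acb03487ac    refs/heads/maint
aa826b651ae3012d1039453b36ed6f1eab939ef9    refs/heads/master
595b96af80404335de2a8c292cee81ed3da24d29    refs/heads/next
60feb01a0d7c7d54849c233d2824880c57ff9e94    refs/heads/pu
7af04ad560ab8edb07b498d442780a6a794162b0    refs/heads/todo
d5aef6e4d58cfe1549adef5b436f3ace984e8c86    refs/tags/gitgui-0.10.0
3d654be48f65545c4d3e35f5d3bbed5489820930    refs/tags/gitgui-0.10.0^{}
1111111111111111111111111111111111111111    refs/notes/example
"""


def parse_ls_remote(text):
    """Map ref name -> SHA-1, ignoring the 'From' header and peeled '^{}' lines."""
    refs = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("From "):
            continue
        sha, name = line.split(None, 1)
        if name.endswith("^{}"):
            continue  # the commit an annotated tag points to, not a separate ref
        refs[name] = sha
    return refs


def refs_to_fetch(refs):
    """Keep branch heads and tags; other namespaces may not be brought over."""
    return {name: sha for name, sha in refs.items()
            if name.startswith(("refs/heads/", "refs/tags/"))}


for name in sorted(refs_to_fetch(parse_ls_remote(LS_REMOTE))):
    print(name)
```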

You're tripping over a faulty assumption when you say this:

I know I typically get one branch (seems to be usually master), but do I also have local copies of commits for other branches, even if I don't get the pointers to their heads?

Your git asks for, and gets, all their branches. But your git renames them too. They're all renamed to live within the refs/remotes/ name-space, under the name of the remote (normally origin, but -o <name> or --origin <name> changes this). Their refs/heads/master becomes your refs/remotes/origin/master; their refs/heads/maint becomes your refs/remotes/origin/maint; and so on.
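The renaming is driven by the fetch refspec that git clone records for the remote, by default +refs/heads/*:refs/remotes/origin/*. A minimal Python sketch of applying one such glob refspec follows; it assumes exactly one * on each side and ignores the other refspec forms git accepts, and apply_refspec is a hypothetical helper name.

```python
def apply_refspec(refspec: str, ref: str):
    """
    Map a remote ref through a single glob refspec such as '+refs/heads/*:refs/remotes/origin/*'.
    Returns the local ref name, or None if the ref does not match the source pattern.
    """
    spec = refspec[1:] if refspec.startswith("+") else refspec  # '+' only permits forced updates
    src, dst = spec.split(":", 1)
    src_pre, src_post = src.split("*", 1)
    dst_pre, dst_post = dst.split("*", 1)
    if (len(ref) < len(src_pre) + len(src_post)
            or not ref.startswith(src_pre) or not ref.endswith(src_post)):
        return None
    matched = ref[len(src_pre):len(ref) - len(src_post)]
    return dst_pre + matched + dst_post


DEFAULT = "+refs/heads/*:refs/remotes/origin/*"
print(apply_refspec(DEFAULT, "refs/heads/maint"))    # refs/remotes/origin/maint
print(apply_refspec(DEFAULT, "refs/tags/v1.0"))      # None
```

Cloning with --origin upstream would simply swap origin for upstream in the destination half of the refspec.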

You will see all of these (abbreviated somewhat) by running git branch -r, which tells git branch to show remote-tracking branches. (And again, "remote-tracking branches" are just those branches whose full name starts with refs/remotes/. A git fetch from a particular remote updates the corresponding remote-tracking branches via the fetch = directives in the repo's configuration entry for that remote.)

The master that you see if you run git branch or git status is actually created as a last step in your clone. It doesn't actually run git checkout—it has the same code built in directly—but in essence, your clone, as its final operation, runs git checkout branch-or-sha1 for some branch name (or, as a last ditch attempt, a raw SHA-1 giving a "detached HEAD"). The name used is:

  • the one you supplied as an argument to git clone, or
  • the branch that the remote git's HEAD points to, if your git can figure this out, or if it was provided during protocol negotiation.[1]

If those fail—and assuming you didn't instruct the clone process not to do a checkout—git clone does a checkout of the raw SHA-1 it got from the remote as the remote's HEAD. (In the example ls-remote output above this is aa826b651ae3012d1039453b36ed6f1eab939ef9.)


[1] Note that HEAD comes across as a raw SHA-1. For a long time, there was a bug in git where, if this SHA-1 corresponded to at least two branch names, git clone didn't know which branch to check out. Because smart protocols start by negotiating options, though, the git folks were able to add an option (the symref capability) by which one git tells another "HEAD points to branch X". So now, even if the imported HEAD matches multiple imported refs/heads/* names, git can tell which one to use.
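The checkout decision described above can be sketched in Python. In the version-0 protocol the capabilities ride on the first advertised ref line after a NUL byte, and one of them can be symref=HEAD:refs/heads/master; pkt-line framing is omitted here. The decision order (explicit -b name, then the symref, then a guess by SHA-1, then a detached HEAD) follows the answer, but the function names are hypothetical, the refs/heads/release branch is invented to create ambiguity, and this sketch simply gives up on an ambiguous guess, whereas real git's fallback heuristics differ in detail.

```python
def head_target(first_line: str):
    """Return the ref HEAD points to, from a 'symref=HEAD:<ref>' capability, or None."""
    _, _, caps = first_line.partition("\0")  # capabilities follow a NUL on the first line
    for cap in caps.split():
        if cap.startswith("symref="):
            source, _, target = cap[len("symref="):].partition(":")
            if source == "HEAD":
                return target
    return None


def choose_checkout(refs: dict, head_symref=None, requested=None):
    """
    refs:        advertised ref name -> SHA-1, including "HEAD"
    head_symref: result of head_target(), if the server sent it
    requested:   branch given to `git clone -b <name>`
    Returns ("branch", name) or ("detached", sha).
    """
    if requested is not None:
        return ("branch", requested)
    if head_symref and head_symref.startswith("refs/heads/"):
        return ("branch", head_symref[len("refs/heads/"):])
    head = refs.get("HEAD")
    matches = [name for name, sha in refs.items()
               if name.startswith("refs/heads/") and sha == head]
    if len(matches) == 1:  # guessing by SHA-1 is only safe when exactly one branch matches
        return ("branch", matches[0][len("refs/heads/"):])
    return ("detached", head)  # simplification: no usable name, so detach at the raw SHA-1


REFS = {
    "HEAD": "aa826b651ae3012d1039453b36ed6f1eab939ef9",
    "refs/heads/master": "aa826b651ae3012d1039453b36ed6f1eab939ef9",
    "refs/heads/maint": "fdca2bed90a7991f2a3afc6a463e45acb03487ac",
}
AMBIGUOUS = {**REFS, "refs/heads/release": REFS["HEAD"]}
FIRST_LINE = REFS["HEAD"] + " HEAD\0thin-pack ofs-delta symref=HEAD:refs/heads/master"

print(choose_checkout(REFS))                                # ('branch', 'master')
print(choose_checkout(AMBIGUOUS, head_target(FIRST_LINE)))  # ('branch', 'master')
```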




Answer 2:


The way this works internally is that git push runs git pack-objects (with --thin), which then runs git rev-list, passing it the commit IDs you've asked to push

This object-finding can be optimized via bitmaps.

Well, not in every case, since Git 2.4.7 (Q3 2015):

See commit c8a70d3 (01 Jul 2015) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit ace6325, 10 Jul 2015)

rev-list: disable --use-bitmap-index when pruning commits

Signed-off-by: Jeff King

The reachability bitmaps do not have enough information to tell us which commits might have changed path "foo", so the current code produces wrong answers for:

git rev-list --use-bitmap-index --count HEAD -- foo

(it silently ignores the "foo" limiter). Instead, we should fall back to doing a normal traversal (it is OK to fall back rather than complain, because --use-bitmap-index is a pure optimization, and might not kick in for other reasons, such as there being no bitmaps in the repository).
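The fallback the commit describes can be illustrated with a small Python model: a bitmap can answer "how many commits are reachable" with a popcount, but it carries no per-commit information about which paths changed, so a path limiter forces an ordinary walk. The linear history, the single-path limiter and every name here are hypothetical simplifications, not git's code.

```python
# Made-up linear history: commit -> (parent, set of paths the commit changed).
HISTORY = {
    "c3": ("c2", {"foo", "bar"}),
    "c2": ("c1", {"bar"}),
    "c1": (None, {"foo"}),
}
# Hypothetical reachability bitmap: one bit per commit reachable from c3.
REACH_BITMAP = {"c3": 0b111}


def count_commits(tip, path=None, use_bitmap=True):
    """Rough model of `git rev-list --count [--use-bitmap-index] <tip> [-- <path>]`."""
    if use_bitmap and path is None and tip in REACH_BITMAP:
        # Pure reachability question: counting set bits answers it without a walk.
        return bin(REACH_BITMAP[tip]).count("1")
    # A path limiter needs to know what each commit changed, which the bitmap lacks,
    # so fall back to walking the commits rather than silently ignoring the limiter.
    count, commit = 0, tip
    while commit is not None:
        parent, changed = HISTORY[commit]
        if path is None or path in changed:
            count += 1
        commit = parent
    return count


print(count_commits("c3"))         # 3
print(count_commits("c3", "foo"))  # 2
```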

This has been noted in Git 2.26 (Q1 2020): The object reachability bitmap machinery and the partial cloning machinery were not prepared to work well together, because some object-filtering criteria that partial clones use inherently rely on object traversal, but the bitmap machinery is an optimization to bypass that object traversal.

There are, however, some cases where they can work together, and they have been taught about them.

See commit 20a5fd8 (18 Feb 2020) by Junio C Hamano (gitster).
See commit 3ab3185, commit 84243da, commit 4f3bd56, commit cc4aa28, commit 2aaeb9a, commit 6663ae0, commit 4eb707e, commit ea047a8, commit 608d9c9, commit 55cb10f, commit 792f811, commit d90fe06 (14 Feb 2020), and commit e03f928, commit acac50d, commit 551cf8b (13 Feb 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 0df82d9, 02 Mar 2020)

pack-bitmap: refuse to do a bitmap traversal with pathspecs

Signed-off-by: Jeff King

rev-list has refused to use bitmaps with pathspec limiting since c8a70d3509 ("rev-list: disable --use-bitmap-index when pruning commits", 2015-07-01, Git v2.5.0-rc2 -- merge).
But this is true not just for rev-list, but for anyone who calls prepare_bitmap_walk(); the code isn't equipped to handle this case.

We never noticed because the only other callers would never pass a pathspec limiter.

But let's push the check down into prepare_bitmap_walk() anyway. That's a more logical place for it to live, as callers shouldn't need to know the details (and must be prepared to fall back to a regular traversal anyway, since there might not be bitmaps in the repository).

It would also prepare us for a day where this case _is_ handled, but that's pretty unlikely. E.g., we could use bitmaps to generate the set of commits, and then diff each commit to see if it matches the pathspec.
That would be slightly faster than a naive traversal that actually walks the commits.
But you'd probably do better still to make use of the newer commit-graph feature to make walking the commits very cheap.


With Git 2.27 (Q2 2020), the object walk with object filter "--filter=tree:0" can now take advantage of the pack bitmap when available.

See commit 9639474, commit 5bf7f1e (04 May 2020) by Jeff King (peff).
See commit b0a8d48, commit 856e12c (04 May 2020) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 69ae8ff, 13 May 2020)

pack-bitmap.c: make object filtering functions generic

Signed-off-by: Taylor Blau

In 4f3bd5606a ("pack-bitmap: implement BLOB_NONE filtering", 2020-02-14, Git v2.26.0-rc0 -- merge listed in batch #8), filtering support for bitmaps was added for the 'LOFC_BLOB_NONE' filter.

In the future, we would like to add support for filters that behave as if they exclude a certain type of object, for e.g., the tree depth filter with depth 0.

To prepare for this, make some of the functions used for filtering more generic, such as 'find_tip_blobs' and 'filter_bitmap_blob_none' so that they can work over arbitrary object types.

To that end, create 'find_tip_objects' and 'filter_bitmap_exclude_type', and redefine the aforementioned functions in terms of those.
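The "exclude a certain type of object" idea can be sketched with per-type bitmaps, which the bitmap index keeps alongside the reachability bitmaps: clearing every bit of a given type drops, say, all blobs at once, and a tree:0 filter drops both trees and blobs. In this Python model, objects explicitly requested as tips survive the filter, which appears to be the role the commit message's find_tip_objects plays; the bit layout and function names are hypothetical and are not the C functions named above.

```python
# Hypothetical layout: bits 0..5 are objects in pack order, grouped here by type.
TYPE_BITMAPS = {
    "commit": 0b000011,  # objects 0 and 1
    "tree":   0b001100,  # objects 2 and 3
    "blob":   0b110000,  # objects 4 and 5
}


def filter_exclude_type(result: int, obj_type: str, tips: int = 0) -> int:
    """Clear every object of obj_type from result, except objects explicitly requested as tips."""
    to_drop = TYPE_BITMAPS[obj_type] & ~tips
    return result & ~to_drop


def filter_tree_depth_0(result: int, tips: int = 0) -> int:
    """A tree:0 filter keeps no trees or blobs beyond the requested tips."""
    for obj_type in ("tree", "blob"):
        result = filter_exclude_type(result, obj_type, tips)
    return result


everything = 0b111111
print(bin(filter_exclude_type(everything, "blob")))              # 0b1111
print(bin(filter_tree_depth_0(everything)))                      # 0b11
print(bin(filter_tree_depth_0(everything, tips=0b100000)))       # 0b100011
```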



Source: https://stackoverflow.com/questions/33514547/what-gets-cloned-and-pushed-during-git-clone-and-git-push
